Transformations are functions applied to an RDD that produce another RDD. A transformation does not execute until an action occurs. map() and filter() are examples of transformations: map() applies the function passed to it to each element of the RDD and returns a new RDD of the results, while filter() creates a new RDD containing only the elements of the current RDD for which the function argument returns true.
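// Describe the file as an RDD of lines (sc is the SparkContext, e.g. the one spark-shell provides)
val rawData = sc.textFile("path to/movies.txt")
// Split each line on spaces; this only records the step, nothing is computed yet
val moviesData = rawData.map(x => x.split(" "))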
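Here the rawData RDD is transformed into the moviesData RDD. Because transformations are lazily evaluated, neither line above reads the file or splits anything; the work happens only when an action such as count() or collect() is called on moviesData.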
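To make the difference between transformations and actions concrete, here is a minimal sketch that chains map() and filter() and then triggers them with collect(). It assumes Spark is on the classpath and runs in local mode; the object name TransformationsSketch, the helper multiWordTitles, and the in-memory sample lines are hypothetical stand-ins for movies.txt, not part of the original answer.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object TransformationsSketch {

  // Hypothetical helper: keeps only the lines whose title has more than one word.
  def multiWordTitles(sc: SparkContext, lines: Seq[String]): Array[String] = {
    // In-memory stand-in for sc.textFile("path to/movies.txt")
    val rawData = sc.parallelize(lines)

    // Transformations: each call only records a step in the lineage.
    val moviesData = rawData.map(line => line.split(" "))
    val multiWord  = moviesData.filter(fields => fields.length > 2)

    // Action: collect() runs the job and returns the results to the driver.
    multiWord.map(fields => fields.mkString(" ")).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transformations-sketch")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()

    // Sample lines assumed to look like "<id> <title words...>"
    val sample = Seq("1 Toy Story", "2 Jumanji", "3 Heat")
    multiWordTitles(spark.sparkContext, sample).foreach(println)

    spark.stop()
  }
}
```

Until collect() is called, Spark has only built up the chain of map() and filter() steps; replacing collect() with another action such as count() would trigger the same chain.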
Posted Date:- 2021-10-22 03:55:11
What file systems does Spark support?
How can Apache Spark be used alongside Hadoop?
What do you understand by worker node?
Is there a module to implement SQL in Spark? How does it work?
How is machine learning implemented in Spark?
Is there an API for implementing graphs in Spark?
How is Streaming implemented in Spark? Explain with examples.
Name the components of Spark Ecosystem.
What do you understand by Transformations in Spark?
Define Partitions in Apache Spark.
What is Executor Memory in a Spark application?
Under what scenarios do you use Client and Cluster modes for deployment?
Explain the working of Spark with the help of its architecture.
How many forms of transformations are there?
Explain what accumulators are.
Is there any benefit of learning MapReduce if Spark is better than MapReduce?
What is a lazy evaluation in Spark?
Do you need to install Spark on all nodes of YARN cluster?
What are the data formats supported by Spark?
What are the languages supported by Apache Spark and which is the most popular one?
Define the functions of Spark Core.
What is the method for creating a DataFrame?
Explain the concept of a sparse vector.
What are the different cluster managers available in Apache Spark?
What are receivers in Apache Spark Streaming?
Is it possible to run Apache Spark on Apache Mesos?
What are the steps involved in structured API execution in Spark?
What do you understand by lazy evaluation?
What is the role of a Spark Driver?
How many types of Deploy mode are there in Spark?
Name different types of data sources available in SparkSQL.
Can you use Spark to access and analyse data stored in Cassandra databases?
What are the languages supported by Apache Spark for developing big data applications?
Explain about transformations and actions in the context of RDDs.
List some use cases where Spark outperforms Hadoop in processing.
Explain how Spark runs applications with the help of its architecture.
What are the important components of the Spark ecosystem?