This is one of the most frequently asked Spark interview questions, and the interviewer will expect a thorough answer.
Spark applications run as independent sets of processes coordinated by the SparkSession object in the driver program. The cluster manager (resource manager) assigns tasks to the worker nodes, with one task per partition. Each task applies its unit of work to the dataset in its partition and outputs a new partitioned dataset. Iterative algorithms, which apply operations to the data repeatedly, benefit from caching datasets in memory across iterations. Finally, the results are either sent back to the driver application or saved to disk.
Posted Date: 2021-10-22 03:29:20
What file systems does Spark support?
How can Apache Spark be used alongside Hadoop?
What do you understand by worker node?
Is there a module to implement SQL in Spark? How does it work?
How is machine learning implemented in Spark?
Is there an API for implementing graphs in Spark?
How is Streaming implemented in Spark? Explain with examples.
Name the components of Spark Ecosystem.
What do you understand by Transformations in Spark?
Define Partitions in Apache Spark.
What is Executor Memory in a Spark application?
Under what scenarios do you use Client and Cluster modes for deployment?
Explain the working of Spark with the help of its architecture.
How many forms of transformations are there?
Explain what accumulators are.
Is there any benefit of learning MapReduce if Spark is better than MapReduce?
What is lazy evaluation in Spark?
Do you need to install Spark on all nodes of YARN cluster?
What are the data formats supported by Spark?
What are the languages supported by Apache Spark and which is the most popular one?
Define the functions of Spark Core.
What is the method for creating a DataFrame?
Explain the concept of a sparse vector.
What are the different cluster managers available in Apache Spark?
What are receivers in Apache Spark Streaming?
Is it possible to run Apache Spark on Apache Mesos?
What are the steps involved in structured API execution in Spark?
What do you understand by lazy evaluation?
What is the role of a Spark Driver?
How many types of Deploy mode are there in Spark?
Name different types of data sources available in SparkSQL.
Can you use Spark to access and analyse data stored in Cassandra databases?
What are the languages supported by Apache Spark for developing big data applications?
Explain about transformations and actions in the context of RDDs.
List some use cases where Spark outperforms Hadoop in processing.
Explain how Spark runs applications with the help of its architecture.
What are the important components of the Spark ecosystem?