Databricks MCQ Question Set 1: Sample Test, Sample Questions

Question:
Spark is packaged with higher-level libraries, including support for _________ queries.

1.SQL

2. C

3.C++

4.None of the mentioned


Question:
Authentication and authorization in Databricks can be managed for:

1.User, Group, Access Control List

2. User, Group

3. Access Control List

4.Group, Access Control List


Question:
Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of being serialized with every single task.

1.True

2.False

3.Can’t Specify

4.None of the mentioned


Question:
Broadcast variables are ______ and lazily replicated across all nodes in the cluster when an action is triggered.

1.mutable

2. immutable

3. both

4.None of above


Question:
Choose the correct option with respect to ETL operations of data in Azure Databricks?

1.For loading, data is moved from Databricks to the data warehouse

2.For loading of data, blob storage is used

3.Blob storage serves as a temporary storage

4. All of the above


Question:
Fault Tolerance in RDD is achieved using

1.Immutable nature of RDD

2.DAG (Directed Acyclic Graph)

3.Lazy-evaluation

4.none of the above


Question:
For a multiclass classification problem, which algorithm is not a solution?

1.Naive Bayes

2.Random Forests

3.Logistic Regression

4.Decision Trees


Question:
Given a DataFrame df that has some null values in the column created_date, find the code below such that it will sort rows in ascending order based on the column created_date, with null values appearing last.

1.orderBy(asc_nulls_last("created_date"))

2. sort(asc_nulls_last("created_date"))

3.orderBy(col("created_date").asc_nulls_last())

4.orderBy(col("created_date"), ascending=True)
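
As a plain-Python illustration of what "ascending with nulls last" means, the sketch below sorts a small list of records so that missing dates end up at the bottom. It is not Spark code: the `rows` data and the `sort_asc_nulls_last` helper are hypothetical and exist only to show the ordering the question asks for.

```python
from datetime import date


def sort_asc_nulls_last(rows, column):
    """Return rows sorted ascending by `column`, with None values last."""
    # The sort key is a tuple. Its first element is False for real values and
    # True for None, and False sorts before True, so None rows go to the end.
    # None is swapped for a placeholder date so the tuple comparison never
    # has to compare None with a real date.
    return sorted(rows, key=lambda r: (r[column] is None, r[column] or date.min))


# Hypothetical sample data: one record has no created_date.
rows = [
    {"id": 1, "created_date": date(2021, 3, 1)},
    {"id": 2, "created_date": None},
    {"id": 3, "created_date": date(2020, 7, 15)},
]

ordered = sort_asc_nulls_last(rows, "created_date")
print([r["id"] for r in ordered])
```

The record without a date is listed after the two dated records, which are themselves in ascending date order.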


Question:
Given a DataFrame df that includes a number of columns among which a column named quantity and a column named price, complete the code below such that it will create a DataFrame including all the original columns and a new column revenue defined as quantity*price:

1.df.withColumnRenamed("revenue", expr("quantity*price"))

2.df.withColumn(revenue, expr("quantity*price"))

3.df.withColumn("revenue", expr("quantity*price"))

4.df.withColumn(expr("quantity*price"), "revenue")


Question:
Given a dataframe df, select the code that returns its number of rows:

1.df.take('all')

2.df.collect()

3.df.count()

4.df.numRows()


Question:
Is it possible to mitigate stragglers in RDD?

1.Yes

2. No

3. Both

4.None of the mentioned


Question:
RDD is fault-tolerant and immutable

1.True

2. False

3.Both

4. none of the mentioned


Question:
Spark includes a collection of over ________ operators for transforming data and familiar data frame APIs for manipulating semi-structured data.

1.50

2.60

3.70

4.80


Question:
Spark is developed in which language?

1.Java

2.Scala

3.Python

4.R


Question:
Spark is engineered from the bottom up for performance, running ______ faster than Hadoop by exploiting in-memory computing and other optimizations.

1.100x

2.150x

3.200x

4.None of the mentioned


Question:
Spark powers a stack of high-level tools including Spark SQL, MLlib for _____

1.regression models

2. statistics

3.machine learning

4.reproductive research


Question:
Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.

1.Spark Streaming

2.Spark SQL

3.RDDs

4.All of the Mentioned


Question:
Spark was initially started by ______ at UC Berkeley AMPLab in 2009.

1.Mahek Zaharia

2.Matei Zaharia

3.Doug Cutting

4.Stonebraker


Question:
Streaming data can be captured by?

1.Kafka

2.Event Hubs

3.Both A and B

4.none of the above


Question:
The read operation on RDD is

1.Fine-grained

2.Coarse-grained

3.Either fine-grained or coarse-grained

4.Neither fine-grained nor coarse-grained


Question:
The shortcomings of Hadoop MapReduce were overcome by Spark RDD through

1.Lazy-evaluation

2.DAG

3.In-memory processing

4.All of the above


Question:
The write operation on RDD is

1.Fine-grained

2.Coarse-grained

3.Either fine-grained or coarse-grained

4.Neither fine-grained nor coarse-grained


Question:
To which one of the following sources does Azure Databricks connect for collecting streaming data?

1.Kafka

2.Azure data lake

3.CosmosDB

4.none of the above


Question:
Users can easily run Spark on top of Amazon’s _____

1. Infosphere

2.EC2

3.EMR

4.None of the mentioned


Question:
What is an action in Spark RDD?

1.The ways to send result from executors to the driver

2.Takes RDD as input and produces one or more RDD as output.

3.Creates one or many new RDDs

4.All of the above


Question:
Which of the following algorithms is not present in MLlib?

1.Streaming Linear Regression

2. Streaming KMeans

3.Tanimoto distance

4.none of the above


Question:
Which of the following Azure data sources can be connected to Azure Databricks?

1.Azure Blob Storage

2.Azure Data Warehouse

3. Azure CosmosDB

4.All of the above


Question:
Which of the following can be used to launch Spark jobs inside MapReduce?

1.SIM

2.SIMR

3.SIR

4.RIS


Question:
Which of the following ensures data reliability even after termination of cluster in Azure Databricks?

1.Databricks Runtime

2.Databricks File System

3.Dashboards

4.Workspace


Question:
Which of the following is a tool of Machine Learning Library?

1.Persistence

2.Utilities like linear algebra, statistics

3.Pipelines

4.All of the above.


Question:
Which of the following is a transformation?

1.foreach()

2. flatMap()

3.save()

4.count()


Question:
Which of the following is an action?

1. foreach()

2.printSchema()

3.cache()

4.sort()


Question:
Which of the following is not a component of the Spark Ecosystem?

1.Sqoop

2.GraphX

3.MLlib

4.BlinkDB


Question:
Which of the following is NOT an action?

1.foreach()

2.printSchema()

3.first()

4.reduce()


Question:
Which of the following is not a feature of Spark?

1.Supports in-memory computation

2.Fault-tolerance

3. It is cost-efficient

4.Compatible with other file storage systems


Question:
Which of the following is the reason Spark is faster than MapReduce?

1.DAG execution engine and in-memory computation

2.Support for different language APIs like Scala, Java, Python and R

3.RDDs are immutable and fault-tolerant

4.none of the above


Question:
Which of the following is true for RDD?

1.We can operate Spark RDDs in parallel with a low-level API

2.RDDs are similar to the table in a relational database

3.It allows processing of a large amount of structured data

4.It has built-in optimization engine


Question:
Which of the following is true for Spark core?

1.It is the kernel of Spark

2.It enables users to run SQL / HQL queries on the top of Spark.

3. It is the scalable machine learning library which delivers efficiencies

4. Improves the performance of iterative algorithms drastically.


Question:
Which of the following is true for Spark MLlib?

1. Provides an execution platform for all the Spark applications

2. It is the scalable machine learning library which delivers efficiencies

3.Enables powerful interactive and data analytics applications across live streaming data

4.All of the above


Question:
Which of the following languages is not supported by Spark?

1. Java

2.Pascal

3.Scala

4.Python


Question:
Which of the following statements are NOT true for broadcast variables?

1.Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of being serialized with every single task.

2.A custom broadcast class can be defined by extending org.apache.spark.utilbroadcastV2 in Java or Scala or pyspark.Accumulatorparams in Python. –> CORRECT

3. It is a way of updating a value inside a variety of transformations and propagating that value to the driver node in an efficient and fault-tolerant way.–> CORRECT

4. It provides a mutable variable that Spark cluster can safely update on a per-row basis. –> CORRECT


Question:
Which one of the following commands triggers an eager evaluation?

1.df.filter()

2.df.select()

3.df.show()

4.df.limit()


Question:
Which one of the following commands does NOT trigger an eager evaluation?

1.df.collect()

2.df.take()

3.df.show()

4.df.join() –> CORRECT
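
The distinction behind the two questions above, between calls that only describe work and calls that actually run it, can be sketched with a toy class in plain Python. This is an analogy, not the Spark API: `LazyFrame`, `where`, `pick`, and `run` are hypothetical names chosen for illustration.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated dataset (not the Spark API)."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def where(self, predicate):
        # Transformation: record the step and return a new plan.
        # No row is inspected here.
        return LazyFrame(self._rows, self._steps + (("where", predicate),))

    def pick(self, *columns):
        # Transformation: also only recorded, not executed.
        return LazyFrame(self._rows, self._steps + (("pick", columns),))

    def run(self):
        # Action: apply every recorded step to the data now.
        rows = self._rows
        for kind, arg in self._steps:
            if kind == "where":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


calls = []  # records each time the predicate is actually evaluated


def is_expensive(row):
    calls.append(row["id"])
    return row["price"] > 10


data = [{"id": 1, "price": 5}, {"id": 2, "price": 20}]
plan = LazyFrame(data).where(is_expensive).pick("id")

calls_before_action = len(calls)  # still zero: nothing has executed yet
result = plan.run()               # the predicate runs only at this point
```

Building `plan` does not evaluate the predicate at all; it is evaluated once per row only when `run()` is called.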


Question:
Which one of the following is a Databricks concept?

1.Workspace

2.Authentication and authorization

3.Data Management

4.All of the above


Question:
Which one of the following is a set of components that run on clusters of Azure Databricks?

1.Databricks File System

2.Databricks Runtime

3.CosmosDB

4.Azure Data Lake


Question:
Which one of the following is incorrect regarding the Workspace concept of Azure Databricks?

1.It manages ETL operations of data

2.It can store notebooks, libraries and dashboards

3.It is the root folder of Azure Databricks

4.none of the above


Question:
Which one of the following is not an operation that can be performed using Azure Databricks?

1. It is Apache Spark based analytics platform

2. It helps to extract, transform and load the data

3.Visualization of data is not possible with it

4.All of the above


Question:
____ is a distributed machine learning framework on top of Spark.

1.MLlib

2.Spark Streaming

3.GraphX

4.RDDs


Question:
______ is a component on top of Spark Core.

1.Spark Streaming

2.Spark SQL

3. RDDs

4.All of the Mentioned


Question:
_______ leverages Spark Core's fast scheduling capability to perform streaming analytics.

1.MLlib

2.Spark Streaming

3.GraphX

4.RDDs

