Hive Interview Questions and Answers for Freshers & Experienced

If you run a select * query in Hive, why doesn't it run MapReduce?

Because of the hive.fetch.task.conversion property. It lets Hive run simple queries, such as a SELECT with a filter or a LIMIT, as a fetch task that reads the table's files directly. Skipping MapReduce avoids the job-startup overhead and lowers latency.
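A minimal sketch of the property in a Hive session, assuming a hypothetical table named web_logs. The value 'none' is only accepted from Hive 0.14 on.

```sql
-- 'more' lets SELECT / filter / LIMIT queries run as a fetch task;
-- 'minimal' covers fewer cases (SELECT *, partition filters, LIMIT).
SET hive.fetch.task.conversion=more;

-- Expected to read the files directly, with no MapReduce job launched
SELECT * FROM web_logs LIMIT 10;

-- Disable the conversion: the same query is now expected to run as a job
SET hive.fetch.task.conversion=none;
SELECT * FROM web_logs LIMIT 10;
```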

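What is the usefulness of the DISTRIBUTE BY clause in Hive?

It controls how the map output is distributed among the reducers: all rows with the same DISTRIBUTE BY value are sent to the same reducer. It is useful, for example, when streaming data through custom scripts, where each script instance must see every row for a given key.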
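A short sketch, assuming hypothetical tables transactions(customer_id INT, amount DOUBLE, ts STRING) and transactions_by_customer with the same columns. DISTRIBUTE BY routes rows to reducers, and SORT BY orders them inside each reducer.

```sql
-- Every row for a given customer_id lands on the same reducer;
-- within each reducer, rows are ordered by customer_id and then ts.
INSERT OVERWRITE TABLE transactions_by_customer
SELECT customer_id, amount, ts
FROM transactions
DISTRIBUTE BY customer_id
SORT BY customer_id, ts;
```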

As part of optimizing queries in Hive, what should be the order of table sizes in a join query?

In a join query, the smallest table should be placed first and the largest table last. Hive buffers the earlier tables of a join in the reducers and streams the last table through them, so streaming the largest table keeps memory use low.
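A sketch of the ordering rule, using hypothetical tables dim_country (small), dim_store (medium), and fact_sales (largest). It assumes the classic reduce-side join; with automatic map-join conversion or cost-based optimization, newer Hive versions may choose the order themselves.

```sql
-- Largest table written last, so it is the one streamed through the reducers
SELECT s.sale_id, st.store_name, c.country_name
FROM dim_country c
JOIN dim_store st ON (c.country_id = st.country_id)
JOIN fact_sales s ON (st.store_id = s.store_id);

-- If the largest table cannot be written last, the STREAMTABLE hint
-- names the table to stream instead.
SELECT /*+ STREAMTABLE(s) */ s.sale_id, st.store_name
FROM fact_sales s
JOIN dim_store st ON (s.store_id = st.store_id);
```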

Can multiple users share the same metastore when Hive runs with an embedded metastore?

No. The embedded metastore cannot be used in shared mode. To share a metastore, use a standalone "real" database such as MySQL or PostgreSQL.

What is the precedence order of Hive configuration?

Hive uses the following precedence hierarchy for setting properties, from highest to lowest:

1. The SET command in Hive
2. The command-line --hiveconf option
3. hive-site.xml
4. hive-default.xml
5. hadoop-site.xml
6. hadoop-default.xml

Whenever I run a Hive query from a different directory, it creates a new metastore_db. Why?

When Hive runs in embedded mode, it creates a local metastore, after first checking whether a metastore already exists. The location is set by the property "javax.jdo.option.ConnectionURL" in hive-site.xml, whose default value is "jdbc:derby:;databaseName=metastore_db;create=true". Because databaseName is a relative path, it is resolved against the current working directory, so a new metastore_db is created in each directory that does not already have one. To change this behavior, set databaseName to an absolute path; the metastore will then always be used from that location.

How does data transfer happen from HDFS to Hive?

If the data is already present in HDFS, the user does not need LOAD DATA, which moves the files into /user/hive/warehouse/. Instead, the user only has to define the table using the EXTERNAL keyword, which creates the table definition in the Hive metastore and points it at the existing data:
CREATE EXTERNAL TABLE table_name (
  id INT,
  myfields STRING
)
LOCATION '/my/location/in/hdfs';
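For contrast, a minimal sketch of the LOAD DATA path for a managed table; the table name staged_events and the path /staging/events are hypothetical. With INPATH (an HDFS path), the files are moved rather than copied.

```sql
-- Managed table: Hive owns the data directory under the warehouse
CREATE TABLE staged_events (
  id       INT,
  myfields STRING
);

-- Moves the files under /staging/events into the table's warehouse directory
LOAD DATA INPATH '/staging/events' INTO TABLE staged_events;
```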

Is it possible to change the default location of Managed Tables in Hive, if so how?

Yes. Specify the desired storage path as the value of the LOCATION clause when creating the managed table; its data is then stored there instead of in the default warehouse directory.
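A minimal sketch, with a hypothetical table name and HDFS path. Without the EXTERNAL keyword the table stays managed, so dropping it still deletes the data at the custom location.

```sql
CREATE TABLE sales_managed (
  sale_id INT,
  amount  DOUBLE
)
LOCATION '/data/custom/sales_managed';

-- Table Type should still read MANAGED_TABLE, with the custom Location
DESCRIBE FORMATTED sales_managed;
```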

How can you add a new partition for the month of December to a month-partitioned table such as partitioned_transaction?

To add a new partition to the table partitioned_transaction, issue the command given below:

ALTER TABLE partitioned_transaction ADD PARTITION (month='Dec') LOCATION '/partitioned_transaction';

How can you configure remote metastore mode in Hive?

To configure a remote metastore, set the following property in hive-site.xml:
Property: hive.metastore.uris
Value: thrift://node1 (or IP address):9083
Description: IP address (or hostname) and port of the metastore host

Is it possible to run Unix shell commands in Hive?

Yes. In the Hive CLI, you can run a shell command by prefixing it with '!' and ending it with a semicolon.
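A few illustrative commands for the Hive CLI; Beeline treats '!' as a prefix for its own commands, so this applies to the classic CLI. The paths are only examples.

```sql
-- Local shell commands, prefixed with '!' and terminated by ';'
!pwd;
!ls /tmp;

-- The related dfs command runs an HDFS shell command from inside Hive
dfs -ls /user/hive/warehouse;
```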

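Can you override the Hadoop MapReduce configuration in Hive?

Yes. Hadoop MapReduce configuration properties can be overridden in Hive, for example with the SET command for the current session or with the --hiveconf option when starting Hive.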
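A minimal session-level sketch, assuming Hadoop 2 or later property names (older releases use names such as mapred.reduce.tasks). Values set this way sit at the top of the precedence order.

```sql
-- Override MapReduce settings for the current session only
SET mapreduce.job.reduces=8;
SET mapreduce.map.memory.mb=2048;

-- Print the current value of a property
SET mapreduce.job.reduces;
```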

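Explain how Hive deserializes and serializes data.

When reading, Hive first uses the table's InputFormat, whose RecordReader reads each record from the files. The SerDe's deserializer then turns each record into a row object, and an ObjectInspector exposes the fields of that row. When writing, the serializer uses the ObjectInspector to convert the row object into a record, which the OutputFormat's RecordWriter writes to the files. A custom SerDe follows the same path, using an ObjectInspector to deserialize the data into fields.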
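A sketch that names each piece of the read/write path explicitly, with a hypothetical table name. It assumes OpenCSVSerde, which ships with Hive 0.14 and later and exposes every column as STRING.

```sql
CREATE TABLE customers_csv (
  id   STRING,
  name STRING
)
-- SerDe: turns each text record into a row object and back
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS
  -- RecordReader side (reading)
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  -- RecordWriter side (writing)
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```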

What options are available for connecting applications to the Hive Server?

Applications can connect to the Hive Server in three ways:

* Thrift client: applications in languages with Thrift bindings (such as Python, C++, PHP, or Ruby) call Hive through its Thrift interface.
* JDBC driver: Java applications connect using the JDBC protocol.
* ODBC driver: applications that support the ODBC protocol connect through it.

Can a view have the same name as a Hive table?

No. The name of a view must be unique among all tables and views in the same database.
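A small illustration, assuming a hypothetical table named orders already exists in the current database.

```sql
-- Fails: the name "orders" is already used by a table in this database
CREATE VIEW orders AS SELECT order_id, amount FROM orders;

-- Works: the view gets a name no other table or view uses
CREATE VIEW orders_view AS SELECT order_id, amount FROM orders;
```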

What is ObjectInspector functionality?

ObjectInspector is used to analyze the structure of individual columns and the internal structure of row objects. It provides access to complex objects, which can be stored in multiple formats in Hive.

Why do we perform partitioning in Hive?

Partitioning provides granularity in a Hive table and therefore, reduces the query latency by scanning only relevant partitioned data instead of the whole data set.

For example, we can partition the transaction log of an e-commerce website by month (Jan, Feb, and so on). Any analytics for a particular month, say Jan, then only has to scan the Jan partition (subdirectory) instead of the whole table.
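The month-partitioned transaction log described above might be defined and queried as follows; the table and column names are hypothetical.

```sql
CREATE TABLE transaction_log (
  txn_id      BIGINT,
  customer_id INT,
  amount      DOUBLE
)
PARTITIONED BY (month STRING);

-- Filtering on the partition column lets Hive prune partitions:
-- only the month=Jan subdirectory is scanned.
SELECT SUM(amount)
FROM transaction_log
WHERE month = 'Jan';
```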

When should we use SORT BY instead of ORDER BY?

We should use SORT BY instead of ORDER BY when sorting huge datasets. SORT BY sorts the data using multiple reducers, each sorting only its own share of the rows, so the output is sorted within each reducer rather than globally. ORDER BY produces a total order by sorting all of the data in a single reducer, so on a large number of inputs it takes a long time to execute.
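A side-by-side sketch, assuming a hypothetical table transaction_log(txn_id, amount, ...). In strict mode (hive.mapred.mode=strict), ORDER BY also requires a LIMIT.

```sql
-- Total order across all rows, sorted by a single reducer
SELECT txn_id, amount
FROM transaction_log
ORDER BY amount DESC
LIMIT 100;

-- Each of the 4 reducers sorts its own rows; the output files are
-- each sorted, but there is no single global order across them.
SET mapreduce.job.reduces=4;
SELECT txn_id, amount
FROM transaction_log
SORT BY amount DESC;
```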

What is indexing and why do we need it?

A Hive index is a query optimization technique used to speed up access to a column or set of columns in a Hive table. With an index, Hive does not need to read every row in the table to find the selected data.
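A sketch of the index DDL, with hypothetical table and index names. Note that this only applies to Hive releases before 3.0, where indexing was removed.

```sql
-- Compact index on customer_id, built later on demand
CREATE INDEX idx_customer
ON TABLE transaction_log (customer_id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- Build (or refresh) the index data after the table is loaded
ALTER INDEX idx_customer ON transaction_log REBUILD;

SHOW INDEX ON transaction_log;
```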

How does Hive distribute rows into buckets?

Hive determines the bucket number for a row with the formula hash_function(bucketing_column) modulo num_of_buckets. The hash function depends on the column's data type; for an integer column it is simply the column value:
hash_function(int_type_column) = value of int_type_column
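A sketch with a hypothetical table bucketed on an INT column, plus the formula worked for two values. This arithmetic follows the classic hashing scheme described above; Hive 3 tables created with bucketing_version=2 use a Murmur3-based hash instead.

```sql
CREATE TABLE users_bucketed (
  user_id INT,
  name    STRING
)
CLUSTERED BY (user_id) INTO 4 BUCKETS;

-- hash(user_id) = user_id for INT, so with 4 buckets (numbered 0..3):
--   user_id = 10 -> 10 mod 4 = 2 -> bucket 2
--   user_id = 7  ->  7 mod 4 = 3 -> bucket 3
-- On Hive 1.x, run SET hive.enforce.bucketing=true; first
-- (from Hive 2.0 bucketing is always enforced on insert).
INSERT OVERWRITE TABLE users_bucketed
SELECT user_id, name FROM users_raw;
```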

Why do we need buckets?

There are two main reasons to bucket a partition:

* A map-side join requires all rows with the same join key to be in the same bucket; bucketing both tables on the join key guarantees this.

* It reduces query time and makes sampling more efficient.
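A sketch of both uses, assuming hypothetical tables orders_bucketed and users_bucketed, both bucketed on user_id, where one table's bucket count is a multiple of the other's.

```sql
-- Allow Hive to convert the join into a bucket map join
SET hive.auto.convert.join=true;
SET hive.optimize.bucketmapjoin=true;

SELECT o.order_id, u.name
FROM orders_bucketed o
JOIN users_bucketed u ON (o.user_id = u.user_id);

-- Sampling: read only bucket 1 of 4 instead of the whole table
SELECT *
FROM users_bucketed TABLESAMPLE(BUCKET 1 OUT OF 4 ON user_id) s;
```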

What is ObjectInspector functionality in Hive?

ObjectInspector is used in Hive to analyze the internal structure of columns, rows, and complex objects. It provides access to the internal fields of those objects.

What is a Hive variable, and what is it used for?

A Hive variable is created in the Hive environment and can be referenced by Hive scripts. It is used to pass values to Hive queries when the query starts executing.
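A minimal sketch using the hivevar namespace, with hypothetical variable and table names. Hive substitutes the value into the query text before running it.

```sql
-- Define a variable for this session
SET hivevar:target_month=Jan;

-- Reference it with ${...}; the quotes make the substituted value a string
SELECT COUNT(*)
FROM transaction_log
WHERE month = '${hivevar:target_month}';
```

The same variable can also be supplied at startup, for example with hive --hivevar target_month=Jan -f script.hql, where script.hql is a hypothetical script file.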

Is it possible to change the default location of a managed table?

Yes, it is possible to change the default location of a managed table by using the clause LOCATION '<hdfs_path>'.

What are Hive's default read and write classes?

Hive's default read and write classes are:

1. TextInputFormat/HiveIgnoreKeyTextOutputFormat
2. SequenceFileInputFormat/SequenceFileOutputFormat

What types of database does Hive support for the metastore?

For single-user metadata storage, Hive uses the Derby database; for multi-user or shared metadata, Hive uses an external database such as MySQL.

What is the difference between local and remote metastore?

Local Metastore:

In local metastore configuration, the metastore service runs in the same JVM in which the Hive service is running and connects to a database running in a separate JVM, either on the same machine or on a remote machine.

Remote Metastore:

In the remote metastore configuration, the metastore service runs in its own separate JVM and not in the Hive service JVM. Other processes communicate with the metastore server using Thrift network APIs. You can run one or more metastore servers in this case to provide higher availability.

Why does Hive not store metadata information in HDFS?

Hive stores metadata in the metastore using an RDBMS instead of HDFS. An RDBMS is chosen to achieve low latency, because HDFS read/write operations are time-consuming.

What is a metastore in Hive?

The metastore in Hive stores metadata in an RDBMS, using an open-source ORM (object-relational mapping) layer called DataNucleus, which converts the object representation into a relational schema and vice versa.

What is dynamic partitioning and when is it used?

In dynamic partitioning, the values of the partition columns are known only at runtime, that is, while the data is being loaded into the Hive table.

Usage:

* When loading data from an existing non-partitioned table into a partitioned one, to improve sampling and thereby decrease query latency.
* When the partition values are not all known beforehand, since finding them manually in a huge dataset is a tedious task.
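A minimal sketch of a dynamic-partition load, assuming a hypothetical non-partitioned table raw_transactions with a month column and a target table transaction_log PARTITIONED BY (month STRING).

```sql
SET hive.exec.dynamic.partition=true;
-- nonstrict allows every partition column to be dynamic
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column comes last in the SELECT list; Hive creates
-- one partition per distinct month value found in the data.
INSERT OVERWRITE TABLE transaction_log PARTITION (month)
SELECT txn_id, customer_id, amount, month
FROM raw_transactions;
```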

Why do we perform partitioning in Hive?

Partitioning provides granularity in a Hive table. By scanning only the relevant partitions instead of the whole dataset, it reduces query latency.

When should MapReduce mode be used?

MapReduce mode is used when:

* Queries run on large datasets and are executed in parallel
* Hadoop has multiple data nodes and the data is distributed across them
* Better performance is needed when processing large datasets
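A sketch of the session settings involved. Both properties exist in Hive, though the MapReduce engine is deprecated from Hive 2 in favor of Tez or Spark, so treat this as applying to clusters that still run it.

```sql
-- Run queries as MapReduce jobs on the cluster
SET hive.execution.engine=mr;

-- Do not let Hive switch small queries to local mode
SET hive.exec.mode.local.auto=false;
```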

What is a partition in Hive?

Hive organizes tables into partitions to group similar data together based on a column or partition key.

Each table can have one or more partition keys that identify a particular partition. Physically, a Hive partition is a subdirectory in the table directory.

When should we use SORT BY instead of ORDER BY?

We should use SORT BY rather than ORDER BY, especially when sorting huge datasets. The reason is that SORT BY sorts the data using multiple reducers, while ORDER BY sorts all of the data together using a single reducer.

Hence, ORDER BY takes a long time to execute on a large number of inputs.

What is the difference between the external table and managed table?

Managed table

The metadata information along with the table data is deleted from the Hive warehouse directory if one drops a managed table.

External table

If one drops an external table, Hive deletes only the table's metadata and leaves the table data in HDFS untouched.
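A side-by-side sketch with hypothetical table names and HDFS path.

```sql
CREATE TABLE managed_sales (sale_id INT, amount DOUBLE);

CREATE EXTERNAL TABLE external_sales (sale_id INT, amount DOUBLE)
LOCATION '/data/external/sales';

-- "Table Type" reports MANAGED_TABLE or EXTERNAL_TABLE
DESCRIBE FORMATTED external_sales;

DROP TABLE managed_sales;   -- deletes the metadata and the table data
DROP TABLE external_sales;  -- deletes the metadata only; files stay in HDFS
```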

What is the difference between local and remote metastores?

A local metastore runs in the same Java Virtual Machine (JVM) as the Hive service, whereas a remote metastore runs in a separate, distinct JVM.

Can we change the data type of a column in a Hive table?

Yes, using the REPLACE COLUMNS option:

ALTER TABLE table_name REPLACE COLUMNS ……
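A sketch using a hypothetical table employee(id INT, name STRING, salary INT). CHANGE COLUMN, a related ALTER TABLE option, alters a single column; REPLACE COLUMNS redefines the whole column list.

```sql
-- Change one column's type, keeping its name
ALTER TABLE employee CHANGE COLUMN salary salary BIGINT;

-- Redefine the full column list in one statement
ALTER TABLE employee REPLACE COLUMNS (id INT, name STRING, salary BIGINT);
```

Both statements change only the table metadata; existing data files are not rewritten, so the new type must be able to read the stored values.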

What is the default database provided by Apache Hive for metastore?

By default, Hive uses an embedded Derby database instance, backed by the local disk, for the metastore. This is called the embedded metastore configuration.

What is the difference between local and remote metastore?

Local Metastore:

The metastore service runs in the same JVM as the Hive service and connects to a database running in a separate JVM, either on the same machine or on a remote machine.

Remote Metastore:

In this configuration, the metastore service runs on its own separate JVM and not in the Hive service JVM.

Can the default location of a managed table be changed in Hive?

Yes, the default managed table location can be changed by using the LOCATION '<hdfs_path>' clause.

Where is Hive table data stored?

By default, Hive table data is stored in the HDFS directory /user/hive/warehouse. This location can be changed.

Why does Hive not store metadata information in HDFS?

Hive stores metadata in the metastore using an RDBMS instead of HDFS. An RDBMS gives low latency, whereas HDFS read/write operations are time-consuming.

Can a table be renamed in Hive?

Yes: ALTER TABLE table_name RENAME TO new_name;

Is Hive suitable to be used for OLTP systems? Why?

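No. Hive is designed for batch analytics over large datasets, and its queries have high latency. Traditionally it does not provide row-level insert and update; ACID tables (Hive 0.14 and later) add row-level INSERT, UPDATE, and DELETE, but they are still not designed for OLTP workloads.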

What are the different types of tables available in Hive?

There are two types: managed tables and external tables. For a managed table, both the data and the schema are under Hive's control; for an external table, only the schema is under Hive's control.

Can a table name be changed in Hive?

Yes, you can change a table name in Hive by using: ALTER TABLE table_name RENAME TO new_name;

What is a metastore in Hive?

The metastore stores Hive's metadata. It does so using an RDBMS and an open-source ORM (object-relational mapping) layer called DataNucleus, which converts the object representation into the relational schema and vice versa.

Where is the data of a Hive table stored?

By default, Hive table data is stored in the HDFS directory /user/hive/warehouse. You can change this by specifying the desired directory in the hive.metastore.warehouse.dir configuration parameter in hive-site.xml.

What kinds of applications are supported by Apache Hive?

By exposing its Thrift server, Hive supports client applications written in Java, PHP, Python, C++, or Ruby.

What is Apache Hive?

Hive is a data warehousing tool. It provides SQL-like queries for analysis, along with a layer of abstraction: although Hive is not a database, it gives you a logical abstraction over databases and tables.

R4R Team
R4R provides Hive interview questions and answers for freshers and experienced candidates. The questions on the R4R.in website are prepared by an expert team.