HDFS is better suited to storing a large amount of data in a single file than a small amount of data spread across many files. The NameNode keeps the file system metadata in RAM, so the amount of available memory limits the number of files an HDFS file system can hold. In other words, too many files generate too much metadata, and holding all of that metadata in RAM becomes a challenge. As a rule of thumb, the metadata for each file, block, or directory takes about 150 bytes.
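To make the rule of thumb concrete, here is a rough back-of-the-envelope sketch that compares the NameNode memory needed for the same 1 GiB of data stored as one large file versus many small files. The 150-byte figure comes from the answer above. The 128 MiB block size is an assumption (a common HDFS default, not stated in the answer), and the function names are hypothetical helpers written for this illustration, not part of any Hadoop API.

```python
BYTES_PER_OBJECT = 150                   # rule-of-thumb cost per file, block, or directory
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size (128 MiB)


def namenode_objects(file_sizes, block_size=DEFAULT_BLOCK_SIZE, num_dirs=0):
    """Count namespace objects: one per file, one per block, one per directory."""
    # Integer ceiling division: a partially filled last block still counts as a block.
    blocks = sum((size + block_size - 1) // block_size for size in file_sizes)
    return len(file_sizes) + blocks + num_dirs


def namenode_memory_bytes(file_sizes, block_size=DEFAULT_BLOCK_SIZE, num_dirs=0):
    """Estimate NameNode heap used by metadata, using the 150-byte rule of thumb."""
    return namenode_objects(file_sizes, block_size, num_dirs) * BYTES_PER_OBJECT


GIB = 1024 ** 3
one_large_file = [GIB]                     # 1 file, 8 blocks
many_small_files = [GIB // 8192] * 8192    # 8192 files of 128 KiB, 1 block each

large = namenode_memory_bytes(one_large_file)
small = namenode_memory_bytes(many_small_files)

print(f"one large file:   {large} bytes of metadata")
print(f"many small files: {small} bytes of metadata")
```

Under these assumptions the single file costs 9 objects (1,350 bytes), while the small files cost 16,384 objects (2,457,600 bytes) for the same amount of data. This is only an estimate; actual NameNode heap usage depends on the Hadoop version and configuration.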
Posted Date:- 2021-08-31 06:07:22
How do you use the Apache ZooKeeper command-line interface?
Explain the role of ZooKeeper in Kafka.
Does Apache Flume provide support for third party plug-ins?
How can a multi-hop agent be set up in Flume?
What are the differences between Pig and SQL?
How do you configure an “Oozie” job in Hadoop?
What are the components of Region Server?
Can the default “Hive Metastore” be used by multiple users (processes) at the same time?
What are the different data types in Pig Latin?
What do you know about “SequenceFileInputFormat”?
Does Flume provide 100% reliability to the data flow?
What are the limitations of importing RDBMS tables into HCatalog directly?
Can free form SQL queries be used with Sqoop import command? If yes, then how can they be used?
How are large objects handled in Sqoop?
What is the standard location or path for Hadoop Sqoop scripts?
Is it possible to do an incremental import using Sqoop?
Explain “Distributed Cache” in a “MapReduce Framework”.
What is the purpose of “RecordReader” in Hadoop?
State some key components of ZooKeeper.
What do you know about Sqoop metastore?
How to import BLOB and CLOB-like big objects in Sqoop?
Mention the consequences of Distributed Applications.
What is Apache Flume in Hadoop?
How can I restart “NameNode” or all the daemons in Hadoop?
What is “speculative execution” in Hadoop?
How do you define “Rack Awareness” in Hadoop?
What does ‘jps’ command do?
Can the NameNode and DataNode run on commodity hardware?
What applications are supported by Apache Hive?
Explain the process of row deletion in HBase.
Explain the different catalog tables in HBase.
Explain the three types of tombstone markers for deletion.
What are the basic parameters of a mapper?
Explain the distributed Cache in MapReduce framework.
Explain the difference between RDBMS data model and HBase data model.
What are the different operational commands in HBase at record level and table level?
Explain the actions followed by a Jobtracker in Hadoop.
How does NameNode tackle DataNode failures?
What happens when two clients try to access the same file in the HDFS?