Cassandra Interview Questions and Answers for Freshers & Experienced

Do I need to use a caching layer (like memcached) with Cassandra?

Cassandra negates the need for extra software caching layers like memcached through its distributed architecture, fast write throughput capabilities, and internal memory caching structures.

What is a "snitch"?

The snitch is a configurable component of a Cassandra cluster used to define how the nodes are grouped together within the overall network topology (such as rack and data center groupings). Cassandra uses this information to route inter-node requests as efficiently as possible within the confines of the replica placement strategy. The snitch does not affect requests between the client application and Cassandra (it does not control which node a client connects to).

What are 'seed nodes' in Cassandra?

A seed node in Cassandra is a node that is contacted by other nodes when they first start up and join the cluster. A cluster can have multiple seed nodes. Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. When a node first starts, it contacts a seed node to bootstrap the gossip communication process. The seed node designation has no purpose other than bootstrapping new nodes joining the cluster. Seed nodes are not a single point of failure.

How is my data partitioned in Cassandra across nodes in a cluster?

Cassandra provides a number of options to partition your data across nodes in a cluster.
The Murmur3Partitioner is the default partitioning strategy for a Cassandra cluster (the RandomPartitioner was the default before Cassandra 1.2). Both use consistent hashing to determine which node will store a particular row. The end result is an even distribution of data across the cluster.
The ByteOrderedPartitioner ensures that row keys are stored in sorted order. It is not recommended for most use cases and can result in uneven distribution of data across a cluster.
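To make the consistent-hashing idea concrete, here is a minimal Python sketch of a token ring. It assumes an MD5-based token in the range 0 to 2^127 - 1, in the spirit of RandomPartitioner, and three hypothetical nodes with hand-picked tokens. The names md5_token and TokenRing are illustrative, not Cassandra APIs, and the token math is a simplification rather than Cassandra's exact implementation.

```python
from __future__ import annotations

import bisect
import hashlib

# Assumption: an MD5-based token in 0 .. 2**127 - 1, loosely modeled on
# RandomPartitioner; this is not Cassandra's exact token computation.
TOKEN_SPACE = 2**127


def md5_token(partition_key: str) -> int:
    """Hash a partition key to a token on the ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


class TokenRing:
    """Each node owns the token range (previous node's token, its own token]."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by token so ownership can be found with a binary search.
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def owner_of_token(self, token: int) -> str:
        # The first node whose token is >= the key's token owns it; tokens past
        # the highest node token wrap around to the first node on the ring.
        idx = bisect.bisect_left(self.tokens, token)
        return self.entries[idx % len(self.entries)][1]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(md5_token(partition_key))


ring = TokenRing({"node1": 0, "node2": 2**125, "node3": 2**126})
print(ring.owner("user:42"))  # one of node1, node2, node3
```

Adding a node only moves the keys in the token range it takes over, which is why consistent hashing is preferred over a plain hash modulo the node count.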

What is nodetool utility in Cassandra?

The nodetool utility is a command line interface for managing a cluster.

What do you understand by Snapshot in Cassandra?

Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory. You can take a snapshot of all keyspaces, a single keyspace, or a single table while the system is online.

Using a parallel ssh tool (such as pssh), you can snapshot an entire cluster. This provides an eventually consistent backup. Although no one node is guaranteed to be consistent with its replica nodes at the time a snapshot is taken, a restored snapshot resumes consistency using Cassandra's built-in consistency mechanisms.

How will you move data to or from other databases?

Cassandra offers several solutions for migrating from other databases:

* The COPY command, which mirrors what the PostgreSQL RDBMS uses for file import/export.
* The Cassandra bulk loader provides the ability to bulk load external data into a cluster.
If you need more sophistication applied to a data movement situation (more than just extract-load), then you can use any number of extract-transform-load (ETL) solutions that now support Cassandra.

What are the different consistency levels for write operations in Cassandra?

ALL
Strongest consistency. A write must be written to the memtable and commit log on all replica nodes in the cluster for that partition.
EACH_QUORUM
A write must be written to the memtable and commit log on a quorum of replica nodes in each datacenter.
LOCAL_QUORUM
A write must be written to the memtable and commit log on a quorum of replica nodes, but only in the same datacenter as the coordinator.
ONE
A write must be written to the memtable and commit log of at least one replica node.
TWO
A write must be written to the memtable and commit log of at least two replica nodes.
THREE
Same as above, but with at least three replica nodes.
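As an illustration, the sketch below computes how many replica acknowledgements each of the levels above requires, assuming a hypothetical keyspace replicated to two datacenters (dc1 and dc2) with three replicas in each. The function names are made up for this example; a quorum is taken as a strict majority, floor(RF / 2) + 1.

```python
from __future__ import annotations


def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_write_acks(level: str, rf_per_dc: dict[str, int], local_dc: str) -> int:
    """Number of replicas that must acknowledge a write at the given level.

    rf_per_dc maps each (hypothetical) datacenter name to its replication factor.
    """
    if level == "ALL":
        return sum(rf_per_dc.values())
    if level == "EACH_QUORUM":
        # A quorum is needed in every datacenter.
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if level == "LOCAL_QUORUM":
        # A quorum is needed only in the coordinator's datacenter.
        return quorum(rf_per_dc[local_dc])
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    raise ValueError(f"level not covered by this sketch: {level}")


rf = {"dc1": 3, "dc2": 3}
for level in ["ALL", "EACH_QUORUM", "LOCAL_QUORUM", "ONE", "TWO", "THREE"]:
    print(level, required_write_acks(level, rf, local_dc="dc1"))
# ALL 6, EACH_QUORUM 4, LOCAL_QUORUM 2, ONE 1, TWO 2, THREE 3
```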

What is the procedure of data storage in Cassandra?

Cassandra stores data as bytes. When the client specifies a validator (the data type of a column), Cassandra encodes these bytes according to that type. A comparator then orders the columns based on their encoding.

Composites are encoded in a particular byte pattern. Each component is stored as a two-byte length, followed by the byte-encoded component, followed by a termination (end-of-component) byte.
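A minimal Python sketch of the layout described above. It assumes a two-byte big-endian length and a single end-of-component byte set to 0 for an ordinary value, which is our reading of the legacy CompositeType format; encode_composite and decode_composite are hypothetical helpers, not Cassandra functions.

```python
from __future__ import annotations

import struct


def encode_composite(components: list[bytes]) -> bytes:
    """Encode each component as <2-byte length><bytes><end-of-component byte>."""
    out = bytearray()
    for component in components:
        out += struct.pack(">H", len(component))  # unsigned 16-bit, big-endian length
        out += component
        out.append(0)  # end-of-component byte (0 for an ordinary value)
    return bytes(out)


def decode_composite(data: bytes) -> list[bytes]:
    """Split an encoded composite back into its components."""
    components = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        components.append(data[pos:pos + length])
        pos += length + 1  # skip the component bytes and the end-of-component byte
    return components


encoded = encode_composite([b"ab", b"c"])
print(encoded)  # b'\x00\x02ab\x00\x00\x01c\x00'
```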

Does Cassandra use a master/slave architecture or something else?

Cassandra does not use a master/slave architecture, but instead uses a peer-to-peer implementation, which avoids the pitfalls, latency problems, single point of failure issues, and performance headaches associated with master/slave setups.

When should you use an RDBMS instead of Cassandra?

Cassandra is a NoSQL database and does not provide full ACID transactions or relational data features. If you have a strong requirement for ACID properties (for example, financial data), Cassandra is not a good fit. You can make it work, but you will end up writing a lot of application code to handle the ACID properties yourself and will lose a lot of time to market. Managing that kind of system with Cassandra would also be complex and tedious.

What is an SSTable, and how is it different from a relational table?

SSTable stands for Sorted String Table. It is an important data file in Cassandra to which memtables are regularly flushed. SSTables are stored on disk, and they exist for every Cassandra table. A key feature of SSTables, and what sets them apart from relational tables, is that they are immutable: once data is written to an SSTable, it is never changed. Cassandra also creates separate files for each SSTable, such as the bloom filter, partition summary, and partition index.

Explain the concept of the Cassandra data model.

The Cassandra data model is composed of four main components:

Cluster: Contains many nodes and keyspaces.

Keyspace: A namespace that groups multiple column families.

Column: Consists of a column name, a value, and a timestamp.

Column family: Consists of multiple columns, referenced by a row key.

What is the Consistency, Availability, and Partition tolerance (CAP) theorem?

The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide Consistency, Availability, and Partition tolerance.

Cassandra is generally classified as an AP system, meaning that availability and partition tolerance are generally considered more important than consistency in Cassandra. However, Cassandra can be tuned with the replication factor and consistency level to also meet the C in CAP.
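A common rule of thumb for tuning toward consistency is that reads are guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF), because the two replica sets must then overlap. The sketch below just checks that inequality; it ignores failures, clock skew, and in-flight writes, and the function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Strict majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True: 2 + 2 > 3
print(reads_see_latest_write(1, 1, rf))                    # False: 1 + 1 <= 3
print(reads_see_latest_write(1, rf, rf))                   # True: ONE read + ALL write
```

This is why QUORUM reads combined with QUORUM writes are a popular choice: they give overlap while still tolerating one replica being down at RF = 3.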

What are Hadoop, HBase, Hive, and Cassandra? Specify the similarities and differences among them.

Hadoop, HBase, Hive, and Cassandra are all Apache projects.

Apache Hadoop supports file storage and grid compute processing via MapReduce. Apache Hive is a SQL-like interface on top of Hadoop. Apache HBase uses column-family storage modeled on Google's Bigtable. Apache Cassandra also uses Bigtable-style column-family storage, combined with Amazon Dynamo-style topology and consistency.

What are the differences between a node, a cluster, and datacenter in Cassandra?

Node: A node is a single machine running Cassandra.

Cluster: A cluster is a collection of nodes that together store the data.

Datacenter: A datacenter is a useful component when serving customers in different geographical areas. Different nodes of a cluster can be grouped into different datacenters.

Explain the super column in Cassandra.

A super column in Cassandra is a special kind of column whose value is a collection of sub-columns rather than a single value. It acts as a map to all of its sub-columns.

Super columns are used to group related sub-columns together under a single name.

What are the management tools in Cassandra?

DataStax OpsCenter: A web-based management and monitoring solution for Cassandra clusters and DataStax Enterprise. It is free to download, and an additional edition of OpsCenter is also available.

SPM: SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, it also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other Big Data platforms.

What is memtable?

A memtable is an in-memory, write-back cache that holds content in key/column format. Data in a memtable is sorted by key, and each ColumnFamily has its own memtable, from which column data is retrieved by key. The memtable accumulates writes until it is full, and then it is flushed to disk as an SSTable.
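The following toy Python sketch shows the memtable lifecycle: writes accumulate in memory per partition key and, once the memtable is "full", are flushed as a sorted, immutable run that stands in for an SSTable. Measuring fullness by partition count is an assumption made for brevity (Cassandra uses memory thresholds), and the Memtable class is hypothetical.

```python
from __future__ import annotations


class Memtable:
    """Toy memtable: buffers writes in memory and flushes them as a sorted, immutable run."""

    def __init__(self, flush_threshold: int):
        # Assumption: "full" means this many partitions; Cassandra measures memory use.
        self.flush_threshold = flush_threshold
        self.partitions: dict[str, dict[str, object]] = {}
        self.sstables: list[tuple] = []  # flushed runs, never modified afterwards

    def write(self, key: str, column: str, value: object) -> None:
        self.partitions.setdefault(key, {})[column] = value
        if len(self.partitions) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Sort by partition key (and columns by name) before writing the run out.
        run = tuple(
            (key, tuple(sorted(columns.items())))
            for key, columns in sorted(self.partitions.items())
        )
        self.sstables.append(run)  # tuples are immutable, like SSTables
        self.partitions = {}


m = Memtable(flush_threshold=2)
m.write("user:b", "name", "bob")
m.write("user:a", "name", "alice")  # second partition triggers a flush
print(m.sstables)  # keys come out sorted: user:a before user:b
```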

Can we change the replication factor on a live cluster?

Yes, but it will require running repair to alter the replica count of the existing data.

What is replication factor in Cassandra?

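The replication factor is the number of copies of the data that exist in the cluster; for example, a replication factor of 3 means each row is stored on three nodes. A higher replication factor improves availability and fault tolerance, at the cost of more storage.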
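For a simple, single-datacenter layout, replica placement can be sketched like SimpleStrategy: the node owning the key's token gets the first replica, and the next RF - 1 nodes clockwise on the ring get the rest. The ring below is a hypothetical four-node cluster with made-up tokens, and replicas_for_token is an illustrative helper; rack and datacenter awareness (which the snitch provides) is ignored.

```python
from __future__ import annotations

import bisect


def replicas_for_token(token: int, ring: list[tuple[int, str]], replication_factor: int) -> list[str]:
    """SimpleStrategy-style placement: the owning node plus the next nodes clockwise.

    ring is a list of (token, node) pairs sorted by token (a hypothetical cluster layout).
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)  # first replica = range owner
    count = min(replication_factor, len(ring))  # cannot place more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "A"), (100, "B"), (200, "C"), (300, "D")]
print(replicas_for_token(150, ring, replication_factor=3))  # ['C', 'D', 'A']
```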

What ports does Cassandra use by default?

By default, Cassandra uses port 7000 for cluster management (inter-node communication), 9160 for Thrift clients, and 8080 for JMX. These are all TCP ports and can be edited in the configuration file: bin/cassandra.in.sh. (Newer releases use port 7199 for JMX and 9042 for CQL native-protocol clients, and configure the inter-node and client ports in cassandra.yaml.)

What are CRUD operations?

These are the basic operations used to work with data in the Cassandra database.

CRUD stands for

* Create operation
* Read operation
* Update operation and
* Delete/drop operation.

What are the best monitor tools for Cassandra?

Although Cassandra comes with built-in fault-tolerance features, it still needs to be monitored for effective results. Here are some tools that can be used to monitor Cassandra databases:

* SolarWinds Server & Application Monitor
* Instana
* Instaclustr
* AppDynamics
* Dynatrace
* ManageEngine Applications Manager

What is the relation between tunable consistency and Cassandra?

Tunable consistency lets you choose the level of consistency for each read and write, trading consistency against availability and latency. This flexibility is one of the main reasons Cassandra is preferred among NoSQL databases.

On what platforms does Cassandra run?

Since Cassandra is a Java application, it can run on any platform that provides a Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on Red Hat, CentOS, Debian, and Ubuntu Linux platforms.

Explain Tombstone in Cassandra.

A tombstone is a marker indicating that a column has been deleted. Columns marked this way are removed during compaction. Tombstones are important because Cassandra is eventually consistent: the deletion marker must reach every replica before the data can be purged, otherwise a replica that missed the delete could return the deleted data again.

What is Thrift?

Thrift is a legacy RPC protocol and API, combined with a code-generation tool, that was used before CQL became Cassandra's primary interface. The purpose of using Thrift in Cassandra was to provide access to the database from many programming languages.

What is the difference between Column and Super Column?

Both elements are tuples with a name and a value. However, a column's value is a string, while a super column's value is a map of columns with different data types.

Unlike Columns, Super Columns do not contain the third component of timestamp.

What is Super Column in Cassandra?

A Cassandra super column is a unique element consisting of similar collections of data. Super columns are key-value pairs whose values are columns. A super column is a sorted array of columns, and super columns follow this hierarchy: keyspace > column family > super column > column (often shown as a JSON-like data structure).
Similar to row keys, super column entries contain no independent values but are used to collect other columns. Note that the super column keys appearing in different rows do not necessarily match.

What is Python Stress test in Cassandra?

Earlier Cassandra releases shipped a Python utility called py_stress that can be used to run a stress test on a Cassandra cluster. The cassandra-stress tool is a Java-based stress-testing utility for basic benchmarking and load testing of a Cassandra cluster. It is an effective tool for populating a cluster and stress testing CQL tables and queries.

What are snapshots and how do you create one in Cassandra?

A snapshot represents the state of the data files at a particular point in time. You create one with the nodetool snapshot command, typically when taking a backup. It creates hard links to the SSTables in a snapshots folder, which can later be used to restore the node.
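Why hard links work so well here can be shown with a small Python sketch: because SSTable files are never modified after they are written, linking them into a snapshots directory freezes their current state without copying any data. The directory layout and file name below are assumptions loosely modeled on Cassandra's data directory, and snapshot_table_dir is a hypothetical helper; in practice you would run nodetool snapshot instead.

```python
import os
import tempfile


def snapshot_table_dir(table_dir: str, tag: str) -> str:
    """Hard-link every file in a table directory into snapshots/<tag>/.

    Because SSTables are immutable, a hard link is enough to preserve them:
    no data is copied, and later compactions cannot change the linked files.
    """
    snap_dir = os.path.join(table_dir, "snapshots", tag)
    os.makedirs(snap_dir, exist_ok=True)
    for name in os.listdir(table_dir):
        src = os.path.join(table_dir, name)
        if os.path.isfile(src):  # skips the snapshots/ directory itself
            os.link(src, os.path.join(snap_dir, name))
    return snap_dir


table_dir = tempfile.mkdtemp()
# Hypothetical SSTable data file for the demo.
with open(os.path.join(table_dir, "nb-1-big-Data.db"), "wb") as f:
    f.write(b"immutable sstable bytes")
snap = snapshot_table_dir(table_dir, "before-upgrade")
print(os.listdir(snap))  # ['nb-1-big-Data.db']
```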

What is JMX? And How is it useful in Cassandra?

JMX (Java Management Extensions) is a Java technology that supplies tools for managing and monitoring Java applications and services. Cassandra uses JMX to enable remote management of its servers.

What is Hinted Handoff?

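Hinted handoff is a mechanism that ensures availability, fault tolerance, and graceful degradation in Cassandra. When a replica node is down during a write, the coordinator stores a hint for the missed write and replays it to that node once it recovers. The node holding the hint knows when the unavailable node comes back online because of gossip.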
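A toy Python model of the mechanism, assuming a coordinator that already knows which replicas are up (a plain dictionary stands in for gossip state). The Coordinator class and its methods are hypothetical and leave out real-world details such as hint expiry.

```python
from __future__ import annotations


class Coordinator:
    """Toy coordinator that stores hints for replicas that are down during a write."""

    def __init__(self, live_nodes: dict[str, bool]):
        self.live = dict(live_nodes)  # node -> is it up? (stand-in for gossip state)
        self.applied: dict[str, list[str]] = {}  # writes each replica has received
        self.hints: dict[str, list[str]] = {}  # writes held back for down replicas

    def write(self, replicas: list[str], mutation: str) -> None:
        for node in replicas:
            if self.live.get(node, False):
                self.applied.setdefault(node, []).append(mutation)
            else:
                # Replica is down: keep a hint instead of dropping the write.
                self.hints.setdefault(node, []).append(mutation)

    def on_node_up(self, node: str) -> None:
        # Gossip reports the node is back: replay (hand off) its hints in order.
        self.live[node] = True
        for mutation in self.hints.pop(node, []):
            self.applied.setdefault(node, []).append(mutation)


c = Coordinator({"A": True, "B": False})
c.write(["A", "B"], "set x=1")
c.on_node_up("B")
print(c.applied)  # {'A': ['set x=1'], 'B': ['set x=1']}
```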

What is BASE?

Not every application needs the strong consistency that ACID provides, and this is where BASE comes into play. BASE stands for Basically Available, Soft state, Eventually consistent. NoSQL databases generally follow this model.

What do you mean by ACID?

ACID stands for

Atomicity: A transaction either commits completely or fails completely; it never takes partial effect.

Consistency: Its definition varies from one system or application to another, but its general meaning is that data must remain in a consistent, valid state.

Isolation: Concurrent transactions are isolated from each other, so one transaction does not see another's intermediate changes.

Durability: Once the database has committed data, that data is preserved. Even if the database fails afterward, the committed data will not be lost.

What is anti-entropy, and how is it associated with Merkle trees?

Anti-entropy is the replica synchronization mechanism that ensures data on different nodes is updated to the newest version.
Cassandra uses Merkle trees for anti-entropy repair. A Merkle tree is a hash tree in which the leaves are hashes of the values of individual keys and each parent is a hash of its children.
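A minimal Python sketch of the comparison step: each replica builds a Merkle tree over its rows and the roots are compared, so identical data costs only one hash comparison. SHA-256 is swapped in for illustration, the row data is hypothetical, and real repair builds trees over token ranges and walks down mismatched branches to find exactly which ranges to stream.

```python
from __future__ import annotations

import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows: dict[str, str]) -> bytes:
    """Build a Merkle tree bottom-up and return its root hash.

    Leaves hash each key's value (the key is mixed in so equal values under
    different keys still differ); keys are sorted so replicas build identical trees.
    """
    level = [_h(k.encode() + b"\x00" + v.encode()) for k, v in sorted(rows.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}
print(merkle_root(replica_a) == merkle_root(replica_b))  # False: the replicas need repair
```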

What is the role of the coordinator node in a read?

Reads are simple because clients can connect to any node in the cluster to perform them. If a client connects to a node that doesn't have the data it's trying to read, the node it's connected to acts as the coordinator node and reads the data from the nodes that do have it, identified by their token ranges.

What do you mean by Snitch? Name a few

A snitch determines which datacenters and racks nodes belong to. Snitches inform Cassandra about the network topology so that requests are routed efficiently and replicas can be distributed appropriately; specifically, the replication strategy places replicas based on the information provided by the snitch.

There are many types of snitches, to name a few:

* Dynamic snitching
* SimpleSnitch
* RackInferringSnitch
* Ec2Snitch
* PropertyFileSnitch
* GossipingPropertyFileSnitch
* Ec2MultiRegionSnitch
* GoogleCloudSnitch
* CloudstackSnitch

What are the different types of Partitioners in Cassandra? Explain.

>> Murmur3Partitioner is the default partitioner. It is an improved, faster alternative to RandomPartitioner and distributes data uniformly based on the MurmurHash function.

It hashes the partition key to a 64-bit token in the range $-2^{63}$ to $2^{63}-1$.

>> RandomPartitioner was the default partitioner prior to Cassandra 1.2. It can be used with vnodes and distributes data uniformly.

It uses MD5 hash values in the range $0$ to $2^{127}-1$.

>> ByteOrderedPartitioner is used for ordered partitioning. It orders rows lexically by key bytes. Using the ordered partitioner allows ordered scans by primary key, which means rows can be scanned as though moving a cursor through a traditional index.
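When a cluster does not use vnodes, each node is usually given an initial_token spaced evenly over the partitioner's range. The sketch below computes those evenly spaced tokens for the two hash ranges given above; the helper names are illustrative, and the formula is simply the range divided evenly by the node count.

```python
from __future__ import annotations


def murmur3_initial_tokens(node_count: int) -> list[int]:
    """Evenly spaced tokens over Murmur3Partitioner's range -2**63 .. 2**63 - 1."""
    return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]


def random_partitioner_initial_tokens(node_count: int) -> list[int]:
    """Evenly spaced tokens over RandomPartitioner's range 0 .. 2**127 - 1."""
    return [(2**127 // node_count) * i for i in range(node_count)]


print(murmur3_initial_tokens(4))
# [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]
```

ByteOrderedPartitioner has no such formula, because its tokens are the raw key bytes; balancing it depends on knowing the key distribution in advance, which is one reason it can lead to uneven data.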

How does Cassandra delete data?

SSTables are immutable, so Cassandra cannot remove a row from an SSTable in place. When a row needs to be deleted, Cassandra writes a special marker called a tombstone in place of the deleted value. When the data is read, a value covered by a tombstone is treated as deleted.
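A minimal Python sketch of how a tombstone shadows older data on read and is eventually purged by compaction. It assumes last-write-wins by timestamp, timestamps in seconds, and a gc_grace_seconds window (Cassandra uses microsecond write timestamps and also checks that purging is safe across SSTables); TOMBSTONE, read_column, and compact_column are hypothetical names.

```python
from __future__ import annotations

TOMBSTONE = object()  # hypothetical sentinel standing in for Cassandra's deletion marker


def read_column(versions: list[tuple[int, object]]) -> object | None:
    """Merge versions of one column found in the memtable and SSTables.

    The newest timestamp wins; if the winner is a tombstone, the column reads as deleted.
    """
    _, value = max(versions, key=lambda v: v[0])
    return None if value is TOMBSTONE else value


def compact_column(versions: list[tuple[int, object]], now: int, gc_grace_seconds: int) -> list[tuple[int, object]]:
    """Keep only the newest version; drop a tombstone once it is older than gc_grace_seconds."""
    timestamp, value = max(versions, key=lambda v: v[0])
    if value is TOMBSTONE and now - timestamp > gc_grace_seconds:
        return []  # the tombstone has had time to reach every replica, so purge it
    return [(timestamp, value)]


versions = [(1, "alice"), (2, TOMBSTONE)]
print(read_column(versions))  # None: the delete is newer than the value
print(compact_column(versions, now=100, gc_grace_seconds=10))  # []
```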

Explain how Cassandra writes changed data into the commit log.

* Cassandra appends changed data to the commit log
* The commit log acts as a crash-recovery log for data
* A write operation is never considered successful until the changed data has been appended to the commit log
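The ordering rule above can be sketched in Python with a hypothetical Node class that appends each mutation to a commit-log file and syncs it before touching the memtable, then replays the log after a simulated restart. The JSON-lines log format is an assumption for readability; Cassandra's commit log is a binary segment format, and syncing on every write corresponds to its batch sync mode rather than the default periodic one.

```python
import json
import os
import tempfile


class Node:
    """Toy write path: append to the commit log first, then update the memtable."""

    def __init__(self, commitlog_path: str):
        self.commitlog_path = commitlog_path
        self.memtable = {}

    def write(self, key: str, value: str) -> bool:
        # 1. Append the mutation to the commit log and force it to disk.
        with open(self.commitlog_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only after the append does the write reach the memtable...
        self.memtable[key] = value
        # 3. ...and only then is it acknowledged as successful.
        return True

    def replay(self) -> None:
        """Crash recovery: rebuild the memtable by replaying the commit log."""
        self.memtable = {}
        if os.path.exists(self.commitlog_path):
            with open(self.commitlog_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.memtable[entry["key"]] = entry["value"]


path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
Node(path).write("user:1", "alice")
restarted = Node(path)  # simulates a restart: the memtable starts empty
restarted.replay()
print(restarted.memtable)  # {'user:1': 'alice'}
```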

What is a Bloom filter used for in Cassandra?

A Bloom filter is a space-efficient data structure used to test whether an element is a member of a set. It can return false positives but never false negatives. In Cassandra, it is used to determine whether an SSTable might have data for a particular row, which saves I/O when performing a key lookup.
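A minimal Bloom filter in Python, to show why a negative answer lets Cassandra skip an SSTable entirely. It derives k bit positions from seeded SHA-256 hashes, a stand-in chosen for simplicity (Cassandra uses Murmur3-based hashing); the BloomFilter class is hypothetical and the sizes used below are arbitrary.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key: str):
        # Derive num_hashes bit positions from seeded hashes of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("row:1")
print(bf.might_contain("row:1"))  # True: added keys are never missed
print(bf.might_contain("row:2"))  # usually False; True would be a false positive
```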

Explain what is Memtable in Cassandra?

* Cassandra writes the data to an in-memory structure known as the memtable
* It is an in-memory cache with content stored as key/column pairs
* Memtable data is sorted by key
* There is a separate memtable for each ColumnFamily, and column data is retrieved from it by key

What are partitioners and tokens in Cassandra?

* Partitioner: A hash function, present on each node, that computes a token from the partition key of each row being added. It converts a variable-length input into a fixed-length value.

* Token: An integer value generated by the hashing algorithm that identifies a partition's location within the cluster.

How does the gossip protocol help in failure detection?

Acknowledgement messages help in failure detection. When a node is down or failing, it cannot send or receive messages, so acknowledgements from it are not received and other nodes can mark it as down.

What are the general operations of Cassandra CQL?

There are two types of operations carried out by Cassandra:

* Read operations
* Write operations

How does Cassandra store data?

In Cassandra's write path, data is first appended to the commit log for durability and stored temporarily in the memtable. The memtable is periodically flushed, and its data is written to disk as SSTables.

How is data distribution done?

Cassandra is a highly available database that stores data by dividing it evenly across its nodes. By default, it uses the Murmur3Partitioner to distribute data evenly across the nodes.

What is the difference between memtable and SSTable?

A memtable does not store data permanently. It temporarily accumulates written data in memory, but it does not itself persist that data to disk.

An SSTable, by contrast, is where data from the memtable is stored on disk. Data stored in an SSTable is persistent and immutable.

R4R Team