i) Data Ingestion – The first step in deploying a big data solution is to extract data from its sources. These could be an Enterprise Resource Planning system like SAP, a CRM like Salesforce or Siebel, an RDBMS like MySQL or Oracle, or log files, flat files, documents, images, and social media feeds. The extracted data then needs to be loaded into HDFS. Data can be ingested either through batch jobs that run on a schedule (for example, every 15 minutes or once a night) or through real-time streaming, with latencies ranging from about 100 ms to 120 seconds.
ii) Data Storage – The next step after ingesting the data is to store it either in HDFS or in a NoSQL database like HBase. HBase works well for random read/write access, whereas HDFS is optimized for sequential access.
iii) Data Processing – The final step is to process the data using a processing framework such as MapReduce, Spark, Pig, or Hive.
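
The three steps above can be sketched in plain Python to make the flow concrete. This is only an illustrative, in-memory sketch: `SequentialStore` and `KeyValueStore` are hypothetical stand-ins for HDFS-style sequential access and HBase-style random access, not real client APIs, and `SAMPLE_EVENTS`, `ingest_in_batches`, `count_events_by_type`, and `run_pipeline` are names made up for this example. A real deployment would use the actual ingestion, storage, and processing tools instead.

```python
from collections import Counter
from itertools import islice

# Hypothetical sample records standing in for data pulled from a source system.
SAMPLE_EVENTS = [
    {"id": "e1", "type": "click"},
    {"id": "e2", "type": "view"},
    {"id": "e3", "type": "click"},
]


def ingest_in_batches(records, batch_size):
    """Step i: read records from a source and yield them in fixed-size batches."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


class SequentialStore:
    """Step ii (stand-in for HDFS-style storage): append-only, read by full scan."""

    def __init__(self):
        self._blocks = []

    def append(self, batch):
        self._blocks.append(list(batch))

    def scan(self):
        for block in self._blocks:
            yield from block


class KeyValueStore:
    """Step ii (stand-in for HBase-style storage): random read/write by row key."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


def count_events_by_type(records):
    """Step iii: a MapReduce-style aggregation that counts records per event type."""
    counts = Counter()
    for record in records:
        counts[record["type"]] += 1
    return dict(counts)


def run_pipeline(source, batch_size=2):
    """Run ingestion, storage, and processing in order over an in-memory source."""
    log = SequentialStore()
    index = KeyValueStore()
    for batch in ingest_in_batches(source, batch_size):
        log.append(batch)               # sequential write path
        for record in batch:
            index.put(record["id"], record)  # keyed write path
    return log, index, count_events_by_type(log.scan())


if __name__ == "__main__":
    _, index, counts = run_pipeline(SAMPLE_EVENTS)
    print(counts)
    print(index.get("e2"))
```

The sketch keeps two stores side by side only to contrast the access patterns: the aggregation scans the whole log in order, while a single record lookup goes through the keyed store.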
Posted Date:- 2021-08-31 05:21:54