>> SORT BY: Data is ordered within each of the N reducers, but the reducers can receive overlapping ranges of data, so the combined output is not globally sorted.
>> ORDER BY: This is similar to ORDER BY in SQL. It produces a total ordering of the data by passing all of it through a single reducer.
>> DISTRIBUTE BY: It is used to distribute the rows among the reducers. Rows that have the same values in the DISTRIBUTE BY columns go to the same reducer; no ordering within a reducer is guaranteed.
>> CLUSTER BY: It is a combination of DISTRIBUTE BY and SORT BY on the same columns. Each of the N reducers gets a non-overlapping set of key values and then sorts its rows by those columns.
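
The four clauses can be compared side by side with a small HiveQL sketch. The table `sales` and its columns `region` and `amount` are hypothetical names chosen only for illustration, and the reducer count of 4 is an arbitrary assumption.

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE)

-- Ask for several reducers so the differences are visible (4 is arbitrary).
SET mapreduce.job.reduces=4;

-- ORDER BY: a single reducer produces one globally sorted result.
SELECT region, amount FROM sales ORDER BY amount DESC;

-- SORT BY: each reducer sorts only the rows it received;
-- the combined output is not globally ordered.
SELECT region, amount FROM sales SORT BY amount DESC;

-- DISTRIBUTE BY: all rows with the same region go to the same reducer,
-- with no ordering inside a reducer.
SELECT region, amount FROM sales DISTRIBUTE BY region;

-- DISTRIBUTE BY + SORT BY: route rows by region, then sort inside each reducer.
-- The sort columns may differ from the distribution columns.
SELECT region, amount FROM sales DISTRIBUTE BY region SORT BY region, amount DESC;

-- CLUSTER BY region is shorthand for DISTRIBUTE BY region SORT BY region.
SELECT region, amount FROM sales CLUSTER BY region;
```

Since CLUSTER BY sorts on the same columns it distributes by, the explicit DISTRIBUTE BY ... SORT BY form is the one to reach for when the two sets of columns need to differ.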
Posted Date:- 2021-10-22 08:58:58