If you have to join two large tables, you can use a reduce-side join. But if both tables are bucketed and sorted on the join column, and they have the same number of buckets or the bucket count of one is a multiple of the other, Hive can use a sort-merge-bucket map join (SMBMJ), in which the whole join takes place in the map phase by merging the corresponding buckets.
Buckets are basically files: each bucket of a table is stored as a separate file inside the table's (or partition's) HDFS directory.
You need to set the following properties for sort-merge-bucket map joins:
set hive.enforce.sortmergebucketmapjoin = false;
set hive.auto.convert.sortmerge.join = true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
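To make the table requirements concrete, here is a minimal HiveQL sketch of two tables that meet them. The table and column names (orders_bucketed, customers_bucketed, customer_id) are hypothetical and only for illustration; the bucket counts 8 and 4 are chosen so that one is a multiple of the other.

```sql
-- Hypothetical tables: both are bucketed AND sorted on the join key.
CREATE TABLE orders_bucketed (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
CLUSTERED BY (customer_id) SORTED BY (customer_id ASC) INTO 8 BUCKETS;

CREATE TABLE customers_bucketed (
  customer_id BIGINT,
  name        STRING
)
CLUSTERED BY (customer_id) SORTED BY (customer_id ASC) INTO 4 BUCKETS;

-- Join on the bucketing/sorting column. With the properties above set,
-- Hive may plan this as a sort-merge-bucket map join.
EXPLAIN
SELECT o.order_id, c.name, o.amount
FROM orders_bucketed o
JOIN customers_bucketed c
  ON o.customer_id = c.customer_id;
```

If the conversion applies, the EXPLAIN plan should show a sorted-merge bucket map join operator instead of a reduce-side join. The tables must also be populated through Hive (for example with INSERT ... SELECT) so that the files on HDFS really are bucketed and sorted as declared.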
Posted Date:- 2021-10-22 02:03:04