Databricks and Hive
Enable Data, a leading analytics and engineering consultancy, recently completed a successful Databricks proof of concept (POC) for a billion-dollar market research firm based in Chicago, Illinois. The objective was to demonstrate:
- Migration of complex Hive workloads from internal Hadoop systems to Databricks.
- Faster processing of customer data sets.
- Lower infrastructure hosting costs.
Our consultants worked with the customer to refactor and optimize selected job functions, which were then deployed to a three-node Azure Databricks cluster. In our benchmarks, run times for the tested code dropped from 5½ hours to about 20 minutes. The estimated cost savings and job performance improvements represent a substantial benefit to the business unit and are being used to support a recommendation to migrate the entire product to Databricks.
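To put the benchmark figures in perspective, the short calculation below converts the two run times quoted above into a speedup factor and a percentage reduction. It is only a back-of-the-envelope sketch: the variable names are illustrative, and because the 20-minute figure is approximate, the results should be read as rough estimates rather than measured values.

```python
# Figures quoted in the text: about 5.5 hours before, about 20 minutes after.
before_minutes = 5.5 * 60   # 330 minutes
after_minutes = 20.0        # approximate, per the benchmark description

# How many times faster the tested code ran after the migration.
speedup = before_minutes / after_minutes

# Share of the original run time that was eliminated.
reduction_pct = (1 - after_minutes / before_minutes) * 100

print(f"Speedup: {speedup:.1f}x")
print(f"Run-time reduction: {reduction_pct:.1f}%")
```

Under these figures, the tested code ran roughly 16 times faster, a reduction of about 94% in run time.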