Data Ingest Framework
Our client operates a large Hadoop cluster supporting a range of use cases across its business units, including analytic applications, operational reporting, and advanced data science initiatives. For business users to take full advantage of the platform, the client needed to streamline the ingest process and acquire more data from source systems. They had been relying on a traditional batch-oriented framework built with custom code, but wanted to move to a general ingest framework supporting both batch and streaming use cases.
Enable Data provided a team of data engineers to collaborate with the client on a new ingest framework. Based on the business and technical requirements, the team designed a Spark-based framework. With approval from the client’s architecture team, our work shifted to developing the framework, which incorporated common ingest templates and configuration files to reduce the complexity of adding new data sets. In addition, the team built a set of common ingest functions that can be executed against source data sets as they are processed and written to the client’s enterprise data lake.
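To illustrate the pattern, the sketch below shows what a configuration-driven Spark ingest template with pluggable common functions might look like. The paths, configuration keys, and function names are hypothetical examples chosen for illustration, not the client’s actual framework code.

```python
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

# Hypothetical per-dataset configuration; in a real framework this would
# live in a separate config file maintained alongside the ingest templates.
config = {
    "source_path": "/landing/sales/orders/",
    "source_format": "csv",
    "target_path": "/datalake/raw/sales/orders/",
    "ingest_functions": ["trim_strings", "add_ingest_metadata"],
}

def trim_strings(df: DataFrame) -> DataFrame:
    """Trim whitespace from all string columns."""
    for name, dtype in df.dtypes:
        if dtype == "string":
            df = df.withColumn(name, F.trim(F.col(name)))
    return df

def add_ingest_metadata(df: DataFrame) -> DataFrame:
    """Stamp each record with an ingest timestamp and its source file."""
    return (df.withColumn("ingest_ts", F.current_timestamp())
              .withColumn("source_file", F.input_file_name()))

# Registry of common ingest functions referenced by name in the config.
INGEST_FUNCTIONS = {
    "trim_strings": trim_strings,
    "add_ingest_metadata": add_ingest_metadata,
}

def run_ingest(spark: SparkSession, cfg: dict) -> None:
    # Read the source data set as described by the configuration.
    df = (spark.read.format(cfg["source_format"])
                    .option("header", "true")
                    .load(cfg["source_path"]))
    # Apply the configured common ingest functions in order.
    for fn_name in cfg["ingest_functions"]:
        df = INGEST_FUNCTIONS[fn_name](df)
    # Write the result to the enterprise data lake.
    df.write.mode("append").parquet(cfg["target_path"])

if __name__ == "__main__":
    spark = SparkSession.builder.appName("config-driven-ingest").getOrCreate()
    run_ingest(spark, config)
```

Keeping the dataset-specific details in configuration and the shared logic in a small registry of reusable functions is what allows new data sets to be onboarded without writing new ingest code.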
We delivered the framework to the client’s business and technical requirements and assisted with deployment on their Hadoop clusters. We are now helping the client migrate their existing ingest jobs to the new framework, which has resulted in a faster, more reliable ingest capability.