Modern Analytical Platform: Access Data Efficiently with Less Code – Part 2

The Business Challenge

Our business challenge was to create a global “self-service” Modern Analytical Platform for a Fortune 500 company that would enable their Data Science teams to work efficiently with large amounts of data in their Data Lake using Microsoft Power BI. The reporting was needed on a global scale to better understand the nature of, and need for, service calls on an installed base of equipment.

The data for this case study resides in an Azure Data Lake that we had planned, architected and built to host data from Salesforce, SAP, JDE and a wide variety of other ERP systems and custom data platforms. We implemented an Azure Data Lake and a curated reporting platform consisting of Azure Analysis Services and Power BI.

In addition, we needed to provide ad hoc querying of data for data validation and self-service reporting, because the Data Lake was difficult for non-technical people to query and the process was time-consuming.

Solution: Ad Hoc Query Layer for User-Friendly Access to Efficient & Accurate Reporting

The solution was to add an ad hoc query layer to the architecture, and the tool of choice was Azure Databricks. We exposed our reporting model tables in the Azure Data Lake and refreshed them daily. Users can either query those tables with a more familiar language (Spark SQL) and interface, or connect directly to the files from Microsoft Power BI and do their filtering and querying in that tool.
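To make the ad hoc query layer more concrete, here is a minimal sketch of the kind of Spark SQL query a user might run from a Databricks notebook. The table name `reporting.service_calls` and the columns `region` and `call_reason` are hypothetical placeholders, since the actual reporting model is not described here. The only Spark call used is `SparkSession.sql`, and the sketch assumes the `spark` session that Databricks notebooks provide.

```python
import re

# Hypothetical table name; the real reporting model tables are not named in this article.
DEFAULT_TABLE = "reporting.service_calls"

# Accepts plain or dotted identifiers such as "schema.table".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def build_call_summary_query(table=DEFAULT_TABLE, top_n=10):
    """Return a Spark SQL string counting service calls per region and reason.

    The column names (region, call_reason) are assumptions for illustration.
    """
    # Table names cannot be bound as query parameters, so validate before interpolating.
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    top_n = int(top_n)
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    return (
        "SELECT region, call_reason, COUNT(*) AS call_count\n"
        f"FROM {table}\n"
        "GROUP BY region, call_reason\n"
        "ORDER BY call_count DESC\n"
        f"LIMIT {top_n}"
    )


def run_call_summary(spark, table=DEFAULT_TABLE, top_n=10):
    """Run the summary query on a SparkSession and return the resulting DataFrame."""
    return spark.sql(build_call_summary_query(table, top_n))
```

In a notebook, something like `run_call_summary(spark, top_n=5).show()` would print the five most frequent region and reason combinations, assuming a table with those columns exists.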

Tools:

Results:

Global data is now readily available to multiple business entities for reporting in Microsoft Power BI through a curated dataset, and for ad hoc querying through a convenient, familiar query tool and language.

Data discrepancies and data redundancy are reduced because teams work from standardized datasets, instead of separate groups creating and storing the same data with potentially differing logic.

We lowered development validation and troubleshooting time by 50% to 75% based on the ability to query and access datasets more efficiently with less code.

Data Science teams were able to query the same Data Lake with their Python scripts and leverage Databricks Runtime for Machine Learning, which includes popular libraries such as TensorFlow, PyTorch, Keras, and XGBoost.

Would you like to hear more about these technologies? Please contact us for a free consultation.