Rework of a BI solution for a Business Cloud provider
This case study describes how Forrards helped a leading business cloud vendor, a provider of hosted email, file sharing, and VoIP solutions, improve its business intelligence and bring order to an abandoned ETL system that was poorly documented, unsupported, and jeopardizing the overall performance of the business.
Client
Our client is a leading provider of business cloud services for small and mid-size businesses. The company provides hosted email, VoIP, teleconferencing, and accompanying cloud solutions.
Objective
The client asked us to put the existing ETL and BI processes into order for two main reasons:
- Incorrect business data. The business periodically and unpredictably received wrong figures with no clear understanding of where those numbers came from. Both the business intelligence layer and the ETL data flow were too complex for the in-house team to find the source of the problem.
- No one knew how it worked. The murky development history of the existing solution, combined with the lack of documentation and the irrational internal structure of the data warehouse, made it nearly impossible for the customer's team to understand the system, let alone fix it.
As a result, the system operated with errors that prevented the company from performing deep analysis of key business data: marketing figures, sales, average spend, client segmentation, and so on.
Challenge
The analysis of the existing solution revealed the following main challenges.
The principal difficulty was that nobody on the team knew how the system worked. For a business, this is untenable. Without reliable business intelligence, the client struggled to direct company strategy, reveal the sources of gains and losses, or analyze customer behavior. Wandering around blindly is hardly the best way to run a business.
Supporting and fixing the data warehouse and ETL system was also cumbersome, which in turn meant that errors tended to multiply and overlap, bringing even more chaos into BI.
The data structure was also labyrinthine. For example, a single table often received data from multiple sources without proper processing or aggregation, opening the door to overlapping data. Multilevel calls of nested procedures and functions made debugging and tracking down the source of wrong numbers nearly impossible. All of this led to extremely poor performance, and the entire ETL was prone to errors.
Finally, the business mainly relied on PowerPivot calculations over data exported from SSRS reports into Excel worksheets. Needless to say, such an intermediate step complicated the already non-intuitive analysis process even further.
Initially, the solution had been built with the intention of creating a universal BI system that could be rolled out for any client, which, as the facts above show, was a complete utopia.
Solution
Preliminary analysis
We started with a careful analysis of the client's existing infrastructure. It turned out the server systems were fully based on Microsoft technologies: an MSSQL database was used for the warehouse, SSIS packages for ETL, and SSRS for reports.
The analysis also revealed the technical problems mentioned in the previous section.
Discussion with the client
We arranged a series of meetings with the client's business stakeholders, discussed the business goals the client wanted to pursue, and reviewed several cases to prepare a vision of a solution that would best match the client's needs and business conditions.
Deconstruction of the old ETL
Before we could offer a new ETL solution to the client, we needed to reverse-engineer the old ETL to find out how it functioned and uncover the flaws and problems of the existing solution.
The task itself wasn't easy. No one in the company had any idea how the system worked, remember? So we used our custom tool to search the SSRS code, discover SQL dependencies, and emulate nested package calls to untangle the internal logic of the client's ETL; the sketch below gives a sense of the kind of dependency discovery involved.
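As a minimal illustration of such dependency discovery (this is not our custom tool, just a sketch using SQL Server's built-in catalog views), the following T-SQL lists which objects each procedure, view, or function in a database references, which is a natural starting point for mapping an undocumented ETL:

-- Illustrative only: list object-to-object references in the warehouse database.
SELECT
    referencing_object = OBJECT_SCHEMA_NAME(d.referencing_id) + '.' + OBJECT_NAME(d.referencing_id),
    referenced_object  = COALESCE(d.referenced_schema_name + '.', '') + d.referenced_entity_name,
    referencing_type   = o.type_desc
FROM sys.sql_expression_dependencies AS d
JOIN sys.objects AS o
    ON o.object_id = d.referencing_id
WHERE o.type IN ('P', 'V', 'FN', 'TF', 'IF')   -- procedures, views, functions
ORDER BY referencing_object, referenced_object;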
The findings were recorded in the client's wiki for future reference.
Offering new tools for data analysis
Since exporting data seriously hindered the business intelligence processes in the client's company, we offered a simpler and more efficient alternative. Specifically, we replaced the majority of SSRS reports with a combination of Excel spreadsheets and Power BI, both connected to the OLAP cube, and kept only the few SSRS reports intended to produce printed documents.
SSRS is a great reporting instrument but the wrong choice for data analysis. Not only is it significantly slower than specialized OLAP and Power BI tools, it also lacks the capabilities that make business analysis truly powerful, such as slicing the data along arbitrary dimensions and exploring dependencies.
Also, removing the extra intermediate step allowed the client to work with the data directly, which means receiving reliable, comprehensive marketing and sales data faster and without errors, for quicker decision making and a better understanding of ongoing processes. Direct data analysis was also a step toward self-service BI.
Developing the new ETL
We mapped the structure of business intelligence in the company and matched it against the goals of the client's business. On that basis, we developed a new warehouse schema and an ETL schema; a simplified sketch of what such a schema might look like is shown below.
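For illustration only, here is a hypothetical star-schema fragment in T-SQL; the table and column names are invented and not taken from the client's actual warehouse. The point is that each fact row references clean, conformed dimensions, so a single table no longer absorbs unprocessed data from multiple sources:

-- Hypothetical dimension: one row per customer, loaded from a single, de-duplicated source.
CREATE TABLE dw.DimCustomer (
    CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,
    CustomerId   NVARCHAR(50)  NOT NULL,  -- business key from the source system
    SegmentName  NVARCHAR(100) NULL,
    SourceSystem NVARCHAR(50)  NOT NULL   -- which upstream system the row came from
);

-- Hypothetical fact: one row per subscription charge, referencing the dimension.
CREATE TABLE dw.FactSubscriptionCharge (
    ChargeDateKey INT            NOT NULL,  -- e.g. 20240131, linked to a date dimension
    CustomerKey   INT            NOT NULL REFERENCES dw.DimCustomer (CustomerKey),
    ProductCode   NVARCHAR(50)   NOT NULL,
    Amount        DECIMAL(18, 2) NOT NULL
);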
For higher reliability of the new ETL system, we deployed independent environments for the different stages of the software development process: development, QA, UAT, and production. Independent environments allowed our client to avoid potential problems, for instance, a single error in the testing environment ruining production.
For the new solution, we conducted a thorough comparison of performance and correctness of operation to ensure that data from the various sources is processed without error under any conditions; a sketch of the kind of reconciliation check we mean follows below.
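As a simple, hypothetical illustration of such a correctness check (the database and table names are invented), the following T-SQL compares the output of the old and the new warehouse in both directions; an empty result from both queries means the monthly figures match:

-- Rows present in the old warehouse but not in the new one, and vice versa.
SELECT CustomerId, BillingMonth, Revenue
FROM OldWarehouse.dbo.MonthlyRevenue
EXCEPT
SELECT CustomerId, BillingMonth, Revenue
FROM NewWarehouse.dw.MonthlyRevenue;

SELECT CustomerId, BillingMonth, Revenue
FROM NewWarehouse.dw.MonthlyRevenue
EXCEPT
SELECT CustomerId, BillingMonth, Revenue
FROM OldWarehouse.dbo.MonthlyRevenue;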
Moving to the new system
Finally, we transferred all business processes to the new data warehouse and ETL. This required additional steps to automate some business-specific corrections while transferring the information.
After that, we tested the efficiency of the new ETL and the accuracy of the produced numbers. It turned out that the new system delivered the same numbers as the old one, except for the wrong figures the old system sometimes produced. Simply put, the new ETL worked better.
The two systems, old and new, ran in parallel for some time to double-check that all numbers remained correct or differed only where the client expected them to. Then the old system was retired. We also moved some SSRS reports to the new data warehouse.
Some final steps
Implementing Continuous Integration practices provided, among other things, a foundation for automated testing. This allowed for smooth functioning and low-effort support of the new BI system. Specifically, our client was now able to deliver new code several times a day and avoid issues that could result in unavailable reports, broken data, or incorrect output figures.
With CI/CD, we also made automatic updates possible, so the data warehouse stays stable while leaving room for updates or changes: extending the ETL to process other data sources, gather new data in the warehouse, and so on. A sketch of the kind of automated check that can run in such a pipeline is shown below.
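As a hypothetical example of an automated data-quality check in such a pipeline (the names reuse the invented schema from the earlier sketch and are not the client's actual tests), the following T-SQL fails a deployment if yesterday's load produced no rows or left duplicate fact records:

-- Fail the pipeline if yesterday's load produced no fact rows.
IF NOT EXISTS (
    SELECT 1
    FROM dw.FactSubscriptionCharge
    WHERE ChargeDateKey = CONVERT(INT, FORMAT(DATEADD(DAY, -1, GETDATE()), 'yyyyMMdd'))
)
    THROW 50001, 'ETL check failed: no fact rows loaded for yesterday.', 1;

-- Fail the pipeline if overlapping loads produced duplicate fact rows.
IF EXISTS (
    SELECT CustomerKey, ChargeDateKey, ProductCode
    FROM dw.FactSubscriptionCharge
    GROUP BY CustomerKey, ChargeDateKey, ProductCode
    HAVING COUNT(*) > 1
)
    THROW 50002, 'ETL check failed: duplicate fact rows detected.', 1;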
Finally, we delivered full documentation for the solution we developed so that our customer can easily support it going forward. The situation we started with, where nobody knows how the system functions, is no longer possible.
Results
Our client emphasized the following results after the new ETL was deployed.
- The business intelligence now provides correct numbers. Data discrepancies caused by multiple sources and overlapping tables are gone.
- Better performance of the ETL process. Previously, loading data could easily take one whole night or even longer, so employees often had to wait for it to finish the next morning. Now, it takes minutes and is done several times a day.
- Less effort on support thanks to better code and thorough documentation.
- Potential data loss due to overlapping data flows is now prevented.
Overall, the client achieved excellent results: time-saving and efficient BI, reliable and error-proof ETL and, of course, precise reporting and data analysis to support future business growth. Importantly, our work on the ETL and BI solution also prompted the client to review its business processes and revealed some flaws in them, which could then be analyzed, discussed, and streamlined into clearer processes, improving business operations in the long run.