Do you ever find yourself trapped in the tedious world of manual reporting in Excel? We've been there too, and we know how time-consuming and soul-draining it can be. But fear not, we embarked on an exciting journey to liberate ourselves from the shackles of manual reporting and discovered a realm of real-time online reporting that has revolutionized our approach!
In this blog post, we'll take you on a thrilling ride through our transformation from spreadsheet slaves to data-driven superheroes. We'll spill the beans on the tech stack that catapulted us into a world of efficiency and productivity. Get ready to discover the cutting-edge solutions we handpicked, and more importantly, the reasons behind our choices and the incredible value they brought to the table.
But here's the best part - you don't need to be a tech whiz or rely on engineers and developers to implement these changes. We'll reveal the secrets of seamless integration, making this a hassle-free journey for you and your team.
So buckle up and get ready to revolutionize your reporting process. This blog post is your gateway to a more data-driven organization, where informed decisions flow effortlessly. Say goodbye to the mundane and embrace the power of real-time online reporting - your key to unleashing the full potential of your data!
#1 Understanding Smeetz's Initial Situation and the Motivation Behind Project Implementation
At the outset, Smeetz faced a common challenge shared by many startups and established companies alike. The company utilized various software solutions to manage customer relationships, billing, and other business-related data. However, relying on these individual data sources independently did not provide a comprehensive view of the business. To gain accurate insights into the overall situation, it was necessary to consolidate data from different sources.
While each piece of software had its own internal reporting engine, the difficulty lay in combining data across platforms seamlessly. To address this issue, many startups resorted to manual aggregation using spreadsheets like Excel or Google Sheets. Nonetheless, this approach had its drawbacks. It required a significant amount of laborious data entry, and some granularity was lost, as not all relevant data could be integrated into the spreadsheets.
For example, revenue data from billing subscriptions might be handled by one software, while revenue from core product transactions could be managed by another. To analyze the best acquisition channel for realized revenue, it became essential to connect the realized revenue data with CRM information to identify the customers' sources accurately.
As Smeetz underwent rapid growth, expanding from around 15-20 employees to 50 and establishing its market presence in more than four countries, the demand for timely and complex reporting data intensified. Making well-informed business decisions became increasingly crucial to sustain growth and stay competitive.
Moreover, various stakeholders, including team members and investors, required access to real-time reporting data to effectively steer marketing campaigns, sales strategies, and other key initiatives. The need for quicker and more accessible insights prompted a comprehensive overhaul of the reporting approach.
In response to these challenges, Smeetz embarked on a large-scale project with four primary objectives:
- Centralize all data sources to eliminate fragmentation and streamline the reporting process.
- Integrate and combine data from disparate platforms to gain a holistic and accurate overview of the business.
- Calculate essential key metrics that are globally relevant and easily accessible to stakeholders.
- Establish an online platform for effortless and standardized consumption of reporting data.
By undertaking this ambitious project, Smeetz aimed to empower its teams with actionable insights and enable faster, data-driven decision-making to drive the business forward successfully.
#2 Implementing a Data-Driven Project: Overcoming Challenges and Achievements
#2.1 Consolidating Business Data: Streamlining Data Extraction and Centralization
In the pursuit of efficiently managing our business data, we embarked on a crucial first step - extracting data from various software applications and centralizing it into a unified location. To accomplish this goal, we engaged in extensive research, exploring different data extractors and data warehouse options to find the best fit for our needs.
Data Extractors Exploration
Our search for suitable data extractors proved challenging, as most options were only compatible with some of the software we utilized, preventing us from achieving our vision of a seamless data stack. However, our quest ended with the discovery of Fivetran - an automated data movement platform that performs data extraction, loading, and transformation from diverse sources into cloud data warehouses.
Fivetran offered numerous advantages, making it an ideal choice for our data integration needs. Firstly, it excelled in user-friendliness, requiring no coding for its pre-made connectors, which covered a wide array of software used in the startup ecosystem, such as Google Sheets, MySQL, HubSpot, Chargebee, Google Analytics, Google Ads, LinkedIn Ads, and more. Secondly, Fivetran's flexibility allowed us to set up custom connectors through Google Cloud Functions for software not already covered by its catalog.
To integrate APIs not natively supported by Fivetran, we employed Google Cloud Functions. This serverless computing platform empowered us to run code without the burden of server management. For instance, we utilized this capability to maintain daily records of exchange rates for various currencies against our base/reference currency, vital for our global business operations.
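To make this concrete, here is a sketch of how such a daily rates table can be used downstream. All table and column names (`exchange_rates`, `revenue_transactions`, `rate_to_base`, and so on) are illustrative assumptions, not our exact schema:

```sql
-- Illustrative: convert transactions recorded in local currencies into
-- the base/reference currency, using the rates table that the Cloud
-- Function refreshes daily. rate_to_base is assumed to mean
-- "units of local currency per unit of base currency".
SELECT
  t.transaction_date,
  t.currency,
  t.amount,
  t.amount / fx.rate_to_base AS amount_in_base_currency
FROM revenue_transactions AS t
JOIN exchange_rates AS fx
  ON fx.rate_date = t.transaction_date
 AND fx.currency  = t.currency;
```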
Data Warehouse Selection
Once we settled on Fivetran as our data extractor, the decision of which data warehouse to employ became the next crucial step. Fortunately, our options were more limited in this regard. Fivetran offered integration with multiple destinations, including Amazon S3, Amazon Redshift, Microsoft Azure Data Lake Storage, Google BigQuery, and MySQL databases.
Considering that the project was initiated by our business teams rather than our tech teams, we sought a data warehouse that provided visualization and manipulation capabilities without requiring server setup or complex frameworks. Consequently, we opted for Google BigQuery, impressed by its user-friendly interface and its ability to handle vast volumes of data with exceptional speed, bolstered by advanced SQL functions.
As users of Google Workspace, the seamless integration between Google BigQuery and Google Sheets proved to be a significant advantage. This integration allowed us not only to import business data into our chosen data visualization solution but also to access it conveniently within any Google Sheets document used for our business purposes.
In conclusion, through the synergistic combination of Fivetran as our data extractor and Google BigQuery as our data warehouse, we have successfully streamlined our data extraction and centralization processes. This foundation ensures a more efficient and organized approach to managing our business data, empowering us to make informed decisions and drive our business forward with greater clarity and precision.
#2.2 Consolidating Data: Overcoming Challenges and Embracing DBT Cloud
After successfully implementing data extractors with the help of Fivetran and centralizing all business data in our Google BigQuery data warehouse, a new challenge emerged. We needed a solution to connect and link data from various sources seamlessly.
Our goal was to integrate CRM data from HubSpot with revenue data from Smeetz and Chargebee, along with marketing data from Google Ads and LinkedIn Ads, not forgetting accounting data from Bexio.
Initially, we considered joining all these datasets using a data visualization solution or BI engine like Looker Studio or Databox. However, we quickly realized this approach had limitations. The complexity of data joins, coupled with the lack of reusability outside the visualization tool, and the inability to perform advanced calculations such as CAC (Customer Acquisition Cost) and LTV (Customer Lifetime Value), made this option unfeasible.
Thus, we set out to find a new software solution that could efficiently combine all our business data and facilitate advanced automated calculations. Enter DBT Cloud, a development framework that blends modular SQL with software engineering best practices, ensuring reliable, fast, and enjoyable data transformation processes. With the power of Jinja syntax, DBT Cloud enabled us to define clean and robust SQL code, reducing redundancy and increasing efficiency.
DBT Cloud allowed us to create post-processing data models from the raw data coming in from different sources. These models were then used to generate cleaned, streamlined, and joined data, ready for consumption by our data visualization solution and other Google Workspace applications. By utilizing DBT Cloud, we were able to streamline our data workflow, ensuring a more integrated and seamless data analysis experience.
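To give a flavour of what this looks like in practice, here is a minimal sketch of a DBT model using Jinja. The model and column names are hypothetical; the point is how `{{ ref(...) }}` wires in upstream models and how a loop removes copy-pasted SQL:

```sql
-- models/marketing_spend_by_channel.sql (hypothetical example)
-- The Jinja for-loop emits one spend column per channel instead of
-- repeating the CASE expression by hand for each one.
{{ config(materialized='table') }}

SELECT
  spend_date,
  {%- for channel in ['google_ads', 'linkedin_ads'] %}
  SUM(CASE WHEN source = '{{ channel }}' THEN spend ELSE 0 END)
    AS {{ channel }}_spend{{ "," if not loop.last }}
  {%- endfor %}
FROM {{ ref('stg_marketing_spend') }}
GROUP BY spend_date
```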
#2.3 Streamlining Global Key Metric Calculations using DBT Cloud
Upon discovering the immense capabilities of DBT Cloud, we realized the vast potential it offered for calculating our global key metrics. The power to script data transformations using DBT allowed us to generate business-ready metrics effortlessly for our teams.
To take full advantage of these capabilities, we embarked on an ambitious endeavor to develop a collection of SQL macros/functions that harnessed Jinja syntax. This strategic move aimed to eliminate the need for manual data transformation, providing us with a streamlined abstraction of calculations. With a simple additional line in a CSV seed file, we could produce new metrics as desired.
Our approach involved meticulously organizing the DBT Cloud project through numerous iterations and refactorings, leading us to an optimal architecture that was clean, easily understandable, and ready for future modifications.
Here's how we structured the DBT Cloud project:
Data Acquisition Models
We crafted specific models to extract, clean, and streamline data from various data sources. Each data source had its own dedicated model, enabling us to identify, reorder, and rename key column names as needed.
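A typical acquisition model is little more than a renamed SELECT. A minimal sketch, where the source and column names are illustrative rather than HubSpot's exact schema:

```sql
-- models/acquisition/stg_hubspot_deals.sql (illustrative)
-- One dedicated model per data source: select only the relevant
-- columns, then reorder and rename them into our naming convention.
SELECT
  id         AS deal_id,
  dealname   AS deal_name,
  amount     AS deal_amount,
  dealstage  AS deal_stage,
  createdate AS created_at
FROM {{ source('hubspot', 'deals') }}
```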
Data Bundling Models
These models were designed to join datasets obtained from the data acquisition models. The objective was to create complete records that combined data from multiple sources. For instance, we linked customer data from different software, incorporating not only HubSpot or Smeetz but also data from Chargebee, our subscription management software.
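Sketched as a DBT model, a bundling step is essentially a multi-source join. All model and column names below are assumptions for illustration:

```sql
-- models/bundling/customers.sql (illustrative)
-- Build one complete customer record from three sources.
SELECT
  h.customer_id,
  h.acquisition_channel,  -- HubSpot (CRM)
  s.total_gtv,            -- Smeetz (core product revenue)
  c.subscription_mrr      -- Chargebee (subscription billing)
FROM {{ ref('stg_hubspot_customers') }}      AS h
LEFT JOIN {{ ref('stg_smeetz_revenue') }}    AS s
  ON s.customer_id = h.customer_id
LEFT JOIN {{ ref('stg_chargebee_billing') }} AS c
  ON c.customer_id = h.customer_id
```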
Data Metric Calculation Models
This set of models enabled us to perform systematic calculations on the dataset generated by the data bundling models. We could effortlessly compute a wide range of sums, ratios, moving averages, and more by adding a simple line of code to our seeds.
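As an illustration, a metric model on top of a bundled monthly table might look like the following; the names and the exact CAC definition are simplified assumptions:

```sql
-- models/metrics/monthly_metrics.sql (illustrative)
SELECT
  month,
  SUM(total_gtv) AS gtv,
  -- simplified CAC: spend divided by new customers, guarded against
  -- division by zero with NULLIF
  SUM(marketing_spend) / NULLIF(SUM(new_customers), 0) AS cac,
  -- 3-month moving average of GTV via a window over the aggregate
  AVG(SUM(total_gtv)) OVER (
    ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS gtv_3m_moving_avg
FROM {{ ref('bundled_monthly_data') }}
GROUP BY month
```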
Seeds
We heavily utilized seeds to avoid static hard-coded data in our models. Instead, we described how data should be aggregated, especially when dealing with multiple keys or statuses that required grouping together. Additionally, seeds were instrumental in specifying the metrics we sought and how they should be calculated. As a result, a new line in the CSV could perform the calculation of a brand new metric.
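To give an idea of the mechanism, assume a seed file `metric_definitions.csv` with the columns `metric_name,aggregation,column_name` (e.g. a row `gtv,sum,total_gtv`). A model can then read the seed at compile time and emit one aggregated column per CSV row. This is a simplified sketch, not our production code:

```sql
-- models/metrics/seeded_metrics.sql (illustrative sketch)
-- run_query() executes during compilation, so the Jinja loop below
-- turns each row of the seed into one aggregation in the final SQL.
{% set metric_rows = [] %}
{% if execute %}
  {% set metric_rows = run_query(
       "SELECT metric_name, aggregation, column_name FROM "
       ~ ref('metric_definitions')
     ).rows %}
{% endif %}

SELECT
  month
  {%- for row in metric_rows %},
  {{ row['aggregation'] }}({{ row['column_name'] }})
    AS {{ row['metric_name'] }}
  {%- endfor %}
FROM {{ ref('bundled_monthly_data') }}
GROUP BY month
```

With this in place, adding a line to the CSV is all it takes to ship a new metric.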
Macros
Macros played a crucial role in our project, albeit with some complexity. They allowed us to minimize SQL code duplication throughout the project. Whenever we identified overlapping sections in the developed models, we abstracted the code into macros, making it reusable in other models.
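As a small example of the idea (the macro itself is a hypothetical illustration, not one of our actual macros):

```sql
-- macros/safe_ratio.sql (illustrative)
-- Abstracts a pattern that kept reappearing across models:
-- divide two expressions without failing on zero denominators.
{% macro safe_ratio(numerator, denominator) %}
  CASE
    WHEN {{ denominator }} = 0 THEN NULL
    ELSE {{ numerator }} / {{ denominator }}
  END
{% endmacro %}
```

Any model can then write `{{ safe_ratio('marketing_spend', 'new_customers') }} AS cac` instead of repeating the CASE expression everywhere.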
By following this well-organized approach and leveraging the capabilities of DBT Cloud, we successfully streamlined the calculation of global key metrics, empowering our business teams with valuable insights.
#2.4 Enabling Data Access and Visualizations for Stakeholders
After the challenging task of developing and thoroughly testing the DBT Cloud models, we were nearly ready to proceed. All the critical business data and essential metrics were successfully integrated into Google BigQuery, making them readily available for utilization.
Next, we needed to decide on a data visualization solution or BI engine to present this valuable data to our stakeholders. The choice was straightforward, considering we are avid users of Google Workspace. Looker Studio emerged as the ideal choice, as it came with an impressive array of functionalities and was available for free.
With Looker Studio, we were able to rapidly construct multiple reports with varying levels of granularity in terms of access rights and information, thanks to Google Groups. The platform also provided cutting-edge displays, enabling us to monitor, for instance, the GTV (Gross Transaction Value) flowing through our core product, Smeetz, in real time. We could effortlessly analyze this data broken down by market and compare it against our forecasts and YoY (Year over Year) data points.
Additionally, in conjunction with Looker Studio, our business teams gained direct access to raw data and metrics from Google Sheets. They could now write simple SQL queries using the native Connect to BigQuery connector of Google Sheets. This seamless integration expanded the possibilities for spreadsheet automation, empowering our teams to derive even more insights from the data.
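For instance, a query of the following shape could be pasted into the Sheets BigQuery connector; the project, dataset, and table names are placeholders:

```sql
-- Illustrative query for the Google Sheets BigQuery connector.
SELECT
  month,
  market,
  gtv
FROM `my-project.reporting.monthly_metrics`
WHERE month >= '2023-01-01'
ORDER BY month, market
```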
Overall, these integrations and choices allowed us to make data-driven decisions efficiently and effectively, enhancing our understanding of the business and ultimately driving better outcomes.
#3 Challenges Encountered during Implementation
#3.1 Starting from Scratch without Prior Technical Knowledge
Undertaking this project from scratch without any prior technical knowledge proved to be one of the most daunting challenges. Coming from a business background in banking, with limited coding experience (primarily VBA in Excel), I relied heavily on my curiosity and extensive Google searches.
Despite these limitations, I managed to rapidly acquire new knowledge, gaining a comprehensive understanding of various data stacks and developing proficient SQL scripting skills, which were instrumental in creating DBT Cloud models.
#3.2 Identifying the Right Tech Stack
Selecting the appropriate tech stack presented a lengthy and intricate process due to the array of tools available, each with its own set of pros and cons. Our ultimate confidence in the chosen tech stack stemmed from aligning it with our company's specific requirements. To streamline such projects for others, I recommend dedicating time to outline your organization's objectives and drafting specifications to assess the capabilities of each potential solution.
#3.3 Time Constraints and Project Management
One of the significant challenges we faced was finding sufficient time to carry out this project. Balancing our already busy schedules with tight deadlines made it seem almost unfeasible.
However, we recognized the long-term benefits of the project and deemed it crucial to set aside time for its successful completion. We found it incredibly rewarding to push ourselves for a month, including working nights and weekends, to achieve our objectives.
To navigate such constraints, we suggest creating a detailed project roadmap with intermediate goals and deadlines. Additionally, involving the team early on is vital, as it distributes the project's knowledge base and facilitates long-term maintenance. By breaking down the project into subgoals and involving teammates in specific areas, you can ease the burden and make progress more efficiently.
#3.4 Understanding the Data Structure of the Data Source
Building models in DBT Cloud presented challenges beyond SQL scripting skills, particularly in comprehending the data structure produced by each software we extracted data from. It required considerable effort to map different data sources together, identify relevant information, and eliminate irrelevant data.
In conclusion, overcoming these challenges demanded perseverance, collaboration, and a continuous quest for knowledge. While the journey was demanding, the benefits of successfully completing the project have proven to be highly rewarding for our organization.
#4 What is the outcome of the project?
The outcome of the project has been transformative for Smeetz, leading to significant improvements in the reporting process and data-driven decision-making. Here are the key outcomes:
Real-Time Online Reporting
The manual reporting process in Excel has been replaced with a real-time online reporting system. Thanks to the data extractors, Fivetran, and the data warehouse, Google BigQuery, all the business data from various sources are now centralized in one location. This allows for up-to-date insights, eliminating the need for tedious manual data entry and providing instant access to critical information.
Data Consolidation and Granularity
The implementation of DBT Cloud has enabled the consolidation of data from different sources and the ability to piece them together. By connecting CRM data, revenue data, marketing data, and accounting data, Smeetz now has a holistic view of its business, enabling better-informed decision-making. Moreover, the data models built in DBT Cloud have facilitated advanced calculations, allowing for the generation of key metrics like CAC (Customer Acquisition Cost), LTV (Customer Lifetime Value), and more.
Easy Access to Key Metrics
With Looker Studio as the data visualization solution, business teams at Smeetz can easily access and explore the key metrics and reports. The user-friendly interface and powerful functionalities of Looker Studio allow stakeholders to gain insights into the business's performance, track KPIs, and make data-driven decisions quickly.
Time and Resource Savings
The shift from manual reporting to an automated online reporting system has resulted in substantial time and resource savings for Smeetz. By reducing the time spent on maintaining and updating reports, the team can now focus on analyzing the data and extracting valuable insights that drive business growth.
Scalability and Standardization
The new reporting system is scalable, accommodating Smeetz's growth as it expanded its market presence to multiple countries. The standardized data processing and reporting workflows facilitated by the tech stack ensure consistency and reliability in generating reports, which is crucial as the organization continues to scale.
#5 What is next?
With the successful implementation of the automated online reporting system, Smeetz is now in a strong position to further enhance its data-driven decision-making capabilities. Here are some of the potential next steps:
Advanced Analytics and Predictive Modeling
Now that Smeetz has a solid foundation in data management and reporting, the company can explore advanced analytics techniques and predictive modeling. By leveraging the data available in Google BigQuery and the analytical capabilities of Looker Studio, Smeetz can uncover deeper insights, identify trends, and make data-driven predictions about customer behavior and market trends.
Machine Learning Integration
Integrating machine learning into the reporting process can provide even more sophisticated insights and recommendations. By harnessing machine learning algorithms, Smeetz can automate tasks, such as customer segmentation, churn prediction, and demand forecasting, leading to more efficient operations and better customer experiences.
Continuous Improvement and Collaboration
The journey of data-driven decision-making is continuous. Smeetz should foster a culture of continuous improvement, regularly reviewing and refining its reporting system to meet evolving business needs. Collaboration between business teams and technical teams is crucial to ensure that the reporting system remains aligned with business goals and adapts to changing requirements.
By embracing these next steps, Smeetz can continue to leverage the power of data to drive growth, innovation, and success in the competitive landscape of startups and beyond.