Supercharge your ESG data 

Why automate your ESG data pipeline and how to do it?

While ESG reporting requirements for businesses are tightening, many organizations are still struggling with inefficient manual reporting processes that compromise the quality and assurance-readiness of their ESG reporting.

It is not always easy to find actual data for ESG KPIs. Manual data input and calculation logic based on, for example, emission factors, averages, and standard rules will therefore remain a reality for some parts of ESG reporting in the near future.
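To make that calculation logic concrete, here is a minimal sketch in Python of an emission-factor-based estimate of the kind used when actual measurements are not available. The factor values and activity figures are illustrative placeholders, not real reference data.

```python
# Minimal sketch of emission-factor-based calculation logic used when
# actual measurements are missing. Factor values are illustrative
# placeholders, not official reference data.

EMISSION_FACTORS_KG_CO2E = {
    "diesel_litre": 2.68,      # assumed kg CO2e per litre of diesel
    "electricity_kwh": 0.15,   # assumed kg CO2e per kWh of grid electricity
}

def estimate_emissions(activity: str, amount: float) -> float:
    """Estimate emissions as activity amount times an emission factor."""
    return amount * EMISSION_FACTORS_KG_CO2E[activity]

# Example: 1,000 litres of diesel and 20,000 kWh of purchased electricity.
total_kg_co2e = (estimate_emissions("diesel_litre", 1_000)
                 + estimate_emissions("electricity_kwh", 20_000))
print(f"Estimated emissions: {total_kg_co2e:.0f} kg CO2e")
```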

Based on our experience, organizations can improve their reporting process significantly by gradually automating ESG data pipelines wherever possible. This brings immediate benefits: it makes the reporting process more efficient, improves the accuracy of your ESG reports, and provides transparency into the underlying data.
 
At Etlia Data Engineering we have successfully implemented automated ESG data pipelines for our clients, and in this blog post we share the key lessons we have learned along the way.

Why consider automating your ESG data pipeline? 

These are the main benefits our customers have achieved by automating their ESG data pipelines:

  • Transparency and assurance-readiness: Automating the data pipeline from operational systems helps ensure ESG reports comply with regulatory requirements and provides audit trails for accountability and transparency. 
  • Cost optimization: Reducing the need for manual entry of ESG data (for example, via Excel files) lowers labor costs and minimizes the cost impact of errors and delays. 
  • More up-to-date ESG reports: Automation significantly reduces the time required to gather, process, and update data, enabling real-time or near-real-time reports and allowing management to act faster than with a manual process. 
  • Superior data quality: An automated ESG data pipeline is considerably less error-prone than manual processes. 
  • Scalability: An automated ESG data pipeline can scale up to handle increasing volumes of data as the company grows, unlike manual processes that struggle to scale efficiently.

What are the biggest challenges? 

These are the most common hurdles our clients face when building ESG data solutions:

  1. Inaccuracy and lack of transparency: In the worst case, manual data processes and calculations will cause your ESG reporting assurance to fail ➤ solution: automate your ESG data pipeline wherever possible to ensure transparency and audit trails (see the audit-trail sketch after this list). 
  2. Complexity of data: ESG data is usually stored in business process solutions that have been optimized for running daily operations rather than for ESG reporting ➤ solution: find sufficiently skilled partners who can help design, model, and implement a data architecture for ESG reporting. 
  3. Internal data gaps: It is often difficult to find all the data needed, for example, for a comprehensive emissions calculation ➤ solution: use dedicated ESG-specific solutions or approved industry practices to complement your calculation process. 
  4. Dependency on data provided by suppliers: You usually need some data from your suppliers, and this often becomes an issue when preparing ESG reporting ➤ solution: obtain the necessary data from your suppliers if possible. Sometimes a more viable solution is to use industry-standard calculation rules or data ecosystems to fill in the gaps. 
  5. Knowledge issues: Internal politics and silos can hinder finding an optimal solution if the stakeholders lack the necessary understanding of ESG requirements or interlinked data architectures ➤ solution: train your internal experts and take care of internal knowledge sharing. 
  6. ESG reporting solution not aligned with overall data strategy and architecture: This can happen, for example, when the team in charge of ESG reporting builds its own solutions in isolation ➤ solution: ensure tight coordination between the ESG organization and business IT data solution owners/architects.
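To illustrate the audit-trail point in challenge 1, below is a minimal sketch of an automated pipeline step that records lineage metadata for every load. It is our own simplified example, not a reference to any specific product, and all file and field names are assumptions.

```python
# Minimal sketch: a pipeline load step that appends an audit record
# (source, checksum, row count, timestamp, pipeline version) for each run.
# File and field names are illustrative assumptions.
import csv
import hashlib
import json
from datetime import datetime, timezone

PIPELINE_VERSION = "1.0.0"  # assumed versioning scheme

def load_with_audit_trail(source_file: str, audit_log: str) -> list[dict]:
    """Load a CSV extract and append an audit record for assurance purposes."""
    with open(source_file, newline="") as f:
        rows = list(csv.DictReader(f))
        f.seek(0)
        checksum = hashlib.sha256(f.read().encode()).hexdigest()
    audit_record = {
        "source": source_file,
        "rows_loaded": len(rows),
        "sha256": checksum,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "pipeline_version": PIPELINE_VERSION,
    }
    with open(audit_log, "a") as log:
        log.write(json.dumps(audit_record) + "\n")
    return rows
```

An auditor can then trace every figure in the ESG report back to a specific source file, load time, and pipeline version.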

How to do it? 

These are our recommended steps to automate your ESG data pipeline:

  • Get started: The sooner you start building an automated data flow from operational systems, the easier it is to manage the overall roadmap, as the work takes time and substantial investment. It is best to get started and move away from manual processes gradually. 
  • Build your understanding: Understanding the KPIs and ESG reporting requirements, such as the EU CSRD, is crucial, as they define the data needed to build the ESG pipeline. 
  • Define targets: Define stakeholders’ targets and a roadmap for your ESG reporting development. 
  • Assess your data and data sources: First, define the data you can get from internal sources and whether there is a need for external data. In the process industry, for example, you might need material information from suppliers and emission coefficients from other providers. Understanding your source data and systems helps determine whether you can stay with the existing data architecture or need a new one to support the ESG pipeline. 
  • Select technologies: Choosing the right platform for your ESG data is crucial, considering the maintainability and complexity of data sources. You may be tempted by tools with fancy pre-defined templates, but be aware: 1) they do not remove the need for a proper data platform, and 2) they may have other limitations, such as very specific requirements for the overall architecture that could conflict with your organization’s guidelines. 
  • Data modelling: Start with an analysis identifying how much data is available to build your ESG pipeline. Data modelling for ESG requires combining the data from your systems with reference data (for common data and coefficients) to calculate your emissions and other KPIs. Expect the model to involve hierarchical traversal to calculate emissions at all granularities and identify the major contributors; this can also be a deciding factor when choosing your architecture (see the sketch after this list). 
  • Solution development: Ideally, the development process should follow your organization’s common process for building data solutions. At Etlia Data Engineering we always recommend agile development methodologies. 
  • Gradual development: Start small. Due to the complex nature and limited availability of the data, it is a good approach to proceed modularly and build your solution step by step, automating one part of the data flow at a time.
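To illustrate the data modelling step, here is a minimal sketch that joins activity data from operational systems with emission-factor reference data and rolls the result up an organizational hierarchy. All tables, column names, and factor values are illustrative assumptions.

```python
# Minimal sketch: combine activity data with reference data (emission
# factors, organizational hierarchy) and aggregate on several granularities.
# All data, names, and factor values are illustrative assumptions.
import pandas as pd

# Activity data as it might arrive from operational systems.
activity = pd.DataFrame({
    "site": ["Plant A", "Plant A", "Plant B"],
    "activity": ["diesel_litre", "electricity_kwh", "electricity_kwh"],
    "amount": [1_000.0, 20_000.0, 35_000.0],
})

# Reference data: emission factors (placeholder values, kg CO2e per unit).
factors = pd.DataFrame({
    "activity": ["diesel_litre", "electricity_kwh"],
    "factor_kg_co2e": [2.68, 0.15],
})

# Reference data: organizational hierarchy for roll-ups.
hierarchy = pd.DataFrame({
    "site": ["Plant A", "Plant B"],
    "business_unit": ["Unit North", "Unit South"],
})

# Combine, calculate, and aggregate per site and per business unit to see
# which parts of the organization contribute most.
emissions = activity.merge(factors, on="activity").merge(hierarchy, on="site")
emissions["kg_co2e"] = emissions["amount"] * emissions["factor_kg_co2e"]

print(emissions.groupby("site")["kg_co2e"].sum())
print(emissions.groupby("business_unit")["kg_co2e"].sum())
```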

– Raaju Srinivasa Raghavan & Mikko Koljonen 

Are you ready for ESG data automation? If you have any questions or need support in your ESG data process, don’t hesitate to reach out to us by booking a short meeting!

10 tips on how to make your data assets business-AI-ready

Along with the current emergence of AI, there is a lot of excitement about “Business AI” or, alternatively, “Enterprise AI”. Although there is no single definition of Business AI, it can be seen as business processes and decision-making supported by various AI tools, often embedded into enterprise software products.

While generative AI solutions like GPT and various “co-pilot”-type AI assistants are very usable for some use cases, we are still some steps away from fact-based, AI-supported company- or business-unit-wide decision-making that relies on hard quantitative business data. Currently, the focus of business AI use case development is mainly on creating new types of user interfaces and supporting specific business process workflows where the new generative AI models have a competitive advantage. But when you ask your internal AI assistant for a report on company KPIs, you run a substantial risk of getting wrong results unless your underlying data is reliable. Quantitative data is still often leveraged by conventional ML algorithms, and some organizations are championing this very well – some have been doing it for a few decades already!

In the current buzz it is easy to forget that one of the biggest challenges is that you cannot fully rely on generic generative AI models to answer factual questions correctly in a business context. Leading software companies such as Microsoft, Salesforce, and SAP are currently pouring their resources into Business AI solutions designed to take your business to new heights. While AI assistants and automated workflows are useful tools, running a business successfully demands a thorough understanding of business logic and trust in the underlying numbers. Business AI, above all, needs data. So how do you make your analytics data assets ready for business AI? Let’s find out!

More than ever, the key question is the quality of the data. You do not want a Business AI solution that uses wrong data as the basis for the desired outcome.

The only way to build working business AI solutions is to enhance your models with CORRECT business data. How do you achieve that? Where do you get that correct business data? The answer is simple – you need to start by taking care of an impeccable data flow in your data pipelines. Unless correct data is available to the AI models, you will be in trouble.

High-quality data is a dream for anyone dealing with massive corporate business data solutions and often struggling with data integrity. An optimist might say that Business AI is pushing us into a new era where we will finally have a single version of the truth.

Here is my take on the top 10 activities that everyone should be doing today to make their data assets and organization ready for business AI:

  1. Get started: cultivate an AI mindset and understanding by training people and starting to use available AI tools such as AI assistants
  2. Assess and understand your current data and systems
  3. Set your ambition level and goals based on business strategy and targets
  4. Invest in skills: own and external
  5. Plan your roadmap and high-level data architecture based on your ambition level and possible use cases
  6. Ensure adequate data governance within your organization (a minimal data quality check is sketched after this list)
  7. Select technologies that suit your overall IT systems landscape
  8. Design your detailed data architecture and solutions properly to avoid surprises
  9. Build a sustainable and modern data architecture to allow impeccable flow of data from source to your business AI solution
  10. Don’t forget: continuous housekeeping and incremental development based on your roadmap
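As a concrete illustration of tip 6, below is a minimal sketch of an automated data quality gate that keeps suspicious records away from downstream Business AI and reporting layers. Column names, thresholds, and file names are illustrative assumptions.

```python
# Minimal sketch: a data quality gate run before data reaches a Business AI
# or BI layer. Column names and thresholds are illustrative assumptions.
import pandas as pd

REQUIRED_COLUMNS = ["order_id", "amount_eur", "order_date"]

def quality_gate(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on structural issues and quarantine suspicious rows."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    valid = (
        df["order_id"].notna()
        & ~df["order_id"].duplicated()
        & df["amount_eur"].between(0, 1_000_000)  # assumed plausibility range
    )
    df[~valid].to_csv("quarantine.csv", index=False)  # park bad rows for review
    return df[valid]
```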

As a business or IT leader, you surely want to get started today to stay in the game and ensure your data architecture drives your organization’s future success. Make sure your data assets are ready for business AI solutions, and follow our step-by-step tips!

Etlia is a fast-growing and focused data engineering company specializing in business data. If you are interested in learning how to make your data pipelines business-AI-ready, don’t hesitate to get in touch by booking a meeting with us.

Book a meeting or contact us!

Mikko Koljonen

Testing MS Fabric: Review of the “Auto-create report” feature

One of our experts had previously produced a meaningful report on Finland’s Corona data. Now, after the launch of Microsoft’s new SaaS offering called Fabric, we will test its reporting feature, which is supposed to ease the work of data analysts and BI developers. With the “Auto-create report” feature embedded in MS Fabric, you can create insights from datasets with just one click. In the following text, we compare the reports built by Fabric and by our expert, and review whether the quality of the auto-created report matches the one produced by an expert.

Fabric’s auto-created report of Finland’s Corona data 

How does it work? 

From the Fabric UI you can conveniently access your data. You can create datasets from the files and tables you have uploaded to OneLake, the new unified data store we discussed in our previous blog post on MS Fabric. After selecting the dataset you would like to create the report from, you can decide whether to build the report from scratch or let Fabric build it automatically.

When you choose to create the report automatically, Fabric picks the columns from the tables that it considers the most meaningful and creates visuals to reflect the insights of that data. It creates a quick summary page showing what it considers the most important highlights, and it also writes a short text summarizing the insights of the visuals. You can then change the data you want projected, and it automatically builds new visuals from the selected data.

Comparison 

Using the “Auto-create report” feature you can easily build a sufficient report that effectively tells the key insights of the data. You will probably still need to do some work selecting the right data to be projected, because the feature doesn’t necessarily pick the right columns right away. The report it creates may be good enough if you just need to quickly check what is happening. However, it isn’t visually as polished or as informative as one created by an expert. It also only offers a quick summary of the data, whereas a human can create a multi-page report offering a deep understanding of the matter. You can change the type of the visualizations in the automatically created report, but at that point it is just as simple to build the report from scratch. If you want to build a presentable report with the help of the “Auto-create report” feature, you have to put as much thought and effort into it as if you were building the whole thing from scratch.

In conclusion, we think this feature is a nice addition to Power BI, because anyone can easily check the insights the data has to offer and make decisions based on that information. However, if you want to create a report that offers powerful support for your presentation, you still need to spend some time building the report and emphasizing the major data points.

Future of AI Analytics 

Even though the automatically created report isn’t yet quite as insightful as a report created by a human expert, it is still impressive how well it can connect different types of data and produce meaningful visuals all by itself. AI and machine learning technologies have been evolving rapidly in recent years, and data analytics offers great uses for them. They are already very good at identifying patterns and analyzing the relationships and dependencies between variables. Therefore, we believe there is still room for the “Auto-create report” feature to improve. In the future, it might be able to interpret and communicate the information hidden in the data even better than the brightest expert.

At the moment, the trend seems to be to take advantage of AI by using generative AI language models as trusted helpers that do the hands-on work for us. Microsoft has announced the Copilot feature, which will be included in the Fabric offering but isn’t yet available in the public preview version. They have shown how you’ll be able to chat and describe what information you want from the data. It can create measures and SQL views. Of course, it can create visuals, but it can also answer more sophisticated questions. For example, it can show you with visuals the reasons why something has happened, or give suggestions via chat on how you could improve certain values. With Copilot, the only thing left for humans to do is to know what to ask. Often those questions repeat themselves, so maybe we will be able to automate that task too someday.
