10 tips on how to make your data assets business-AI-ready

Alongside the current emergence of AI, there is also a lot of excitement about “Business AI”, or alternatively “Enterprise AI”. Although there is no single definition of Business AI, it can be seen as business processes and decision making supported by various AI tools, often embedded into enterprise software products.

While generative AI solutions like GPT and various “copilot”-style AI assistants are very usable for some use cases, we are still some steps away from fact-based, AI-supported company- or business-unit-wide decision making that relies on hard quantitative business data. Currently, the focus of business AI use case development is mainly on creating new types of user interfaces and supporting specific business process workflows where the new generative AI models have a competitive advantage. But when you ask your internal AI assistant for a report on company KPIs, you run a substantial risk of getting wrong results unless your underlying data is reliable. Quantitative data is still often leveraged by conventional ML algorithms, and some organizations are championing this very well – some have been doing it for a few decades already!

Amid the current buzz, it is easy to overlook one of the biggest challenges: you cannot fully rely on generic generative AI models to answer factual questions correctly in a business context. Leading software companies such as Microsoft, Salesforce and SAP are currently pouring resources into Business AI solutions designed to take your business to new heights. While AI assistants and automated workflows are useful tools, running a business successfully demands a thorough understanding of business logic and trust in the underlying numbers. It is easy to forget that business AI needs data. So how do you make your analytics data assets ready for business AI? Let’s find out!

More than ever, the key question is the quality of the data. You do not want a Business AI solution that uses wrong data as the basis for its outcomes.

The only way to build working business AI solutions is to enhance your models with CORRECT business data. How do you achieve that? Where do you get that correct business data? The answer is simple: start by taking care of an impeccable data flow in your data pipelines. Unless the correct data is available to the AI models, you will be in trouble.
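To make this concrete, here is a minimal sketch of the kind of data quality gate that can protect a pipeline before data reaches any AI model. The table and column names are hypothetical, and this is just one simple way to implement such a check in PySpark:

```python
# A minimal, hypothetical data quality gate in PySpark
# (assumes a notebook environment where a `spark` session is predefined)
from pyspark.sql import functions as F

df = spark.table("finance.monthly_kpis")  # hypothetical source table

# Simple checks: no duplicate business keys, no missing KPI values
duplicate_count = df.count() - df.dropDuplicates(["kpi_id", "month"]).count()
null_count = df.filter(F.col("kpi_value").isNull()).count()

# Fail loudly instead of silently feeding bad data downstream
if duplicate_count > 0 or null_count > 0:
    raise ValueError(
        f"Data quality gate failed: {duplicate_count} duplicate rows, "
        f"{null_count} null KPI values"
    )
```

Real pipelines would typically use a dedicated framework for this, but even checks this simple stop many wrong numbers before they ever reach an AI assistant.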

High-quality data remains a daydream for many who deal with massive corporate business data solutions and struggle with data integrity. An optimist might say that Business AI is pushing us into a new era where we will finally have a single version of the truth.

Here is my take on the top 10 activities that everyone should be doing today to make their data assets and organization ready for business AI:

  1. Get started: cultivate an AI mindset and understanding by training people and by starting to use available AI tools such as AI assistants
  2. Assess and understand your current data and systems
  3. Set your ambition level and goals based on business strategy and targets
  4. Invest in skills: own and external
  5. Plan your roadmap and high-level data architecture based on your ambition level and possible use cases
  6. Ensure adequate data governance within your organization
  7. Select technologies that suit your overall IT systems landscape
  8. Design your detailed data architecture and solutions properly to avoid surprises
  9. Build a sustainable and modern data architecture to allow impeccable flow of data from source to your business AI solution
  10. Don’t forget: continuous housekeeping and incremental development based on your roadmap

As a business or IT leader, you surely want to get started today to stay in the game and ensure your data architecture drives your organization’s future success. Make sure your data assets are ready for business AI solutions, and follow our step-by-step tips!

Etlia is a fast-growing and focused data engineering company specializing in business data. If you are interested in learning how to make your data pipelines business-AI-ready, don’t hesitate to get in touch by booking a meeting with us.

Book a meeting or contact us!

Mikko Koljonen

AI in data engineering – hands-on experiences

Written by Shubham Keshri

As a fellow data engineer, I understand how tedious and time-consuming it can be to perform repetitive tasks. That’s why I’m excited to share some AI-based tips and tricks that can help you streamline your workflow and increase your productivity.

One tool that I highly recommend is Bing Chat, a GPT-powered AI chatbot that can help you with a wide range of tasks, from converting units to summarizing long articles. It’s like having a personal assistant at your fingertips!

Another tool that can help you save time is GitHub Copilot. This AI-powered tool is designed to help developers write code faster and more efficiently. It uses machine learning to suggest code snippets and auto-complete repetitive tasks, such as creating tables or copying files from one location to another.

Using AI with Azure Synapse Analytics

In one of our customer assignments, we used Azure Synapse Analytics to build some pipelines (we’re plumbers :D). However, as you may already know, Azure Synapse doesn’t let you write notebook code directly in a local IDE. Instead, you must use the Synapse Studio portal.

To get AI help, you had to copy the code from a notebook and paste it into Bing Chat. It’s like trying to play a game of chess with one hand tied behind your back! That’s why we used this method only for some migration work. It is not perfect, but it sometimes gets the job done.

Copy-pasting wasn’t fun! But perhaps someone was listening: with the recent update that brings GitHub Copilot chat to Visual Studio and Visual Studio Code, you can now use the built-in chat feature to perform the same tasks without having to switch between applications. This can save you a lot of time and make your workflow more efficient.

Using AI with Azure Synapse notebooks

Now let’s dive into some specific examples of how these tools can be used in conjunction with Azure Synapse notebooks.

If you’re working with PySpark or Spark SQL in Synapse notebooks, you know how tedious it can be to write code for repetitive tasks like creating tables or copying files from one location to another. But with GitHub Copilot, you can easily auto-complete these tasks with just a few keystrokes.

For example, let’s say you want to create a new table in Synapse Analytics using PySpark. Normally, this would require several lines of code. But with GitHub Copilot, all you have to do is type “create table” followed by the name of your table and the data type for each column. GitHub Copilot will then generate the entire PySpark code for you – something like the sketch below!
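Copilot’s suggestions vary with context, so treat the following as a rough illustration of the kind of code it might produce rather than its literal output. The table name and columns are hypothetical, and the snippet assumes a Synapse notebook where the `spark` session is predefined:

```python
# Prompt typed as a comment for Copilot, e.g.:
# create table sales_orders with columns order_id int, customer string, amount double
from pyspark.sql.types import (
    StructType, StructField, IntegerType, StringType, DoubleType
)

# Define the schema for the new table (hypothetical columns)
schema = StructType([
    StructField("order_id", IntegerType(), True),
    StructField("customer", StringType(), True),
    StructField("amount", DoubleType(), True),
])

# Create an empty DataFrame with that schema and register it
# as a managed Spark table in the lake database
spark.createDataFrame([], schema).write.mode("overwrite").saveAsTable("sales_orders")
```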

Similarly, suppose you want to copy data lake files from one location to another in Synapse Analytics using Spark SQL. In that case, all you have to do is type “copy data lake files” followed by the source and destination paths. GitHub Copilot will then generate the entire Spark SQL code for you – see the sketch below!
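Again, this is only an illustration: the ABFSS paths below are made up, and Copilot’s actual suggestion depends on your prompt and workspace. One plausible shape uses Spark SQL’s ability to query files in place and then writes the result to the destination:

```python
# Prompt typed as a comment for Copilot, e.g.:
# copy data lake files from the raw container to the curated container

# Hypothetical ABFSS paths; replace with your own storage account and containers
source_path = "abfss://raw@mydatalake.dfs.core.windows.net/sales/"
destination_path = "abfss://curated@mydatalake.dfs.core.windows.net/sales/"

# Spark SQL can query Parquet files directly by path...
df = spark.sql(f"SELECT * FROM parquet.`{source_path}`")

# ...and the result is written back out to the destination
df.write.mode("overwrite").parquet(destination_path)
```

For a plain file copy without rewriting the data, Copilot might instead suggest Synapse’s built-in `mssparkutils.fs.cp(source_path, destination_path, True)` utility.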

These are just a few examples of how Bing Chat and GitHub Copilot can be used with Azure Synapse notebooks to increase your productivity as a data engineer. By automating repetitive tasks and streamlining your workflow, you’ll be able to focus on what really matters: analyzing data and generating insights.

If you have any questions or comments, reach out to us. And remember: always keep calm and code on!

P.S. Did you notice that this blog post was written with the help of AI?

Contact us to learn more
