10 tips on how to make your data assets business-AI-ready

Alongside the current rise of AI, there is a lot of excitement about “Business AI”, also called “Enterprise AI”. Although there is no single definition, Business AI can be seen as business processes and decision-making supported by various AI tools, often embedded in enterprise software products.

While generative AI solutions like GPT and various “co-pilot”-type AI assistants are very useful for some use cases, we are still some steps away from fact-based, AI-supported decision-making at company or business-unit level that relies on hard quantitative business data. Currently, business AI use case development focuses mainly on creating new types of user interfaces and supporting specific business process workflows where the new generative AI models have a competitive advantage. But when you ask your internal AI assistant for a report on company KPIs, you run a substantial risk of getting wrong results unless your underlying data is reliable. Quantitative data is still often leveraged by conventional ML algorithms, and some organizations do this very well – some have been doing it for a few decades already!

In the current buzz it is easy to forget that one of the biggest challenges is that you cannot fully rely on generic generative AI models to answer factual questions correctly in a business context. Leading software companies, such as Microsoft, Salesforce and SAP, are currently pouring their resources into Business AI solutions designed to take your business to new heights. While AI assistants and automated workflows are useful tools, running a business successfully demands a thorough understanding of business logic and trust in the underlying numbers. Above all, business AI needs data. So how do you make your analytics data assets ready for business AI? Let’s find out!

More than ever the key question is the quality of the data. You do not want to have a Business AI solution that uses wrong data as a basis for the desired outcome.

The only way to build working business AI solutions is to base your models on CORRECT business data. How do you achieve that, and where do you get that correct business data? The answer is simple: start by ensuring an impeccable flow of data through your data pipelines. Unless correct data is available to the AI models, you will be in trouble.
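To make the idea of taking care of the data flow a bit more concrete, here is a minimal sketch of a quality gate that a pipeline could run on each batch before the data reaches any AI model. The record layout, the field names (`order_id`, `revenue`, `currency`) and the three rules are assumptions made up for illustration, not a description of any particular toolset.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Collects rule violations found while scanning a batch of records."""
    total: int = 0
    issues: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The batch passes only if no rule was violated.
        return not self.issues


def check_batch(records, required=("order_id", "revenue", "currency")):
    """Run basic completeness, uniqueness and range checks on a batch.

    All field names are hypothetical; adapt the rules to your own data model.
    """
    report = QualityReport(total=len(records))
    seen_ids = set()
    for row_no, rec in enumerate(records, start=1):
        # Completeness: every required field must be present and non-empty.
        for name in required:
            if rec.get(name) in (None, ""):
                report.issues.append((row_no, f"missing {name}"))
        # Uniqueness: the business key must not repeat within the batch.
        key = rec.get("order_id")
        if key is not None:
            if key in seen_ids:
                report.issues.append((row_no, "duplicate order_id"))
            seen_ids.add(key)
        # Range: revenue is assumed to be non-negative in this example.
        revenue = rec.get("revenue")
        if isinstance(revenue, (int, float)) and revenue < 0:
            report.issues.append((row_no, "negative revenue"))
    return report


if __name__ == "__main__":
    batch = [
        {"order_id": 1, "revenue": 100.0, "currency": "EUR"},
        {"order_id": 1, "revenue": -5.0, "currency": "EUR"},
        {"order_id": 2, "revenue": 20.0, "currency": ""},
    ]
    report = check_batch(batch)
    print("passed:", report.passed)
    for row_no, problem in report.issues:
        print(f"row {row_no}: {problem}")
```

In a real pipeline a failed report would typically stop the load or route the batch to quarantine, so that wrong numbers never become the basis for an AI-generated answer.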

High-quality data is a daydream for anyone dealing with massive corporate business data solutions, often struggling with data integrity. An optimist might say that Business AI is pushing us to a new era where we will finally have the single version of the truth.

Here is my take on the top 10 activities that everyone should be doing today to make their data assets and organization ready for business AI:

  1. Get started: cultivate an AI mindset and understanding by training people and starting to use available AI tools such as AI assistants
  2. Assess and understand your current data and systems
  3. Set your ambition level and goals based on business strategy and targets
  4. Invest in skills: own and external
  5. Plan your roadmap and high-level data architecture based on your ambition level and possible use cases
  6. Ensure adequate data governance within your organization
  7. Select technologies that suit your overall IT systems landscape
  8. Design your detailed data architecture and solutions properly to avoid surprises
  9. Build a sustainable and modern data architecture to allow impeccable flow of data from source to your business AI solution
  10. Don’t forget: continuous housekeeping and incremental development based on your roadmap

As a business or IT leader you surely want to get started today to stay in the game and ensure your data architecture drives your organization’s future success. Make sure your data assets are ready for business AI solutions, and follow our step-by-step tips!

Etlia is a fast-growing and focused data engineering company specializing in business data. If you are interested in learning how to make your data pipelines business-AI-ready, don’t hesitate to get in touch by booking a meeting with us.

Book a meeting or contact us!

Mikko Koljonen

The Power of appreciation

In today’s fast-paced work environment, it’s easy to get caught up in deadlines, targets, and the daily grind. But sometimes, amidst the hustle, we forget something crucial: appreciation.  

In the end people matter – hence one of our key values at Etlia is “We appreciate people”. Naturally this value encompasses all the essentials such as appreciating people irrespective of race, sex, religion, cultural background and age. But appreciation is much more than that: taking the time to acknowledge and celebrate the contributions of our colleagues is essential for building a positive, thriving workplace.

Why Appreciation Matters

Appreciation isn’t just a feel-good nicety; it has a tangible impact on our work lives. Studies show that employees who feel valued are:

  • More engaged: When we feel our efforts are recognized, we’re more likely to go the extra mile and be invested in our work.  
  • More productive: Appreciation fosters a sense of purpose and motivation, leading to increased productivity.  
  • More collaborative: When appreciation is expressed, teams feel a sense of unity and are more likely to work together effectively.  
  • Less likely to leave: Feeling valued contributes to employee satisfaction and retention, reducing turnover.

Appreciation in Action at Etlia:

  • We appreciate people irrespective of race, sex, religion, neurodiversity, cultural background and age.  
  • We celebrate people. We celebrate successes and life milestones by rewarding employees with small gifts for their achievements and the joyful news in their lives. 
  • We recognize people’s contributions. Etlians’ contributions to Etlia or to customers are recognized in Etlia’s weekly meetings and appreciated in the communication channels. They are also rewarded according to the level of achievement.  
  • All Etlians helping with recruitment are rewarded. We encourage every employee to actively participate in shaping our team and culture. 
  • All Etlians getting certified in relevant technologies are recognized and rewarded in Etlia.

The Bottom Line

Taking the time to appreciate our colleagues isn’t just the right thing to do; it’s a smart business decision. By fostering a culture of appreciation, we create a more positive, productive, and successful workplace for everyone!  

At Etlia we are building the best community and platform for top experts’ professional growth.

Raaju Srinivasa Raghavan

Interested in joining Etlia’s growing team of champions? Get in touch and let’s meet for a coffee!

Etlia Data Engineering and Denodo launch a strategic alliance to boost next generation data management in the Nordic market

Etlia, a fast-growing Finnish data engineering company, and Denodo, a recognized global leader in data management solutions, announce a strategic alliance to jointly develop Denodo’s market presence in Finland and in other Nordic countries.

Denodo’s next generation Platform for data management embraces distributed data across on-premises, hybrid, and multi-cloud environments; it uses a logical/semantic-model approach to integrating and managing data; and it leverages artificial intelligence (AI) to simplify and automate manual tasks. The Denodo Platform provides one logical platform for all enterprise data, enhancing decision-making, driving operational efficiency, and facilitating swift responses to evolving business and market trends.

“Already one of the leading Denodo competence hubs in the region I am excited to announce next-level of strategic alliance with Denodo, a pioneering data integration, management and delivery platform. Our mission at Etlia Data Engineering is to help our customers create business value from data by leveraging major business process platforms and other data sources using best-of-breed data tools and platforms such as Denodo. We are known as experts in demanding analytics architectures and implementation roadmaps as well as a truly customer-oriented partner. Denodo platform brings our customers’ data to the foreground, boosting their digital transformation. Denodo being one of the spearheads of our portfolio I am excited to strengthen our cooperation to next level.” says Juuso Maijala CEO & Founder of Etlia Data Engineering.
“I am delighted to be able to announce Denodo’s strategic partnership with Etlia Data Engineering, renowned for their expertise in data-related skills and proficient knowledge in data management. Partnering with Etlia Data Engineering plays a pivotal role in ensuring the sustained success and widespread acceptance of Denodo within Finnish and the wider Nordic market.” says Charles Southwood, Regional VP for Denodo.

Additional information and inquiries:

Etlia Ltd, CEO & Founder, Juuso Maijala juuso.maijala@etlia.fi +358 50 532 0157

Denodo Ltd, Regional VP for Denodo, Charles Southwood

About Etlia Ltd

Etlia is a fast-growing Nordic data engineering company. We help our customers create business value from data by leveraging major business process platforms and external sources. Our services cover the full lifecycle of data solution from design to development, deployment and maintenance. We offer top experts the best platform and community to grow professionally. Our company was founded in 2013. We are based in Espoo, Finland. For more information, visit www.etlia.fi.

About Denodo

Denodo is a leader in data management. The award-winning Denodo Platform is the leading data integration, management, and delivery platform using a logical approach to enable self-service BI, data science, hybrid/multi-cloud data integration, and enterprise data services.

Realizing more than 400% ROI and millions of dollars in benefits, Denodo’s customers across large enterprises and mid-market companies in 30+ industries have received payback in less than 6 months. For more information, visit www.denodo.com.

A quick way to test SAP S/4HANA data extraction scenarios

It’s been a while since I published an SAP-related post: Fast access to SAP ERP demo data sources. Now it is time to look into some cool SAP S/4HANA stuff.

Let’s say you want to test or demonstrate utilizing SAP S/4HANA data with different data integration setups. How do you go about it quickly?

Well, unlike with SAP ECC, we do not have an S/4HANA IDES environment like the one we used in our earlier post, but we can deploy an SAP S/4HANA Fully-Activated Appliance to our cloud of choice very quickly.

Testing and demonstrating with SAP S/4HANA

The SAP S/4HANA Fully-Activated Appliance luckily contains data designed to enable testing and demonstrating various analytical and operational scenarios, so it works well for us in e.g. testing data extraction from S/4HANA with SAP or 3rd party tools.

The appliance can be deployed from the SAP Cloud Appliance Library.

We’ll choose ‘Create Appliance’ for the latest appliance.

Next, we will give the details and authorization against our own Azure Subscription to enable CAL to deploy the resources.

We’ll go through the steps in the wizard and can drop components like SAP BO, which we do not need here, to save on costs. After deployment, we will set auto shutdown times for the VMs on Azure to keep costs down and will clean up the resources once not needed as they generate costs even when suspended.

Depending on the current Azure settings, the vCPU quotas may need to be increased to accommodate the substantial resource requirements of the VMs.

After a while, we will see our resources deployed and running in our Azure Subscription, and we can go and set things like static IPs and auto shutdown times so that the large VMs S/4HANA requires won’t generate unnecessary costs.

For accessing the environment one can use the optional remote desktop VM or connect directly with things like SAP GUI, Fabric, AecorSoft etc.

Check SAP Community to get started

The SAP Community provides numerous demo scenarios with supporting guides. The CAL page for creating the appliance contains a getting started guide to get us going.

After digging up the access details we can access the environment and confirm via SAP GUI that we can see data.

We can now think about the next steps, such as extracting SAP data with, for example, Fabric or AecorSoft, or testing SAP Datasphere Replication Flow to push data to our cloud storage of choice. This could be a topic for the next SAP post.

Do contact us with any questions about SAP and the best ways to extract and integrate S/4HANA data!

Janne Dalin

We have been an SAP partner since 2019. How could our SAP expertise benefit your business?

Contact us to explore the possibilities >>

AI in data engineering – hands-on experiences

Written by Shubham Keshri

As a fellow data engineer, I understand how tedious and time-consuming it can be to perform repetitive tasks. That’s why I’m excited to share some AI-based tips and tricks that can help you streamline your workflow and increase your productivity.

One tool that I highly recommend is Bing Chat. It is an AI-powered chatbot that can help you with a wide range of tasks, from converting units to summarizing long articles. It’s like having a personal assistant at your fingertips!

Another tool that can help you save time is GitHub Copilot. This AI-powered tool is designed to help developers write code faster and more efficiently. It uses machine learning to suggest code snippets and auto-completes repetitive tasks, such as creating tables or copying files from one location to another.

Using AI with Azure Synapse Analytics

In one of our customer assignments, we used Azure Synapse Analytics to build some pipelines (we’re plumbers :D). However, as you may already know, Azure Synapse doesn’t allow you to write code directly in an IDE. Instead, you must use the portal.

We had to copy the code from a notebook and paste it into Bing Chat. It’s like trying to play a game of chess with one hand tied behind your back! That’s why we used this method only for some migration work. It is not perfect, but it sometimes gets the job done.

Copy-pasting wasn’t fun! But perhaps someone was listening: with the recent update to GitHub Copilot for Visual Studio and Visual Studio Code, you can now use the built-in chat feature to perform the same tasks without having to switch between different applications. This can save you a lot of time and make your workflow more efficient.

Using AI with Azure Synapse notebooks

Now let’s dive into some specific examples of how these tools can be used in conjunction with Azure Synapse notebooks.

If you’re working with PySpark or Spark SQL in Synapse notebooks, you know how tedious it can be to write code for repetitive tasks like creating tables or copying files from one location to another. But with GitHub Copilot, you can easily auto-complete these tasks with just a few keystrokes.

For example, let’s say you want to create a new table in Synapse Analytics using PySpark. Normally, this would require several lines of code. But with GitHub Copilot, all you have to do is type “create table” followed by the name of your table and the data type for each column. GitHub Copilot will then generate the entire PySpark code for you!
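To give a feel for the kind of code such a prompt might yield, here is a hand-written sketch (not actual Copilot output). It builds a Spark SQL `CREATE TABLE` statement from a list of column names and types. The table name `sales.orders`, the columns and the `delta` default format are assumptions for illustration; the `spark.sql(...)` call is left as a comment so the sketch also runs outside a notebook, where no Spark session exists.

```python
def build_create_table_sql(table, columns, table_format="delta"):
    """Build a Spark SQL CREATE TABLE statement from (name, type) pairs."""
    if not columns:
        raise ValueError("at least one column is required")
    # One indented "name TYPE" line per column, separated by commas.
    column_lines = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        f") USING {table_format}"
    )


if __name__ == "__main__":
    # Hypothetical table and columns, used only for this example.
    ddl = build_create_table_sql(
        "sales.orders",
        [("order_id", "INT"), ("revenue", "DOUBLE")],
    )
    print(ddl)
    # Inside a Synapse notebook, where a `spark` session is provided:
    # spark.sql(ddl)
```

Whatever an assistant generates, it is worth reading the column types and the table format before running the statement against a real workspace.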

Similarly, suppose you want to copy data lake files from one location to another in Synapse Analytics using Spark SQL. In that case, all you have to do is type “copy data lake files” followed by the source and destination paths. GitHub Copilot will then generate the entire Spark SQL code for you!

These are just a few examples of how Bing Chat and GitHub Copilot can be used with Azure Synapse notebooks to increase your productivity as a data engineer. By automating repetitive tasks and streamlining your workflow, you’ll be able to focus on what really matters: automating workflows, analyzing data and generating insights.

If you have any questions or comments, reach out to us. And remember, always keep calm and code on!

P.S. Did you notice that this blog post was written with the help of AI?

Contact us to learn more

Sharing is caring – and both are important

Data engineering technologies are constantly evolving, and data projects require solid project management skills. Nowadays you need a wide, up-to-date knowledge and skills base, including both technical and soft skills.

We have noticed that knowledge sharing is a powerful tool contributing to our personnel development and customer success. In this blog, we will share Etlia’s practice of knowledge and experience sharing.

Efficient knowledge-sharing practices

We share our knowledge on a bi-weekly basis. Subjects are chosen together and there is always room for debate and discussion. Lately, we have been sharing our experiences with OpenAI, data extraction from SAP and features of Data Fabric. In our autumn 2023 sessions we will also have a demo of a data pipeline on Databricks with dbt Cloud and take a look at the latest and bravest Data Catalog offerings, just to name a few.

Furthermore, we keep an eye on upcoming online courses, vendor meetings, keynotes and happenings. If there are interesting topics, an Etlian will attend and share the results and feedback in our knowledge-sharing sessions.

Experience sharing adds value to Etlians

Data pipeline development and technical knowledge are not the only things that matter. We also share our experiences and practices regarding methodologies and agile ways of working. Lately, we have shared our thoughts on DevOps management, and in December 2023 we will have a presentation and open discussion on our experiences with test automation best practices.

In summary, knowledge sharing is a vital part of our company. Systematic knowledge sharing supports our Career Radar program focusing on individual career development.

Read more about Career Radar program.

How we leverage career talks in Etlia

Are you tired of traditional development discussions? We have a better solution: our career talks really empower Etlians on their journey to success.

At Etlia, we recognize the power of both technology certifications and soft skills training in shaping individual career paths. In this blog post, we’ll delve into how our career talks focus on our personnel’s success. This is important for individual career development, our team spirit and our success in customer projects.

Aligning individual career paths

Every Etlian has a career path documented as “Etlia Career Radar”. External career path coaching and guidance are available with 100% confidentiality: what you decide to share during the coaching session remains between you and the coach – and you decide what is shared with the Etlia team. After the guidance, you have a career path defined in two levels: targets in the near future within one year and a scope of five to ten years.

Complementary skill sets

As technology evolves and AI solutions become more sophisticated and easy to use, the importance of soft skills in comparison to hard skills is increasing. Soft skills are the new hard skills! We have agreed that technical certifications and soft skills training are designed to work hand in hand. Soft skills, such as customer relationship management, agile project management, communication skills and human resource skills, play a pivotal role in every consultant’s work at Etlia.

Etlia is a people company. This is our way to ensure that together we have both complementary and uniform skill sets in order to meet the demands of the customer projects we are currently working on.

Strategic selection of tech certifications

In modern data warehousing, there are a lot of competing technologies and vendors. Together we have chosen the technologies we focus on most and allocate training to. We do not lock ourselves into a specific vendor, but we keep our selection limited.

Consequently, we make an annual plan of certifications, keep track of them and fine-tune the tech certification needs. That’s called “Etlia Team Radar”. The first plan was created together on our trip to Barcelona in October 2023.

In summary, we at Etlia do not just count the number of certifications you have but take a broader view of your career and opportunities within.

Read how Etlia’s Career Radar program works in practice!

In the next blog we will share our practice of Knowledge Sharing, stay tuned!

Etlia Data Engineering announces completion of personnel share offering

Etlia Ltd

News release

28 April 2023 – 09:00 EET

Etlia Data Engineering has today closed its first personnel share offering. All Etlia’s employees participated in the offering with full subscription rights, making all the employees also shareholders of the Company.

“Our personnel offering was 100% success! I am thrilled to see such engagement and interest into our share offering. I am proud that now all our employees are also Etlia’s shareholders. Our intention is to continue personnel offerings also in the coming years alongside our partner program which was launched this year.” says Juuso Maijala, CEO.

“It is fantastic to see the huge enthusiasm of Etlians and their commitment into company’s growth journey. Using ITA66a§ (Finnish: TVL66a§) framework provides an excellent way to engage personnel and I can recommend it to any company seeking to boost its growth through a share based incentive program.” says Mikko Koljonen, Board Member.

Additional information:

Juuso Maijala, CEO

juuso.maijala@etlia.fi

+358 50 532 0157

Mikko Koljonen, Board Member

mikko.koljonen@etlia.fi

+358 50 36 28 218

Etlia Ltd in brief:

Etlia is a data engineering company.

We help our customers create business value from data by leveraging major business process platforms and external sources. We offer top experts the best platform and community to grow professionally. Our company was founded in 2013. We are based in Espoo, Finland.

Synapse vs Databricks: A Comparison 

From Databricks to Synapse: A Data Architect’s Journey 

As a Data Platform Architect/Engineer working with several clients in Finland, I have extensive experience using Azure Databricks and Azure Data Factory (for notebook orchestration). Recently, however, one of my clients made the decision to switch to Azure Synapse Analytics. In this post, I will share my journey of transitioning from Databricks to Synapse and provide insights that may help you make a more informed decision if you are considering either of these platforms. 

When it comes to choosing between Synapse and Databricks for your data processing needs, there are several factors to consider. First, we will take a closer look at some of the key features of each platform, and then I will give my opinion on the matter.

Data Storage, Resource Access, and DevOps Integration 

When comparing Databricks and Synapse, it is important to consider the availability of certain features. For example, Databricks allows you to use multiple notebooks within the same session – a feature that is not currently available in Synapse. Another key difference between the two platforms is the way they handle data storage. Databricks provides a static mount path for your storage accounts, making it easy to navigate through your data like a traditional filesystem. In contrast, Synapse requires you to provide a ‘job id’ when reading data from a mount – an id that changes every time a new job is run. 
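The practical effect of the mount difference is easiest to see in how a file path is put together. The sketch below is a simplified illustration under stated assumptions: `/mnt/<mount name>/...` is the conventional Databricks mount location, and the `synfs:/<job id>/<mount point>/...` form reflects the Synapse mount scheme as I understand it, with the job id typically obtained at run time (for example via `mssparkutils.env.getJobId()`). The mount name `raw` and the file path are made up; verify the exact path formats against the current documentation of both platforms.

```python
def databricks_mount_path(mount_name, relative_path):
    """Static path: the same string works in every run."""
    return f"/mnt/{mount_name.strip('/')}/{relative_path.lstrip('/')}"


def synapse_mount_path(job_id, mount_point, relative_path):
    """Job-scoped path: the job id changes with every new job,
    so the path has to be rebuilt at run time."""
    return (
        f"synfs:/{job_id}/{mount_point.strip('/')}/"
        f"{relative_path.lstrip('/')}"
    )


if __name__ == "__main__":
    # Hypothetical mount name and file, for illustration only.
    print(databricks_mount_path("raw", "sales/2023.parquet"))
    # In Synapse the job id would come from the runtime, not a literal.
    print(synapse_mount_path(42, "/raw", "sales/2023.parquet"))
```

Wrapping the path construction in a small helper like this keeps the job id lookup in one place, which makes notebooks easier to move between the two platforms.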

When it comes to accessing resources, Synapse offers linked service access management – a feature that allows for cleaner and more manageable connections between different services via Azure. In contrast, Databricks relies on tokens generated by service principals for resource access. However, Databricks does have an advantage when it comes to bootup time – boasting faster speeds than Synapse. On the other hand, Synapse has better DevOps integration compared to Databricks. 

Features, Performance and Use Cases 

There are several other key differences between Databricks and Synapse that are worth considering. For example, Databricks currently offers more features and better performance optimizations than Synapse. However, for data platforms that primarily use SQL and have few Spark use cases, Synapse Analytics may be the better choice. Synapse has an open-source version of Spark with built-in support for .NET applications, while Databricks has an optimized version of Spark that offers increased performance. Additionally, Databricks allows users to select GPU-enabled clusters for faster data processing and higher concurrency. 

User Experience 

In terms of user experience, Synapse has a traditional SQL engine that may feel more familiar to BI developers. It also has a Spark engine for use by data scientists and analysts. In contrast, Databricks is a Spark-based notebook tool with a focus on Spark functionality. Synapse currently only offers a Hive metadata GUI, whereas Databricks, with Unity Catalog, takes creating the metadata hierarchy to another level. 

Managing Workflows with External Orchestration Tools 

One important aspect to understand when using notebooks in Databricks is the lack of a built-in orchestration tool or service. While it is possible to schedule jobs in Databricks, the functionality is quite basic. For this reason, in many projects we used Azure Data Factory to orchestrate Databricks notebooks. In a recent Databricks meetup, one participant mentioned using Apache Airflow for orchestration on AWS, though I am not sure about GCP. This is a crucial point to consider because Synapse bundles everything under one umbrella for seamless integration. Until Databricks produces an alternative solution, you will need to use it alongside ADF (Azure Data Factory) or Synapse for orchestration.  

Feature | Databricks | Azure Synapse Analytics
Multiple notebooks within same session | Yes | No
Data storage handling | Static mount path for storage accounts | Requires ‘job id’ when reading data from a mount
Resource access management | Tokens generated by service principals | Linked Service access management
Bootup time | Faster than Synapse | Slower than Databricks
DevOps integration | Less integration compared to Synapse | Better integration compared to Databricks
Features and performance optimizations | More features and better performance optimizations than Synapse | Fewer features and fewer performance optimizations than Databricks
SQL support | Less support for SQL use cases | Better support for SQL use cases
Spark engine | Optimized version of Spark that offers increased performance | Open-source version of Spark with built-in support for .NET applications
GPU-enabled clusters | Allows users to select GPU-enabled clusters for faster data processing and higher concurrency | Not currently available
User experience | Spark-based notebook tool with a focus on Spark functionality | Traditional SQL engine that may feel more familiar to BI developers. Also has a Spark engine for use by data scientists and analysts.
Real-time co-authoring | Databricks notebooks have real-time co-authoring (both authors see the changes in real time) | Synapse notebooks support co-authoring, but one person needs to save the notebook before another person sees the change
Orchestration tool or service | Lacks a built-in orchestration tool or service. Needs to be used alongside ADF or Synapse for orchestration. | Bundles everything under one umbrella for seamless integration.

Synapse vs Databricks feature comparison summary table. 

Choosing Between Databricks and Synapse: Which One Is Right for You? 

Ultimately, the choice between these two platforms will depend on your specific needs and priorities. Nah! I will not leave you with a diplomatic answer. In my opinion (which could be controversial depending on your cloud bias and when you are reading this), if your infra is on AWS/GCP and your priorities are data processing efficiency and access to the latest Spark and Delta features, go for Databricks. 

On the other hand, if your infrastructure is primarily based on Azure and your use case involves data preparation for a data platform with data modeling on a data lake (reach out if you are interested in knowing how), then Azure Synapse may be the better choice. Synapse has more features in development for future releases – something that has not been announced by Databricks yet. Good luck! And stay tuned for an upcoming series focusing on ML, streaming, Delta and partitioning. 
