AI in data engineering – hands-on experiences

Written by Shubham Keshri

As a fellow data engineer, I understand how tedious and time-consuming it can be to perform repetitive tasks. That’s why I’m excited to share some AI-based tips and tricks that can help you streamline your workflow and increase your productivity.

One tool that I highly recommend is Bing Chat, an AI-powered chatbot built on OpenAI’s GPT models. It can help you with a wide range of tasks, from converting units to summarizing long articles. It’s like having a personal assistant at your fingertips!

Another tool that can help you save time is GitHub Copilot. This AI-powered tool is designed to help developers write code faster and more efficiently. It uses machine learning to suggest code snippets and auto-complete repetitive code, such as creating tables or copying files from one location to another.

Using AI with Azure Synapse Analytics

In one customer assignment, we used Azure Synapse Analytics to build some pipelines (we’re plumbers :D). However, as you may already know, Azure Synapse doesn’t let you write notebook code directly in your local IDE. Instead, you must use the web portal.

To get help from AI, we had to copy the code from a notebook and paste it into Bing Chat. It’s like trying to play a game of chess with one hand tied behind your back! That’s why we used this method only for migration tasks. It is not perfect, but it sometimes gets the job done.

Copy-pasting wasn’t fun! But perhaps someone was listening: with the recent update to GitHub Copilot for Visual Studio and Visual Studio Code, you can now use the built-in chat feature to perform the same tasks without switching between applications. This can save you a lot of time and make your workflow more efficient.

Using AI with Azure Synapse notebooks

Now let’s dive into some specific examples of how these tools can be used in conjunction with Azure Synapse notebooks.

If you’re working with PySpark or Spark SQL in Synapse notebooks, you know how tedious it can be to write code for repetitive tasks like creating tables or copying files from one location to another. With GitHub Copilot, you can auto-complete these tasks with just a few keystrokes.

For example, let’s say you want to create a new table in Synapse Analytics using PySpark. Normally, this would require several lines of code. With GitHub Copilot, all you have to do is type “create table” followed by the name of your table and the data type for each column, and Copilot will generate the entire PySpark code for you!
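To illustrate, here is a minimal sketch of the kind of code such a prompt might lead to. It is not actual Copilot output: the helper names build_create_table_sql and create_table, the Delta table format and the sales.orders example are hypothetical assumptions. The helper turns a table name and column types into a Spark SQL CREATE TABLE statement, which a PySpark notebook would then run through the session’s spark.sql(...).

```python
def build_create_table_sql(table_name: str, columns: dict[str, str], file_format: str = "DELTA") -> str:
    """Build a Spark SQL CREATE TABLE statement from column name -> type pairs (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column is required")
    # Backtick-quote column names so names with spaces or reserved words still parse
    column_defs = ",\n    ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n"
        f"    {column_defs}\n"
        f") USING {file_format}"
    )


def create_table(spark, table_name: str, columns: dict[str, str]) -> None:
    """Run the generated DDL through an existing SparkSession, e.g. the notebook's `spark` object."""
    spark.sql(build_create_table_sql(table_name, columns))


if __name__ == "__main__":
    # Hypothetical database, table and columns, for illustration only
    ddl = build_create_table_sql(
        "sales.orders",
        {"order_id": "BIGINT", "customer_id": "STRING", "amount": "DECIMAL(18,2)", "order_date": "DATE"},
    )
    print(ddl)
```

Generating the DDL as a plain string keeps the logic easy to review and test before anything is executed against the workspace.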

Similarly, suppose you want to copy data lake files from one location to another in Synapse Analytics using Spark SQL. All you have to do is type “copy data lake files” followed by the source and destination paths, and GitHub Copilot will generate the entire Spark SQL code for you!
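Again as a hedged sketch rather than real Copilot output, the hypothetical helper build_copy_sql below builds a Spark SQL statement that reads files from a source path using Spark’s ability to query files directly by path, and writes them to a destination with INSERT OVERWRITE DIRECTORY. The storage account and container names in the example paths are made up.

```python
def build_copy_sql(source_path: str, target_path: str, file_format: str = "parquet") -> str:
    """Build a Spark SQL statement that copies data from source_path to target_path (hypothetical helper)."""
    fmt = file_format.lower()
    # Formats Spark can both query directly by path and write with INSERT OVERWRITE DIRECTORY
    if fmt not in {"parquet", "orc", "json"}:
        raise ValueError(f"unsupported format: {file_format}")
    # Paths are embedded verbatim, so reject quote characters instead of trying to escape them
    for path in (source_path, target_path):
        if "'" in path or "`" in path:
            raise ValueError(f"quote characters are not supported in paths: {path!r}")
    # Note: INSERT OVERWRITE DIRECTORY replaces any existing data at target_path
    return (
        f"INSERT OVERWRITE DIRECTORY '{target_path}'\n"
        f"USING {fmt}\n"
        f"SELECT * FROM {fmt}.`{source_path}`"
    )


if __name__ == "__main__":
    # Hypothetical storage account ("mydatalake") and containers ("raw", "curated")
    src = "abfss://raw@mydatalake.dfs.core.windows.net/sales/2023/"
    dst = "abfss://curated@mydatalake.dfs.core.windows.net/sales/2023/"
    print(build_copy_sql(src, dst))
    # In a Synapse notebook, the statement would be run with: spark.sql(build_copy_sql(src, dst))
```

Because this goes through Spark’s readers and writers, the files are rewritten rather than copied byte for byte. For a plain file copy, Synapse notebooks also provide mssparkutils.fs.cp(source, destination, True), where the last argument enables a recursive copy.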

These are just a few examples of how Bing Chat and GitHub Copilot can be used with Azure Synapse notebooks to increase your productivity as a data engineer. By automating repetitive tasks and streamlining your workflow, you’ll be able to focus on what really matters: automating workflows, analyzing data and generating insights.

If you have any questions or comments, reach out to us. And remember, always keep calm and code on!

P.S. Did you notice that this blog post was written with the help of AI?


Sharing is caring – and both are important

Data engineering technologies are constantly evolving, and data engineering projects require solid project management skills. Nowadays you need a broad, up-to-date base of knowledge and skills, including both technical and soft skills.

We have noticed that knowledge sharing is a powerful tool that contributes to our personnel development and customer success. In this blog post, we share Etlia’s practice of knowledge and experience sharing.

Efficient knowledge-sharing practices

We share our knowledge on a bi-weekly basis. Subjects are chosen together, and there is always room for debate and discussion. Lately, we have been sharing our experiences on OpenAI, data extraction from SAP and features of Data Fabric. To name just a few topics for our autumn 2023 sessions, we are also going to have a demo of a data pipeline on Databricks with dbt Cloud, and we are taking a look at the latest and bravest Data Catalog offerings.

Furthermore, we keep an eye on upcoming online courses, vendor meetings, keynotes and events. If a topic looks interesting, an Etlian attends and shares the results and feedback in our knowledge-sharing sessions.

Experience sharing adds value to Etlians

Data pipeline development and technical knowledge are not the only things that matter. We also share our experiences and practices around methodologies and agile ways of working. Lately, we have shared our thoughts on DevOps management, and in December 2023 we are having a presentation and open discussion of our experiences with test automation best practices.

In summary, knowledge sharing is a vital part of our company. Systematic knowledge sharing supports our Career Radar program focusing on individual career development.

Read more about the Career Radar program.

How we leverage career talks in Etlia

Are you tired of traditional development discussions? We have a better solution: our career talks really empower Etlians on their journey to success.

At Etlia, we recognize the power of both technology certifications and soft skills training in shaping individual career paths. In this blog post, we’ll delve into how our career talks focus on the success of our people. This matters for individual career development, for our team spirit and for our success in customer projects.

Aligning individual career paths

Every Etlian has a career path documented in the “Etlia Career Radar”. External career path coaching and guidance are available with 100% confidentiality: what you decide to share during the coaching session remains between you and the coach – and you decide what is shared with the Etlia team. After the guidance, you have a career path defined on two levels: near-term targets within one year and a longer-term scope of five to ten years.

Complementary skill sets

As technology evolves and AI solutions become more sophisticated and easier to use, soft skills are becoming more important relative to hard skills. Soft skills are the new hard skills! We have agreed that technical certifications and soft skills training should work hand in hand. Soft skills, such as customer relationship management, agile project management, communication skills and human resource skills, play a pivotal role in every consultant’s work at Etlia.

Etlia is a people company. This is how we ensure that, together, we have both complementary and uniform skill sets to meet the demands of the customer projects we are currently working on.

Strategic selection of tech certifications

In modern data warehousing, there are many competing technologies and vendors. Together, we have chosen the technologies we focus on most and allocate training to them. We do not lock into a specific vendor, but we keep our selection limited.

Based on this selection, we make an annual plan of certifications, keep track of them and fine-tune the tech certification needs. We call this the “Etlia Team Radar”. The first plan was created together on our trip to Barcelona in October 2023.

In summary, we at Etlia do not just count the number of certifications you have but take a broader view of your career and opportunities within.

Read how Etlia’s Career Radar program works in practice!

In the next blog post, we will share our knowledge-sharing practices. Stay tuned!
