AI in data engineering – hands-on experiences

Written by Shubham Keshri

As a fellow data engineer, I understand how tedious and time-consuming it can be to perform repetitive tasks. That’s why I’m excited to share some AI-based tips and tricks that can help you streamline your workflow and increase your productivity.

One tool that I highly recommend is Bing Chat. It is an AI-powered chatbot that can help you with a wide range of tasks, from converting units to summarizing long articles. It’s like having a personal assistant at your fingertips!

Another tool that can help you save time is GitHub Copilot. This AI-powered tool is designed to help developers write code faster and more efficiently. It uses machine learning to suggest code snippets and auto-complete repetitive code, such as creating tables or copying files from one location to another.

Using AI with Azure Synapse Analytics

In one of our customer assignments, we used Azure Synapse Analytics to build some pipelines (we’re plumbers :D). However, as you may already know, Azure Synapse doesn’t let you write notebook code directly in an IDE. Instead, you must use the portal.

So we had to copy the code from a notebook and paste it into Bing Chat. It’s like trying to play a game of chess with one hand tied behind your back! That’s why we used this method only for some migration work. It is not perfect, but it sometimes gets the job done.

Copy-pasting wasn’t fun! But perhaps there was someone listening: with the recent update to GitHub Copilot for Visual Studio and Visual Studio Code, you can now use the built-in chat feature to perform the same tasks without having to switch between different applications. This can save you a lot of time and make your workflow more efficient.

Using AI with Azure Synapse notebooks

Now let’s dive into some specific examples of how these tools can be used in conjunction with Azure Synapse notebooks.

If you’re working with PySpark or Spark SQL in Synapse notebooks, you know how tedious it can be to write code for repetitive tasks like creating tables or copying files from one location to another. But with GitHub Copilot, you can auto-complete these tasks with just a few keystrokes.

For example, let’s say you want to create a new table in Synapse Analytics using PySpark. Normally, this would require several lines of code. But with GitHub Copilot, all you have to do is type “create table” followed by the name of your table and the data type for each column. GitHub Copilot will then generate the PySpark code for you!
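To give a feel for what such a suggestion might look like, here is a minimal sketch (not actual Copilot output). It assumes the predefined `spark` session that Synapse notebooks provide; the helper names `build_create_table_sql` and `create_table`, the table name and the columns are all made up for illustration.

```python
def build_create_table_sql(table_name, columns, data_format="parquet"):
    """Build a Spark SQL CREATE TABLE statement from (column name, data type) pairs."""
    if not columns:
        raise ValueError("at least one column is required")
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({column_defs}) USING {data_format}"


def create_table(spark, table_name, columns, data_format="parquet"):
    """Run the generated statement on an existing SparkSession and return it."""
    ddl = build_create_table_sql(table_name, columns, data_format)
    spark.sql(ddl)  # SparkSession.sql executes the DDL statement
    return ddl


# In a Synapse notebook, where `spark` already exists (hypothetical table and columns):
# create_table(spark, "sales_orders", [("order_id", "INT"), ("amount", "DOUBLE")])
```

Keeping the statement builder separate from the call to `spark.sql` makes it easy to print and review the generated SQL before running it.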

Similarly, if you want to copy data lake files from one location to another in Synapse Analytics using Spark SQL, all you have to do is type “copy data lake files” followed by the source and destination paths. GitHub Copilot will then generate the Spark SQL code for you!
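As a rough illustration of the file-copy case, here is a small sketch that uses the `mssparkutils` file system helper available in Synapse notebooks instead of a Spark SQL statement (that swap is my choice for the example, not something Copilot is guaranteed to suggest). The wrapper name `copy_data_lake_files` and the paths in the comment are hypothetical.

```python
def copy_data_lake_files(fs, source_path, destination_path, recurse=True):
    """Copy a file or folder with an mssparkutils-style file system helper."""
    if source_path == destination_path:
        raise ValueError("source and destination must differ")
    # mssparkutils.fs.cp takes the source, the destination and a recurse flag;
    # recurse=True copies folders together with their contents.
    return fs.cp(source_path, destination_path, recurse)


# In a Synapse notebook (placeholder paths, replace with your own):
# from notebookutils import mssparkutils
# copy_data_lake_files(mssparkutils.fs, "<source path>", "<destination path>")
```

Passing the file system helper in as an argument keeps the function easy to try out with a stand-in object before pointing it at a real data lake.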

These are just a few examples of how Bing Chat and GitHub Copilot can be used with Azure Synapse notebooks to increase your productivity as a data engineer. By automating repetitive tasks and streamlining your workflow, you’ll be able to focus on what really matters: building pipelines, analyzing data and generating insights.

If you have any questions or comments, reach out to us. And remember, always keep calm and code on!

P.S. Did you notice that this blog post was written with the help of AI?
