April 2023 - Etlia

Etlia Ltd

News release

28 April 2023 – 09:00 EET

Etlia Data Engineering has today closed its first personnel share offering. All of Etlia’s employees participated in the offering with full subscription rights, making every employee a shareholder of the Company.

“Our personnel offering was a 100% success! I am thrilled to see such engagement and interest in our share offering. I am proud that all our employees are now also Etlia’s shareholders. Our intention is to continue personnel offerings in the coming years alongside our partner program, which was launched this year,” says Juuso Maijala, CEO.

“It is fantastic to see the huge enthusiasm of Etlians and their commitment to the company’s growth journey. The ITA66a§ (Finnish: TVL66a§) framework provides an excellent way to engage personnel, and I can recommend it to any company seeking to boost its growth through a share-based incentive program,” says Mikko Koljonen, Board Member.

Additional information:

Juuso Maijala, CEO

juuso.maijala@etlia.fi

+358 50 532 0157

Mikko Koljonen, Board Member

mikko.koljonen@etlia.fi

+358 50 36 28 218

Etlia Ltd in brief:

Etlia is a data engineering company.

We help our customers create business value from data by leveraging major business process platforms and external sources. We offer top experts the best platform and community to grow professionally. Our company was founded in 2013. We are based in Espoo, Finland.

From Databricks to Synapse: A Data Architect’s Journey

As a Data Platform Architect/Engineer working with several clients in Finland, I have extensive experience using Azure Databricks, with Azure Data Factory for notebook orchestration. Recently, however, one of my clients decided to switch to Azure Synapse Analytics. In this post, I will share my journey of transitioning from Databricks to Synapse and offer insights that may help you make a more informed decision if you are considering either platform.

When choosing between Synapse and Databricks for your data processing needs, there are several factors to consider. First, we will take a closer look at some of the key features of each platform, and then I will give my opinion on the matter.

Data Storage, Resource Access, and DevOps Integration

When comparing Databricks and Synapse, it is important to consider the availability of certain features. For example, Databricks allows you to use multiple notebooks within the same session, a feature that is not currently available in Synapse. Another key difference is how the two platforms handle data storage. Databricks provides a static mount path for your storage accounts, making it easy to navigate your data like a traditional file system. In contrast, Synapse requires you to provide a ‘job id’ when reading data from a mount, and that id changes every time a new job is run.
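To make the mount-path difference concrete, here is a minimal sketch that only builds path strings; it does not touch any storage. It assumes a Databricks mount under `/mnt/<name>` and the `synfs:/<job id>/<mount point>` scheme described in the Synapse documentation at the time of writing. The mount name `raw` and the file path are hypothetical, and newer Synapse runtimes may use a different prefix, so check the docs for your runtime.

```python
# Sketch: how the same file is addressed through a mount on each platform.
# Pure string handling only; the mount name "raw" and the file path are hypothetical.


def databricks_mount_path(mount_name: str, relative_path: str) -> str:
    """Databricks mounts sit at a fixed location under /mnt, so the path is static."""
    return f"/mnt/{mount_name.strip('/')}/{relative_path.lstrip('/')}"


def synapse_mount_path(job_id: str, mount_name: str, relative_path: str) -> str:
    """Synapse mounts are scoped to a job, so the path embeds the current job id.

    In a Synapse notebook the id would typically come from
    mssparkutils.env.getJobId(); because it differs from job to job,
    the path has to be rebuilt at run time instead of hard-coded.
    """
    return f"synfs:/{job_id}/{mount_name.strip('/')}/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    print(databricks_mount_path("raw", "sales/2023/04.csv"))     # /mnt/raw/sales/2023/04.csv
    print(synapse_mount_path("49", "raw", "sales/2023/04.csv"))  # synfs:/49/raw/sales/2023/04.csv
```

Either string is what you would hand to a reader such as `spark.read.csv(path)`; only the Synapse one depends on the running job.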

When it comes to accessing resources, Synapse offers linked service access management, which allows cleaner and more manageable connections between Azure services. Databricks, in contrast, relies on tokens generated by service principals for resource access. However, Databricks has the advantage in startup time: its clusters boot faster than Synapse. On the other hand, Synapse has better DevOps integration than Databricks.
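On the Databricks side, token-based service-principal access to ADLS Gen2 is commonly set up by configuring the Hadoop ABFS driver for OAuth client credentials. The sketch below only assembles those Spark configuration keys; it does not authenticate anything. The storage account, tenant, client id and secret values are placeholders, and in practice the secret would be read from a secret scope rather than passed as a literal. A Synapse linked service handles this wiring for you, which is the cleaner experience described above.

```python
# Sketch: Spark settings for service-principal (OAuth client credentials) access
# to ADLS Gen2 via the ABFS driver, as commonly used from Databricks.
# All argument values used below are placeholders.


def service_principal_spark_conf(
    account: str, tenant_id: str, client_id: str, client_secret: str
) -> dict:
    """Return the ABFS OAuth settings for one storage account."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    conf = service_principal_spark_conf("mystorage", "<tenant-id>", "<client-id>", "<secret>")
    # In a Databricks notebook you would apply each pair with spark.conf.set(key, value).
    for key in conf:
        print(key)
```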

Features, Performance and Use Cases

There are several other key differences between Databricks and Synapse that are worth considering. For example, Databricks currently offers more features and better performance optimizations than Synapse. However, for data platforms that primarily use SQL and have few Spark use cases, Synapse Analytics may be the better choice. Synapse runs the open-source version of Apache Spark with built-in support for .NET applications, while Databricks runs an optimized version of Spark that offers increased performance. Additionally, Databricks allows users to select GPU-enabled clusters for GPU-accelerated workloads.

User Experience

In terms of user experience, Synapse has a traditional SQL engine that may feel more familiar to BI developers, alongside a Spark engine for data scientists and analysts. Databricks, in contrast, is a Spark-based notebook tool focused on Spark functionality. For metadata, Synapse currently offers only a GUI over the Hive metastore, whereas Databricks goes further with Unity Catalog, which organizes metadata into a catalog, schema and table hierarchy.
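The metadata difference shows up in how tables are named: a Hive metastore addresses a table as `schema.table` (schemas are also called databases), while Unity Catalog adds a catalog level on top, giving a three-level `catalog.schema.table` namespace. The helpers below are a hypothetical illustration of that naming only; the catalog, schema and table names are made up, and backtick quoting is ignored.

```python
# Sketch: two-level (Hive metastore) vs three-level (Unity Catalog) table names.
# Catalog, schema and table names are hypothetical; backtick quoting is ignored.
from typing import Optional, Tuple


def qualified_name(table: str, schema: str, catalog: Optional[str] = None) -> str:
    """Build schema.table (Hive metastore) or catalog.schema.table (Unity Catalog)."""
    parts = [catalog, schema, table] if catalog else [schema, table]
    return ".".join(parts)


def parse_name(name: str) -> Tuple[Optional[str], str, str]:
    """Split a two- or three-part name into (catalog, schema, table)."""
    parts = name.split(".")
    if len(parts) == 2:
        return None, parts[0], parts[1]
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"expected schema.table or catalog.schema.table, got {name!r}")


if __name__ == "__main__":
    print(qualified_name("orders", "sales"))                  # sales.orders
    print(qualified_name("orders", "sales", catalog="prod"))  # prod.sales.orders
    print(parse_name("prod.sales.orders"))                    # ('prod', 'sales', 'orders')
```

The extra catalog level is what lets Unity Catalog group schemas, for example one catalog per environment, above the two levels a Hive metastore provides.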

Managing Workflows with External Orchestration Tools

One important aspect to understand when using notebooks in Databricks is the lack of a built-in orchestration tool or service. While it is possible to schedule jobs in Databricks, the functionality is quite basic. For this reason, in many projects we have used Azure Data Factory (ADF) to orchestrate Databricks notebooks. At a recent Databricks meetup, one participant mentioned using Apache Airflow for orchestration on AWS, though I am not sure what is common on GCP. This is a crucial point to consider, because Synapse bundles everything under one umbrella for seamless integration. Until Databricks produces an alternative, you will need to use it alongside ADF or Synapse for orchestration.
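For reference, here is roughly what the ADF side of that setup looks like: a pipeline whose activities run Databricks notebooks in sequence. The sketch builds the pipeline JSON as a Python dict using ADF's `DatabricksNotebook` activity type; the pipeline name, the linked service name `AzureDatabricksLS` and the notebook paths are hypothetical, and a real definition would be deployed through the ADF UI, an ARM template or an SDK rather than printed.

```python
# Sketch: an Azure Data Factory pipeline definition that runs Databricks notebooks
# one after another. Pipeline, linked service and notebook names are hypothetical.
import json
from typing import List, Optional, Tuple


def notebook_activity(
    name: str,
    notebook_path: str,
    linked_service: str,
    depends_on: Optional[str] = None,
) -> dict:
    """One DatabricksNotebook activity, optionally waiting for an upstream activity."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {"referenceName": linked_service, "type": "LinkedServiceReference"},
        "typeProperties": {"notebookPath": notebook_path},
        # Run only after the upstream activity has succeeded.
        "dependsOn": (
            [{"activity": depends_on, "dependencyConditions": ["Succeeded"]}]
            if depends_on
            else []
        ),
    }


def build_pipeline(name: str, steps: List[Tuple[str, str]], linked_service: str) -> dict:
    """Chain (activity_name, notebook_path) steps so each waits for the previous one."""
    activities = []
    previous = None
    for activity_name, notebook_path in steps:
        activities.append(notebook_activity(activity_name, notebook_path, linked_service, previous))
        previous = activity_name
    return {"name": name, "properties": {"activities": activities}}


if __name__ == "__main__":
    pipeline = build_pipeline(
        "daily_load",
        [("ingest", "/Shared/ingest"), ("transform", "/Shared/transform")],
        "AzureDatabricksLS",
    )
    print(json.dumps(pipeline, indent=2))
```

Synapse pipelines are built on the same pipeline and activity model as ADF, so a similar definition applies there, with the difference that the pipeline lives in the same workspace as the notebooks.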

Feature	Databricks	Azure Synapse Analytics
Multiple notebooks within same session	Yes	No
Data storage handling	Static mount path for storage accounts	Requires a ‘job id’ when reading data from a mount
Resource access management	Tokens generated by service principals	Linked service access management
Startup time	Faster than Synapse	Slower than Databricks
DevOps integration	Less integrated than Synapse	Better integrated than Databricks
Features and performance optimizations	More features and better performance optimizations than Synapse	Fewer features and fewer performance optimizations than Databricks
SQL support	Less support for SQL use cases	Better support for SQL use cases
Spark engine	Optimized version of Spark that offers increased performance	Open-source version of Spark with built-in support for .NET applications
GPU-enabled clusters	Allows users to select GPU-enabled clusters for GPU-accelerated workloads	Not currently available in Synapse
User experience	Spark-based notebook tool with a focus on Spark functionality	Traditional SQL engine that may feel more familiar to BI developers. Also has a Spark engine for use by data scientists and analysts.
Real-time co-authoring	Databricks notebooks support real-time co-authoring (all authors see changes as they are made)	Synapse notebooks support co-authoring, but one person needs to save the notebook before another person sees the change
Orchestration tool or service	Lacks an in-built orchestration tool or service. Needs to be used alongside ADF or Synapse for orchestration.	Bundles everything under one umbrella for seamless integration.

Synapse vs Databricks feature comparison summary table.

Choosing Between Databricks and Synapse: Which One Is Right for You?

Ultimately, the choice between these two platforms will depend on your specific needs and priorities. Nah! I will not leave you with a diplomatic answer. In my opinion (which may be controversial, depending on your cloud bias and when you are reading this), if your infrastructure is on AWS or GCP, or your priority is data processing efficiency and access to the latest Spark and Delta features, go for Databricks.

On the other hand, if your infrastructure is primarily based on Azure and your use case involves data preparation for a data platform with data modeling on a data lake (reach out if you are interested in knowing how), then Azure Synapse may be the better choice. Synapse also has more features in development for future releases, something Databricks has not announced yet. Good luck! And stay tuned for an upcoming series focusing on ML, streaming, Delta and partitioning.


Etlia Data Engineering announces completion of personnel share offering