Etlia Data Engineering has today closed its first personnel share offering. All of Etlia’s employees participated in the offering with full subscription rights, making every employee a shareholder of the Company.
“Our personnel offering was a 100% success! I am thrilled to see such engagement and interest in our share offering. I am proud that all our employees are now also Etlia’s shareholders. Our intention is to continue personnel offerings in the coming years alongside our partner program, which was launched this year,” says Juuso Maijala, CEO.
“It is fantastic to see the huge enthusiasm of Etlians and their commitment to the company’s growth journey. Using the ITA66a§ (Finnish: TVL66a§) framework provides an excellent way to engage personnel, and I can recommend it to any company seeking to boost its growth through a share-based incentive program,” says Mikko Koljonen, Board Member.
We help our customers create business value from data by leveraging major business process platforms and external sources. We offer top experts the best platform and community to grow professionally. Our company was founded in 2013. We are based in Espoo, Finland.
From Databricks to Synapse: A Data Architect’s Journey
As a Data Platform Architect/Engineer working with several clients in Finland, I have extensive experience using Azure Databricks and Azure Data Factory (for notebook orchestration). Recently, however, one of my clients made the decision to switch to Azure Synapse Analytics. In this post, I will share my journey of transitioning from Databricks to Synapse and provide insights that may help you make a more informed decision if you are considering either of these platforms.
When it comes to choosing between Synapse and Databricks for your data processing needs, there are several factors to consider. First, we will take a closer look at some of the key features of each platform, and then I will give my opinion on the matter.
Data Storage, Resource Access, and DevOps Integration
When comparing Databricks and Synapse, it is important to consider the availability of certain features. For example, Databricks allows you to use multiple notebooks within the same session – a feature that is not currently available in Synapse. Another key difference between the two platforms is the way they handle data storage. Databricks provides a static mount path for your storage accounts, making it easy to navigate through your data like a traditional filesystem. In contrast, Synapse requires you to provide a ‘job id’ when reading data from a mount – an id that changes every time a new job is run.
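To make the mount-path difference concrete, here is a minimal sketch in plain Python. The helper names, the mount name `datalake` and the job ids are hypothetical and only for illustration. The `/mnt/...` and `synfs:/<job id>/...` path shapes reflect my understanding of how the two platforms expose mounts; in a real Synapse notebook the job id would typically be looked up at runtime (for example via `mssparkutils.env.getJobId()`), which is an assumption here and not something the sketch calls.

```python
def databricks_mount_path(mount_name: str, relative_path: str) -> str:
    """Build a Databricks-style path: the mount point stays the same for every job."""
    return f"/mnt/{mount_name.strip('/')}/{relative_path.lstrip('/')}"


def synapse_mount_path(job_id: str, mount_name: str, relative_path: str) -> str:
    """Build a Synapse-style path: the id of the current job is part of the path."""
    return f"synfs:/{job_id}/{mount_name.strip('/')}/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical mount name and job ids, for illustration only.
    print(databricks_mount_path("datalake", "raw/sales.csv"))
    print(synapse_mount_path("49", "datalake", "raw/sales.csv"))
    print(synapse_mount_path("50", "datalake", "raw/sales.csv"))
```

The practical consequence is that a Databricks path can be hard-coded in a notebook or config file, whereas a Synapse path has to be rebuilt at the start of every run.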
When it comes to accessing resources, Synapse offers linked service access management – a feature that allows for cleaner and more manageable connections between different services via Azure. In contrast, Databricks relies on tokens generated by service principals for resource access. However, Databricks does have an advantage when it comes to bootup time – boasting faster speeds than Synapse. On the other hand, Synapse has better DevOps integration compared to Databricks.
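As a rough illustration of the service principal approach on the Databricks side, the sketch below builds the Spark configuration entries that, as far as I know, the Hadoop ABFS driver expects for OAuth client-credentials access to an ADLS Gen2 account. The function name and all argument values are hypothetical placeholders, and in practice the secret would come from a secret scope rather than being passed around as a literal.

```python
def service_principal_spark_conf(
    storage_account: str, client_id: str, client_secret: str, tenant_id: str
) -> dict:
    """Return Spark conf entries for service-principal (OAuth) access to ADLS Gen2."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    # Placeholder values; in a notebook each entry would be applied with
    # spark.conf.set(key, value) before reading from the storage account.
    conf = service_principal_spark_conf("mystorage", "app-id", "app-secret", "tenant-id")
    for key in conf:
        print(key)
```

With a Synapse linked service, this kind of credential handling is configured once in the workspace instead of in each notebook, which is what makes the connections easier to manage.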
Features, Performance and Use Cases
There are several other key differences between Databricks and Synapse that are worth considering. For example, Databricks currently offers more features and better performance optimizations than Synapse. However, for data platforms that primarily use SQL and have few Spark use cases, Synapse Analytics may be the better choice. Synapse has an open-source version of Spark with built-in support for .NET applications, while Databricks has an optimized version of Spark that offers increased performance. Additionally, Databricks allows users to select GPU-enabled clusters for faster data processing and higher concurrency.
User Experience
In terms of user experience, Synapse has a traditional SQL engine that may feel more familiar to BI developers. It also has a Spark engine for use by data scientists and analysts. In contrast, Databricks is a Spark-based notebook tool with a focus on Spark functionality. Synapse currently offers only a GUI over the Hive metadata, whereas Databricks, with Unity Catalog, goes a step further by organizing metadata into a hierarchy.
Managing Workflows with External Orchestration Tools
One important aspect to understand when using notebooks in Databricks is the lack of a built-in orchestration tool or service. While it is possible to schedule jobs in Databricks, the functionality is quite basic. For this reason, in many projects we used Azure Data Factory to orchestrate Databricks notebooks. At a recent Databricks meetup, one participant mentioned using Apache Airflow for orchestration on AWS, though I am not sure about GCP. This is a crucial point to consider because Synapse bundles everything under one umbrella for seamless integration. Until Databricks produces an alternative solution, you will need to use it alongside ADF (Azure Data Factory) or Synapse for orchestration.
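To give a feel for what orchestrating Databricks notebooks from ADF looks like, here is a minimal sketch that builds a pipeline definition as a Python dictionary. The shape follows my understanding of the ADF pipeline JSON for Databricks Notebook activities; the pipeline name, activity names, notebook paths and the linked service name `AzureDatabricksLS` are hypothetical, and this is not the definition used in any of the projects mentioned above.

```python
import json


def notebook_activity(name, notebook_path, linked_service, depends_on=()):
    """Describe one ADF activity that runs a Databricks notebook."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        # Run only after every listed upstream activity has succeeded.
        "dependsOn": [
            {"activity": dep, "dependencyConditions": ["Succeeded"]}
            for dep in depends_on
        ],
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"notebookPath": notebook_path},
    }


def build_pipeline(name, activities):
    """Wrap a list of activities into a pipeline definition."""
    return {"name": name, "properties": {"activities": list(activities)}}


if __name__ == "__main__":
    # Hypothetical two-step pipeline: ingest first, then transform.
    ingest = notebook_activity("Ingest", "/Shared/ingest", "AzureDatabricksLS")
    transform = notebook_activity(
        "Transform", "/Shared/transform", "AzureDatabricksLS", depends_on=["Ingest"]
    )
    print(json.dumps(build_pipeline("DailyLoad", [ingest, transform]), indent=2))
```

The dependency between the two notebooks lives in the ADF definition rather than in Databricks itself, which is the extra moving part that Synapse avoids by bundling pipelines and notebooks in one workspace.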
| Feature | Databricks | Azure Synapse Analytics |
| --- | --- | --- |
| Multiple notebooks within same session | Yes | No |
| Data storage handling | Static mount path for storage accounts | Requires ‘job id’ when reading data from a mount |
| Resource access management | Tokens generated by service principals | Linked service access management |
| Bootup time | Faster than Synapse | Slower than Databricks |
| DevOps integration | Less integration compared to Synapse | Better integration compared to Databricks |
| Features and performance optimizations | More features and better performance optimizations than Synapse | Fewer features and fewer performance optimizations than Databricks |
| SQL support | Less support for SQL use cases | Better support for SQL use cases |
| Spark engine | Optimized version of Spark that offers increased performance | Open-source version of Spark with built-in support for .NET applications |
| GPU-enabled clusters | Allows users to select GPU-enabled clusters for faster data processing and higher concurrency | Not currently available |
| User experience | Spark-based notebook tool with a focus on Spark functionality | Traditional SQL engine that may feel more familiar to BI developers. Also has a Spark engine for use by data scientists and analysts. |
| Real-time co-authoring | Notebooks have real-time co-authoring (both authors see the changes in real time) | Notebooks support co-authoring, but one person needs to save the notebook before another person sees the change |
| Orchestration tool or service | Lacks a built-in orchestration tool or service. Needs to be used alongside ADF or Synapse for orchestration. | Bundles everything under one umbrella for seamless integration |
Synapse vs Databricks feature comparison summary table.
Choosing Between Databricks and Synapse: Which One Is Right for You?
Ultimately, the choice between these two platforms will depend on your specific needs and priorities. Nah! I will not leave you with a diplomatic answer. In my opinion (which could be controversial, depending on your cloud bias and when you are reading this), if your infra is on AWS/GCP and your priority is data processing efficiency and access to the latest Spark and Delta features, go for Databricks.
On the other hand, if your infrastructure is primarily based on Azure and your use case involves data preparation for a data platform with data modeling on a data lake (reach out if you are interested in how), then Azure Synapse may be the better choice. Synapse has more features in development for future releases, something that Databricks has not yet announced. Good luck! And stay tuned for an upcoming series focusing on ML, streaming, Delta and partitioning.