Database and Datascience: juillet 2023

ETL moved to ELT, What are similarities and differences?

Author: Donatien MBADI OUM, Consultant and Advisor Oracle | AWS | Azure

1. ETL and ELT Overview

The ETL (Extract, Transform and Load) and ELT (Extract, Load and Transform) acronyms both describe processes of cleaning, enriching and transforming data from a variety of sources before integrating it for use in data analytics, business intelligence and data science. Both capture, process and load data for analysis across three steps:

- Extraction

Extraction is the first step of both ETL and ELT. This step is about collecting raw data from different sources. These could be database, files, software as a service (SaaS), Internet of things (IoT) sensors, or application events. You can collect structured, semi-structured or unstructured data at this stage.

- Transformation

In the ETL, Transformation is the second step, while in ELT it’s the third and final step. This step focuses on changing raw data from its original structure into a format that meets the requirement of the target system where you plan to store the data analytics.

- Load

In the ELT, Load is the second step, while in ETL it’s the third and final step. In this phase on ETL, you store data into the target database, so that reporting tools can use it directly to generate actionable reports and insights. In ELT, you still need to transform the extracted data after loading it.

For years, ETL was a perfect process to operate, given the amount of data involved and tools available for working with it, and it still works well for some use cases today. Because ETL transforms data prior to the loading stage, it’s the ideal process when a destination requires a specific data format.

But as the volume and speed of data have multiplied, as the pace of business has increased, and as new integration tools have arrived on the market, ELT has gained in popularity. When the destination is a cloud-native data warehouse like Amazon Redshift, Google BigQuery, Snowflake or Microsoft SQL Data Warehouse, ELT is the better approach.

Your decision between ETL and ELT will determine your data storage, analysis and processing. So before choosing between the two methods, it’s important to consider all factors, this includes the type of business you are running and your data needs.

2. ETL…

ETL is a data integration methodology that extracts a raw data from sources, transforms the data on a secondary processing server, and then loads the data into a target database for analytics. The transformation stage ensures compliance with the target database’s structural requirements. You only move the data once it is transformed and ready.

The key benefits of ETL are:

o Maturity: ETL is a well-supported and well-understood function

o Faster analysis on a single, predefined use case: The dataset has already been structured and transformed, analysis can happen immediately.

o Transparency: in some cases, understanding database lineage is more straightforward because data transformation is performed inline, as opposed to at a later date.

The key insufficiencies are:

o Time-consuming: The process of transformation is driven from a set of requirements that can take weeks or months to map to. After transformation, loading the data can be almost as time-consuming.

o Rigid: ETL doesn’t support ad-hoc workflows. All the parameters have to be mapped out and then implemented. Whenever a new cases arises, the entire ETL process has to be repeated.

o Opaque and brittle: the transformation process is not transparent, since only the people transforming the data access to it. Since there are often precedent dependencies, altering any step in an ETL workflow can break other workflows.

o Higher cost: ETL workflows require an investment in op-premise hardware.

3. ELT…

Today, Organizations have to process a huge amount of data and if they want to complete in the real-time economy, they have to do it as quickly as possible. That is exactly the kind of scenario that traditional ETL can’t handle, and why the market is shifting toward ELT.

Unlike ETL, ELT does not require databa transformation to take place before the loading process. ELT lods raw data directly into a target data warehouse, instead of moving it to a processing server for transformation.

The key benefits of ELT are:

o Speed, Speed and more speed: The ELT process is typically involves loading all your data into a cloud repository and today’s cloud services allow you to quickly transform and process massive data sets.

o Flexibility: The ELT process enables you to replicate all your raw data on demand, as it’s needed for data analytics, BI or others systems.

o Scalability: Modern cloud warehouses, data lakes and data lakehouses give you the scale to leverage all your raw data.

o Transparency: Data consumers can see what data is available almost as soon as it comes in the door.

o Lower cost: Cloud-based ELT usually offer a lower total cost, given that you pay only what you use.

4. Key difference between ETL and ELT

Category	ETL	ELT
Definition	Data is extracted from a source system, transformed on a secondary processing server and loaded into a destination system	Data is estratec from a source system, load into a destination system and transformed inside the destination system
Support for Data Warehouse	ETL is the traditional process for transforming and integrating structured or relational data into on-premise or cloud-based data warehouse	ELT is the modern process for transforming structured or unstrutured data into a cloud-based data ware house
Support for Data Lake/Mart/Lakehouse	ETL is not an appropriate process for data lakes, data marts or data lakehouses	ELT process is tailored to provide a data pipe line for data lakes, data marts or data lakehouses
Extraction	Raw data is extrated using API connectors	Raw data is extrated using API connectors
Transformation	Date transformation is performed in a staging area outside of the data warehouse and the entire data must be transformes before loading. As a result, transforming larger data sets can take long time up front but analysis can take place immediately once the ETL process is complete	Data transformation is performed on an as-needed basis in the target system itself. As a result, the transformation step takes little time but can slow down the querying and analysis processes if there is not sufficient processing power
Loading	In ETL, the loading step requires data to be loaded into a staging area befoes beaing loaded into the target system. This multi-step process takes longer than the ELT process	In ELT, the full data set is loaded directly into the target system. Since there is only one step, and it only happens one time, loading in the ELT process is faster than ETL
Size/size of data set	ETL is most approproate for processing smaller, relational fata sets which require complex transformation and have been predetermined as beaing relevant to the analysis goals	ELT can handle any size or type of data and is well suited for processing both strutured ans unstructures big data. Since the entire data sets is loaded, analysts can choose at any time which data to transform and use fos analysis
Implementation	The ETL process has been around for decades and there is a mature ecosystem of ERL tools and experts readly available to help with implementation	The ELT process is a new apprach and the ecosystem of tools and experts needed to implement it is still growing
Speed	ETL is a time-intensice process; data is transformed before loading into a destination system	ELT is faster by comparison; data is load directly into a desination system, and trandformed in-parallel
Cost-Based Transformations	Separate servers can create cost issues. Best for compute intensive transformations and pre-cleaning. ETL can be cost-prohibitive for many small and medium business	Transformation performed in-database, simultameous load and transform. ETL benefits from a robust recosystem of cloud-based platforms which offer much lower costs and a variety of plan options to store and process data
Hardware	The traditional, on-premises ETL process requires expensive hardware. Newer, cloud-based ETL solutions do not require hardware	Given that the ELT process is inherently cloud-based, no additional hardware is required
Maintenance	Secondary processing server adds to the maintenance burden	With fewer systems, the maintenance burden is reduced
Data Output	Strquctured (typically)	Strutured, semi-strutured, unstrutured
Data Volume	Ideal for small data sets with complicated transformation requirements	Idela for large data sets that require spedd and efficiency

Database and Datascience

dimanche 2 juillet 2023

ETL moved to ELT in the Cloud, What are similarities and differences

How to Provision Autonomous Database Service in Oracle Database@Google Cloud

oracle

oracle