ETL moved to
ELT, What are similarities and differences?
Author: Donatien MBADI OUM, Consultant
and Advisor Oracle | AWS | Azure
1.
ETL and ELT Overview
The ETL (Extract, Transform and Load) and ELT (Extract, Load and Transform)
acronyms both describe processes of cleaning, enriching and transforming data
from a variety of sources before integrating it for use in data analytics,
business intelligence and data science. Both capture, process and load data for
analysis across three steps:
-
Extraction
Extraction is the first step of both ETL and ELT. This step is
about collecting raw data from different sources. These could be database,
files, software as a service (SaaS), Internet of things (IoT) sensors, or
application events. You can collect structured, semi-structured or unstructured
data at this stage.
-
Transformation
In the ETL, Transformation is the second step, while in ELT
it’s the third and final step. This step focuses on changing raw data from its
original structure into a format that meets the requirement of the target
system where you plan to store the data analytics.
-
Load
In the ELT, Load is the second step, while in ETL it’s the third and final
step. In this phase on ETL, you store data into the target database, so that
reporting tools can use it directly to generate actionable reports and
insights. In ELT, you still need to transform the extracted data after loading
it.
For years, ETL was a perfect process to operate, given the amount of data
involved and tools available for working with it, and it still works well for
some use cases today. Because ETL transforms data prior to the loading stage,
it’s the ideal process when a destination requires a specific data format.
But as the volume and speed of data have multiplied, as the pace of
business has increased, and as new integration tools have arrived on the
market, ELT has gained in popularity. When the destination is a cloud-native
data warehouse like Amazon Redshift, Google BigQuery, Snowflake or Microsoft
SQL Data Warehouse, ELT is the better approach.
Your decision between ETL and ELT will determine your data storage,
analysis and processing. So before choosing between the two methods, it’s
important to consider all factors, this includes the type of business you are
running and your data needs.
2.
ETL…
ETL is a data integration methodology that extracts a raw data from
sources, transforms the data on a secondary processing server, and then loads
the data into a target database for analytics. The transformation stage ensures
compliance with the target database’s structural requirements. You only move
the data once it is transformed and ready.
The key benefits of ETL are:
o Maturity: ETL is a well-supported and well-understood function
o Faster analysis on a single,
predefined use case: The dataset has already been
structured and transformed, analysis can happen immediately.
o Transparency: in some cases, understanding database lineage is more
straightforward because data transformation is performed inline, as opposed to
at a later date.
The key insufficiencies are:
o Time-consuming: The process of transformation is driven from a set of
requirements that can take weeks or months to map to. After transformation,
loading the data can be almost as time-consuming.
o Rigid: ETL doesn’t support ad-hoc workflows. All the parameters have to be
mapped out and then implemented. Whenever a new cases arises, the entire ETL
process has to be repeated.
o Opaque and brittle: the transformation process is not transparent, since only
the people transforming the data access to it. Since there are often precedent
dependencies, altering any step in an ETL workflow can break other workflows.
o Higher cost: ETL workflows require an investment in op-premise hardware.
3.
ELT…
Today, Organizations have to process a huge amount of data and if they
want to complete in the real-time economy, they have to do it as quickly as
possible. That is exactly the kind of scenario that traditional ETL can’t
handle, and why the market is shifting toward ELT.
Unlike ETL, ELT does not require databa transformation to take place before the loading process. ELT lods
raw data directly into a target data warehouse, instead of moving it to a
processing server for transformation.
The key benefits of ELT are:
o Speed, Speed and more speed: The ELT process is typically involves loading all your data
into a cloud repository and today’s cloud services allow you to quickly
transform and process massive data sets.
o Flexibility: The ELT process enables you to replicate all your raw data
on demand, as it’s needed for data analytics, BI or others systems.
o Scalability: Modern cloud warehouses, data lakes and data lakehouses give
you the scale to leverage all your raw data.
o Transparency: Data consumers can see what data is available almost as soon
as it comes in the door.
o Lower cost: Cloud-based ELT usually offer a lower total cost, given that you pay only
what you use.
4.
Key difference between ETL and ELT
Category |
ETL |
ELT |
Definition |
Data
is extracted from a source system, transformed on a secondary processing
server and loaded into a destination system |
Data
is estratec from a source system, load into a destination system and
transformed inside the destination system |
Support for Data Warehouse |
ETL
is the traditional process for transforming and integrating structured or
relational data into on-premise or cloud-based data warehouse |
ELT
is the modern process for transforming structured or unstrutured data into a
cloud-based data ware house |
Support for Data Lake/Mart/Lakehouse |
ETL
is not an appropriate process for data lakes, data marts or data lakehouses |
ELT
process is tailored to provide a data pipe line for data lakes, data marts or
data lakehouses |
Extraction |
Raw
data is extrated using API connectors |
Raw
data is extrated using API connectors |
Transformation |
Date
transformation is performed in a staging area outside of the data warehouse
and the entire data must be transformes before loading. As a result,
transforming larger data sets can take long time up front but analysis can
take place immediately once the ETL process is complete |
Data
transformation is performed on an as-needed basis in the target system
itself. As a result, the transformation step takes little time but can slow
down the querying and analysis processes if there is not sufficient
processing power |
Loading |
In
ETL, the loading step requires data to be loaded into a staging area befoes
beaing loaded into the target system. This multi-step process takes longer
than the ELT process |
In
ELT, the full data set is loaded directly into the target system. Since there
is only one step, and it only happens one time, loading in the ELT process is
faster than ETL |
Size/size of data set |
ETL
is most approproate for processing smaller, relational fata sets which
require complex transformation and have been predetermined as beaing relevant
to the analysis goals |
ELT
can handle any size or type of data and is well suited for processing both
strutured ans unstructures big data. Since the entire data sets is loaded,
analysts can choose at any time which data to transform and use fos analysis |
Implementation |
The
ETL process has been around for decades and there is a mature ecosystem of
ERL tools and experts readly available to help with implementation |
The
ELT process is a new apprach and the ecosystem of tools and experts needed to
implement it is still growing |
Speed |
ETL
is a time-intensice process; data is transformed before loading into a
destination system |
ELT
is faster by comparison; data is load directly into a desination system, and
trandformed in-parallel |
Cost-Based Transformations |
Separate
servers can create cost issues. Best for compute intensive transformations
and pre-cleaning. ETL can be cost-prohibitive for many small and medium
business |
Transformation
performed in-database, simultameous load and transform. ETL benefits from a
robust recosystem of cloud-based platforms which offer much lower costs and a
variety of plan options to store and process data |
Hardware |
The
traditional, on-premises ETL process requires expensive hardware. Newer,
cloud-based ETL solutions do not require hardware |
Given
that the ELT process is inherently cloud-based, no additional hardware is
required |
Maintenance |
Secondary
processing server adds to the maintenance burden |
With
fewer systems, the maintenance burden is reduced |
Data Output |
Strquctured
(typically) |
Strutured,
semi-strutured, unstrutured |
Data Volume |
Ideal
for small data sets with complicated transformation requirements |
Idela
for large data sets that require spedd and efficiency |