dimanche 2 juillet 2023

ETL moved to ELT in the Cloud, What are similarities and differences

 


ETL moved to ELT, What are similarities and differences?

Author: Donatien MBADI OUM, Consultant and Advisor Oracle | AWS | Azure

 

1.    ETL and ELT Overview

The ETL (Extract, Transform and Load) and ELT (Extract, Load and Transform) acronyms both describe processes of cleaning, enriching and transforming data from a variety of sources before integrating it for use in data analytics, business intelligence and data science. Both capture, process and load data for analysis across three steps:

-            Extraction

Extraction is the first step of both ETL and ELT. This step is about collecting raw data from different sources. These could be database, files, software as a service (SaaS), Internet of things (IoT) sensors, or application events. You can collect structured, semi-structured or unstructured data at this stage.

-            Transformation

In the ETL, Transformation is the second step, while in ELT it’s the third and final step. This step focuses on changing raw data from its original structure into a format that meets the requirement of the target system where you plan to store the data analytics.

-            Load

In the ELT, Load is the second step, while in ETL it’s the third and final step. In this phase on ETL, you store data into the target database, so that reporting tools can use it directly to generate actionable reports and insights. In ELT, you still need to transform the extracted data after loading it.

For years, ETL was a perfect process to operate, given the amount of data involved and tools available for working with it, and it still works well for some use cases today. Because ETL transforms data prior to the loading stage, it’s the ideal process when a destination requires a specific data format.

But as the volume and speed of data have multiplied, as the pace of business has increased, and as new integration tools have arrived on the market, ELT has gained in popularity. When the destination is a cloud-native data warehouse like Amazon Redshift, Google BigQuery, Snowflake or Microsoft SQL Data Warehouse, ELT is the better approach.

Your decision between ETL and ELT will determine your data storage, analysis and processing. So before choosing between the two methods, it’s important to consider all factors, this includes the type of business you are running and your data needs.

2.    ETL…


ETL is a data integration methodology that extracts a raw data from sources, transforms the data on a secondary processing server, and then loads the data into a target database for analytics. The transformation stage ensures compliance with the target database’s structural requirements. You only move the data once it is transformed and ready.

*      The key benefits of ETL are:

o   Maturity: ETL is a well-supported and well-understood function

o   Faster analysis on a single, predefined use case: The dataset has already been structured and transformed, analysis can happen immediately.

o   Transparency: in some cases, understanding database lineage is more straightforward because data transformation is performed inline, as opposed to at a later date.

*      The key insufficiencies are:

o   Time-consuming: The process of transformation is driven from a set of requirements that can take weeks or months to map to. After transformation, loading the data can be almost as time-consuming.

o   Rigid: ETL doesn’t support ad-hoc workflows. All the parameters have to be mapped out and then implemented. Whenever a new cases arises, the entire ETL process has to be repeated.

o   Opaque and brittle: the transformation process is not transparent, since only the people transforming the data access to it. Since there are often precedent dependencies, altering any step in an ETL workflow can break other workflows.

o   Higher cost: ETL workflows require an investment in op-premise hardware.

 

 

 

 

 

 

3.    ELT…


Today, Organizations have to process a huge amount of data and if they want to complete in the real-time economy, they have to do it as quickly as possible. That is exactly the kind of scenario that traditional ETL can’t handle, and why the market is shifting toward ELT.

Unlike ETL, ELT does not require databa transformation to take  place before the loading process. ELT lods raw data directly into a target data warehouse, instead of moving it to a processing server for transformation.

*      The key benefits of ELT are:

o   Speed, Speed and more speed: The ELT process is typically involves loading all your data into a cloud repository and today’s cloud services allow you to quickly transform and process massive data sets.

o   Flexibility: The ELT process enables you to replicate all your raw data on demand, as it’s needed for data analytics, BI or others systems.

o   Scalability: Modern cloud warehouses, data lakes and data lakehouses give you the scale to leverage all your raw data.

o   Transparency: Data consumers can see what data is available almost as soon as it comes in the door.

o   Lower cost: Cloud-based ELT usually offer a lower total cost, given that you pay only what you use.

 

4.    Key difference between ETL and ELT

 

Category

ETL

ELT

Definition

Data is extracted from a source system, transformed on a secondary processing server and loaded into a destination system

Data is estratec from a source system, load into a destination system and transformed inside the destination system

Support for Data Warehouse

ETL is the traditional process for transforming and integrating structured or relational data into on-premise or cloud-based data warehouse

ELT is the modern process for transforming structured or unstrutured data into a cloud-based data ware house

Support for Data Lake/Mart/Lakehouse

ETL is not an appropriate process for data lakes, data marts or data lakehouses

ELT process is tailored to provide a data pipe line for data lakes, data marts or data lakehouses

Extraction

Raw data is extrated using API connectors

Raw data is extrated using API connectors

Transformation

Date transformation is performed in a staging area outside of the data warehouse and the entire data must be transformes before loading. As a result, transforming larger data sets can take long time up front but analysis can take place immediately once the ETL process is complete

Data transformation is performed on an as-needed basis in the target system itself. As a result, the transformation step takes little time but can slow down the querying and analysis processes if there is not sufficient processing power

Loading

In ETL, the loading step requires data to be loaded into a staging area befoes beaing loaded into the target system. This multi-step process takes longer than the ELT process

In ELT, the full data set is loaded directly into the target system. Since there is only one step, and it only happens one time, loading in the ELT process is faster than ETL

Size/size of data set

ETL is most approproate for processing smaller, relational fata sets which require complex transformation and have been predetermined as beaing relevant to the analysis goals

ELT can handle any size or type of data and is well suited for processing both strutured ans unstructures big data. Since the entire data sets is loaded, analysts can choose at any time which data to transform and use fos analysis

Implementation

The ETL process has been around for decades and there is a mature ecosystem of ERL tools and experts readly available to help with implementation

The ELT process is a new apprach and the ecosystem of tools and experts needed to implement it is still growing

Speed

ETL is a time-intensice process; data is transformed before loading into a destination system

ELT is faster by comparison; data is load directly into a desination system, and trandformed in-parallel

Cost-Based Transformations

Separate servers can create cost issues. Best for compute intensive transformations and pre-cleaning. ETL can be cost-prohibitive for many small and medium business

Transformation performed in-database, simultameous load and transform. ETL benefits from a robust recosystem of cloud-based platforms which offer much lower costs and a variety of plan options to store and process data

Hardware

The traditional, on-premises ETL process requires expensive hardware. Newer, cloud-based ETL solutions do not require hardware

Given that the ELT process is inherently cloud-based, no additional hardware is required

Maintenance

Secondary processing server adds to the maintenance burden

With fewer systems, the maintenance burden is reduced

Data Output

Strquctured (typically)

Strutured, semi-strutured, unstrutured

Data Volume

Ideal for small data sets with complicated transformation requirements

Idela for large data sets that require spedd and efficiency

 

How to fix errors : -         ORA-38760: This database instance failed to turn on flashback database -         ORA-38780: Restore poin...