One of the responsibilities of a Data Engineering Team is to build ETL pipelines that extract the data, transform it into the desired information and schema, and then persist it in specific storage or expose it as an API. For example, a stream of user-behavior events arriving at a Kafka topic should be transformed into the desired schema, and this raw data persisted into the company's Data Lake. Besides that, some stream processing prepares the final information. Consider that we want the number of online users visiting a specific product on an e-commerce website, so that we can show each user the number of concurrent visitors on that page over the last 5 minutes. With the growing needs of different consumers of the ...
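As a rough illustration of the scenario above, the following sketch uses Spark Structured Streaming to read the user-behavior stream from Kafka, persist the raw events, and count concurrent visitors per product over a sliding 5-minute window. The topic name, event schema, and storage paths are assumptions made for the example, not details taken from the original pipeline.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, TimestampType

    spark = SparkSession.builder.appName("concurrent-visitors").getOrCreate()

    # Hypothetical event schema; the real payload depends on the product's tracking events.
    schema = (StructType()
              .add("user_id", StringType())
              .add("product_id", StringType())
              .add("event_time", TimestampType()))

    # Read the raw user-behavior stream from a Kafka topic (topic and brokers are assumed).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "user_behavior")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Persist the raw data into the data lake (the path is an assumption).
    raw_sink = (events.writeStream
                .format("parquet")
                .option("path", "s3a://data-lake/raw/user_behavior")
                .option("checkpointLocation", "s3a://data-lake/checkpoints/raw")
                .start())

    # Approximate number of distinct visitors per product over the last 5 minutes,
    # recomputed every minute.
    visitors = (events
                .withWatermark("event_time", "10 minutes")
                .groupBy(F.window("event_time", "5 minutes", "1 minute"), "product_id")
                .agg(F.approx_count_distinct("user_id").alias("concurrent_visitors")))

    # For the example, print the windowed counts; a real pipeline would serve them
    # through an API or a low-latency store.
    (visitors.writeStream
     .outputMode("update")
     .format("console")
     .start())

    spark.streams.awaitAnyTermination()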
Incorporating a metadata layer, and a Data Lake Datasets composed of discrete objects, like image co...
This master's thesis deals with Big Data processing in the distributed system Apache Spark using tools, ...
PIPES is a flexible and extensible infrastructure providing fundamental building blocks to implement...
Apache Spark is an execution engine that, besides working as an isolated, distributed, in-memory compu...
Nowadays, the amount of data generated by users within an Internet product is increasing exponential...
Many fields have a need to process and analyze data streams in real-time. In industrial applications...
Distributed data processing systems are the standard means for large-scale data analysis in the Big ...
In tertiary institutions, different sets of information are derived from the various departments and o...
This paper addresses the use of Apache Airflow in creating Data Pipelines; it gives an overvi...
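For readers unfamiliar with Airflow, a minimal sketch of such a data pipeline expressed as an Airflow DAG follows; the DAG id, schedule, and task callables are illustrative placeholders (assuming Airflow 2.4+), not examples from the paper itself.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder task bodies; a real pipeline would call the actual extraction,
    # transformation, and loading logic here.
    def extract():
        print("pull raw data from the source system")

    def transform():
        print("apply the target schema to the raw data")

    def load():
        print("persist the transformed data to the target store")

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Task dependencies define the pipeline order: extract -> transform -> load.
        t_extract >> t_transform >> t_load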
This talk shares our recent experiences in providing a data analytics platform based on Apac...
There has been a major leap in the field of supply chain management. With every one of ...
Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements...
This project documents a decision-making process at Innovamat, an EdTech company from Barcelona. T...
ETL (Extraction-Transform-Load) tools, traditionally developed to operate offline on historical data...