One of the responsibilities of a Data Engineering Team is to build ETL pipelines that extract the data, transform it into the desired information and schema, and then persist it in specific storage or expose it as an API. For example, a stream of user-behavior events arriving at a Kafka topic should be transformed into the desired schema, and this raw data persisted into the company's Data Lake. Besides that, some stream processing prepares the final information. Consider that we want the number of online users visiting a specific product on an e-commerce website, so that we can show each user the number of concurrent visitors on that page over the last 5 minutes. With the growing needs of different consumers of the ...
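As a rough illustration of the scenario above, the following sketch uses Spark Structured Streaming to read the user-behavior stream from Kafka, persist the raw events, and count concurrent visitors per product over a sliding 5-minute window. The topic name, event schema, and storage paths are assumptions made for the example, not details taken from the original pipeline.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, TimestampType

    spark = SparkSession.builder.appName("concurrent-visitors").getOrCreate()

    # Hypothetical event schema; the real payload depends on the product's tracking events.
    schema = (StructType()
              .add("user_id", StringType())
              .add("product_id", StringType())
              .add("event_time", TimestampType()))

    # Read the raw user-behavior stream from a Kafka topic (topic and brokers are assumed).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "user_behavior")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Persist the raw data into the data lake (the path is an assumption).
    raw_sink = (events.writeStream
                .format("parquet")
                .option("path", "s3a://data-lake/raw/user_behavior")
                .option("checkpointLocation", "s3a://data-lake/checkpoints/raw")
                .start())

    # Approximate number of distinct visitors per product over the last 5 minutes,
    # recomputed every minute.
    visitors = (events
                .withWatermark("event_time", "10 minutes")
                .groupBy(F.window("event_time", "5 minutes", "1 minute"), "product_id")
                .agg(F.approx_count_distinct("user_id").alias("concurrent_visitors")))

    # For the example, print the windowed counts; a real pipeline would serve them
    # through an API or a low-latency store.
    (visitors.writeStream
     .outputMode("update")
     .format("console")
     .start())

    spark.streams.awaitAnyTermination()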
Incorporating a metadata layer, and a Data Lake Datasets composed of discrete objects, like image co...
This master's thesis deals with Big Data processing in the distributed system Apache Spark using tools, ...
PIPES is a flexible and extensible infrastructure providing fundamental building blocks to implement...
Apache Spark is an execution engine that, besides working as an isolated, distributed, in-memory compu...
Nowadays, the amount of data generated by users within an Internet product is increasing exponential...
Many fields have a need to process and analyze data streams in real-time. In industrial applications...
Distributed data processing systems are the standard means for large-scale data analysis in the Big ...
In tertiary institutions, different sets of information are derived from the various departments and o...
This paper addresses the use of Apache Airflow in creating Data Pipelines; it gives an overvi...
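For readers unfamiliar with Airflow, a minimal sketch of such a data pipeline expressed as an Airflow DAG follows; the DAG id, schedule, and task callables are illustrative placeholders (assuming Airflow 2.4+), not examples from the paper itself.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder task bodies; a real pipeline would call the actual extraction,
    # transformation, and loading logic here.
    def extract():
        print("pull raw data from the source system")

    def transform():
        print("apply the target schema to the raw data")

    def load():
        print("persist the transformed data to the target store")

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Task dependencies define the pipeline order: extract -> transform -> load.
        t_extract >> t_transform >> t_load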
This talk shares our recent experiences in providing a data analytics platform based on Apac...
There has been a major leap in the field of supply chain management. With every one of ...
Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements...
This project documents a decision-making process at Innovamat, an EdTech company from Barcelona. T...
ETL (Extraction-Transform-Load) tools, traditionally developed to operate offline on historical data...