The following tasks will be performed by the external service provider:
• Development and maintenance of a fully open-source Data Lakehouse.
• Design and development of scalable and reliable data pipelines to transform large volumes of both structured and unstructured data.
• Data integration from various sources, including databases, APIs, data streaming services and cloud data platforms.
• Optimization of queries and workflows for increased performance and enhanced efficiency.
• Writing modular, testable and production-grade code.
• Ensuring data quality through monitoring, validation and automated quality checks, maintaining accuracy and consistency across the data platform (an illustrative check is sketched after this list).
• Elaboration of test programs.
• Comprehensive documentation of processes to ensure seamless data pipeline management and troubleshooting.
• Assistance with deployment and configuration of the system.
• Participation in meetings with other project teams.
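For illustration only, the following is a minimal sketch of the kind of automated data quality check referred to in the task list above, written in Python with PySpark. The dataset path, table and column names (orders, order_id, amount) are hypothetical placeholders and do not form part of the requirements.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: validate a fictitious "orders" dataset before it is
# published to the lakehouse. Paths and column names are placeholders only.
spark = SparkSession.builder.appName("dq-check-sketch").getOrCreate()
orders = spark.read.parquet("s3a://example-bucket/orders/")

# Two simple checks: no null order identifiers and no negative amounts.
null_ids = orders.filter(F.col("order_id").isNull()).count()
negative_amounts = orders.filter(F.col("amount") < 0).count()

if null_ids or negative_amounts:
    raise ValueError(
        f"Data quality check failed: {null_ids} null ids, "
        f"{negative_amounts} negative amounts"
    )

In practice such checks would typically be embedded in the pipeline's validation stage and surfaced through the platform's monitoring.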
LEVEL OF EDUCATION
As stated in Article 2.6.3.1 of the DIGIT-TM II Service requirements, the minimum educational qualification for lot 2 is a level of education corresponding to Level 6 of the European Qualifications Framework, which typically corresponds to a bachelor’s degree of three years.
KNOWLEDGE AND SKILLS
The following skills and knowledge are required for the performance of the above-listed tasks:
• Extensive hands-on experience as a Data Engineer or Data Architect with modern cloud-based open-source data platform solutions and data analytics tools.
• Excellent knowledge of data warehouse and/or data lakehouse design & architecture.
• Excellent knowledge of open-source, code-based data transformation tools such as dbt, Spark and Trino.
• Excellent knowledge of SQL.
• Good knowledge of Python.
• Good knowledge of open-source orchestration tools such as Airflow, Dagster or Luigi.
• Experience with AI-powered assistants like Amazon Q that can streamline data engineering processes.
• Good knowledge of relational database systems.
• Good knowledge of event streaming platforms and message brokers like Kafka and RabbitMQ.
• Extensive experience in creating end-to-end data pipelines following the ELT approach (a minimal orchestration sketch is provided after this list).
• Understanding of the principles behind open table formats such as Apache Iceberg or Delta Lake.
• Proficiency with Kubernetes and Docker/Podman.
• Good knowledge of data modelling tools.
• Good knowledge of online analytical data processing (OLAP) and data mining tools.
• Ability to participate in multilingual meetings.
• Ability to work with a high degree of rigour and method and, more specifically, to follow naming conventions and coding standards.
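For illustration only, the following is a minimal orchestration sketch of the ELT pattern referred to in the skills list above, assuming Airflow 2.4 or later. The DAG and task names are hypothetical placeholders and the task bodies are intentionally left empty.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # e.g. pull raw records from a source database, API or stream
    ...


def load():
    # e.g. land the raw records in the lakehouse storage layer
    ...


def transform():
    # e.g. run dbt/Spark/Trino models over the loaded data
    ...


with DAG(
    dag_id="elt_sketch",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Load raw data first, then transform inside the platform (ELT).
    extract_task >> load_task >> transform_task

In a real pipeline the transform step would typically trigger dbt, Spark or Trino jobs rather than inline Python; the sketch only illustrates task sequencing.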
SPECIFIC EXPERTISE
The following specific expertise is mandatory for the performance of the tasks: N/A.
CERTIFICATIONS & STANDARDS
The following certificates and standards are required for the performance of the tasks: N/A.