Tasks:
* Development and maintenance of a fully open-source Data Lakehouse;
* Design and development of scalable, reliable data pipelines to transform large volumes of both structured and unstructured data;
* Data integration from various sources, including databases, APIs, data streaming services and cloud data platforms;
* Optimization of queries and workflows for increased performance and enhanced efficiency;
* Writing modular, testable and production-grade code;
* Ensuring data quality through monitoring, validation and automated quality checks, maintaining accuracy and consistency across the data platform;
* Development of test programs;
* Comprehensive documentation of processes to ensure seamless data pipeline management and troubleshooting;
* Assistance with deployment and configuration of the system.
Requirements:
* University degree in IT or a relevant discipline, combined with a minimum of 13 years of relevant working experience in IT;
* Hands-on experience as a Data Engineer or Data Architect with modern cloud-based, open-source data platform solutions and data analytics tools;
* Experience in data warehouse and/or data lakehouse design & architecture;
* Experience with AI-powered assistants like Amazon Q that can streamline data engineering processes;
* Experience in building end-to-end data pipelines based on the ELT framework;
* Excellent knowledge of SQL, relational databases, and open-source, code-based data transformation tools such as dbt, Spark and Trino;
* Good knowledge of Python and open-source orchestration tools such as Airflow, Dagster or Luigi;
* Good knowledge of data modelling, online analytical processing (OLAP) and data mining tools;
* Good knowledge of event streaming platforms and message brokers like Kafka and RabbitMQ;
* Understanding of the principles behind open table formats such as Apache Iceberg or Delta Lake;
* Proficiency with Kubernetes and Docker/Podman;
* Excellent command of the English language.