MAIN TASKS
* Development and maintenance of a fully open-source data lakehouse.
* Design and development of scalable, reliable data pipelines to transform large volumes of both structured and unstructured data.
* Data integration from various sources, including databases, APIs, data streaming services and cloud data platforms.
* Optimisation of queries and workflows to improve performance and efficiency.
* Writing modular, testable and production-grade code.
* Ensuring data quality through monitoring, validation and quality checks, maintaining accuracy and consistency across the data platform.
* Preparation of test programs.
* Comprehensive documentation of processes to ensure smooth data pipeline management and troubleshooting.
* Assistance with deployment and configuration of the system.
EXPERTISE
* Extensive hands-on experience as a Data Engineer or Data Architect working with modern, cloud-based, open-source data platforms and data analytics tools.
* Excellent knowledge of data warehouse and/or data lakehouse design and architecture.
* Excellent knowledge of open-source, code-based data transformation and processing tools such as dbt, Spark and Trino.
* Excellent knowledge of SQL.
* Good knowledge of Python.
* Good knowledge of open-source orchestration tools such as Airflow, Dagster or Luigi.
* Experience using AI-powered assistants, such as Amazon Q, to streamline data engineering processes.
* Good knowledge of relational database systems.
* Good knowledge of event streaming platforms and message brokers such as Kafka and RabbitMQ.
* Extensive experience in building end-to-end data pipelines following the ELT (extract, load, transform) approach.
* Understanding of the principles behind open table formats such as Apache Iceberg or Delta Lake.
* Proficiency with Kubernetes and Docker/Podman.
* Good knowledge of data modelling tools.
* Good knowledge of online analytical processing (OLAP) and data mining tools.
* Ability to participate in multilingual meetings.
* Ability to work rigorously and methodically and, in particular, to follow naming conventions and coding standards.