Job Description
The external service provider will be responsible for the following tasks:
* Design, implement, and maintain a scalable, reliable, and secure hybrid cloud MLOps infrastructure for deploying, testing, managing, and monitoring ML models in various environments.
* Develop and maintain software applications in the areas of Natural Language Processing (NLP), Machine Learning (ML), Deep Learning (DL), and/or Artificial Intelligence (AI).
* Collaborate closely with data scientists and back-end developers to construct, test, integrate, and deploy ML models.
* Analyze performance metrics, troubleshoot issues, and ensure high availability and reliability.
* Design CI/CD pipelines and use orchestration solutions and data versioning tools.
* Create automated anomaly detection systems, continuously monitor performance, and optimize ML pipelines for scalability, efficiency, and cost-effectiveness.
* Architect IT solutions in the NLP/ML/AI domains, taking master data and metadata management concepts into account, and coordinate their implementation.
* Provide security studies, security assessments, and guidance on information system projects.
* Offer support and guidance to other team members on MLOps practices.
Knowledge and Skills Required:
* Excellent knowledge of managing an on-prem and/or cloud MLOps infrastructure.
* Excellent knowledge of containerization and orchestration platforms (e.g. Kubernetes, Docker, Podman, EKS, PKS).
* Good knowledge of MLflow, TensorFlow Extended (TFX), or equivalents.
* Good knowledge of Airflow.
* Good knowledge of AWS and/or Azure.
* Good knowledge of Python.
* Good knowledge of Unix and Bash.
* Good knowledge of agile software development methodologies.
* Good knowledge of infrastructure as code (Terraform, CloudFormation).
* Good knowledge of messaging services and platforms (e.g. Kafka, Redis, RabbitMQ).
* Knowledge of data security measures (knowledge of encryption mechanisms and ML security is considered a plus).
* Knowledge of NoSQL databases, such as Elasticsearch, MongoDB, Cassandra, and HBase.
* Knowledge of query languages, such as SQL, Hive, and Pig, and of information extraction.
* Experience with data analytics over big datasets, non-structured databases, and data lakes.
* Experience with monitoring and logging tools (e.g. ELK stack, Prometheus, Grafana, OpenTelemetry, CloudWatch).
* Experience with model testing and model validation in production environments.
* Ability to write clear and structured technical documentation.
* Excellent knowledge of on-prem or cloud solutions for data science applications.
* Ability to give business and technical presentations.
* Ability to apply high-quality standards.
* Ability to cope with fast-changing technologies.
* Very good communication skills with technical and non-technical audiences.
* Analysis and problem-solving skills.
* Capability to write clear and structured technical documents.
* Ability to participate in technical meetings and communicate effectively in them.
Optional Certifications:
* AWS Certified Machine Learning.
* Microsoft Azure AI Engineer Associate.