Role Overview
We are hiring a Linux infrastructure specialist to join a core High-Performance Computing (HPC) team. You will support and evolve simulation and R&D compute platforms used by internal Engineering teams.
This is a technically broad role spanning Linux infrastructure, cluster orchestration, automation, monitoring, and some hands-on hardware support. You'll join a small, senior team with high autonomy and end-to-end responsibility for the HPC platform.
Key Responsibilities
* Administer and optimize Linux-based HPC clusters (Ubuntu, CentOS, RHEL-family)
* Manage workload scheduling with Slurm
* Support containerized workloads using Docker and Singularity
* Implement and manage infrastructure-as-code via Ansible and Terraform
* Support GPU-accelerated workloads (NVIDIA, CUDA)
* Monitor system health and performance using Grafana, Prometheus, and related tools
* Troubleshoot hardware and perform physical support tasks (rack/stack, diagnostics, cabling)
* Collaborate with internal researchers and engineers to support and improve workload performance
* Contribute to documentation and help mature internal platform standards and practices
Requirements
- Operating Systems: Ubuntu, CentOS, RHEL derivatives (Rocky, Alma)
- Schedulers and portals: Slurm (primary), Open OnDemand (optional)
- Containers: Docker, Singularity
- Automation: Ansible, Terraform, Bash, Python
- Monitoring: Grafana, Prometheus, custom metrics
- HPC Filesystems: Lustre (required); GPFS and Ceph (optional)
- Hardware: Server maintenance, rack/stack, troubleshooting
- Collaboration: Git, Jira, CI/CD pipelines
Ideal Candidate Profile
* 5+ years of Linux system administration experience, including in performance-sensitive environments
* Experience supporting or operating HPC clusters (Slurm, Lustre)
* Scripting ability in Bash and Python
* Hands-on automation experience with Ansible and Terraform (or equivalents)
* Familiarity with containerization and job isolation (Docker/Singularity)
* Comfortable with infrastructure observability tools and performance tuning
* Proactive, autonomous, and able to collaborate across teams and functions
* Fluent in English (spoken and written)
Nice to Have
* Experience with Bright Cluster Manager or other cluster deployment tools
* Exposure to distributed file systems (e.g., Ceph)
* Familiarity with Open OnDemand or other HPC frontend tools
* Understanding of GPU scheduling (CUDA/NVIDIA)
* Cloud exposure (AWS, Azure, or GCP)
Benefits
We are primarily looking for a full-time employee (FTE), but an exception can be made for the right candidate who would rather work as a freelancer (contractor).
Here is a list of benefits:
- Meal vouchers
- Pension scheme (2%)
- Hospitalization insurance
- Remote work allowance (60€/month)