Job Description
We are seeking a skilled Site Reliability Engineer to join our team. You will work closely with our development, SRE, and infrastructure teams to drive operational excellence, mentor colleagues, and shape standards that keep our services robust and innovative.
Your tasks will include operating, maintaining, and evolving container platforms across dev, staging, and production environments. You will apply a software engineering mindset to operations, automating everything, improving observability, and enforcing resilience.
Key responsibilities include ensuring the availability, performance, scalability, and low latency of critical services; implementing monitoring, observability, and incident response practices; defining non-functional and reliability requirements with product and feature teams; and driving GitOps adoption, platform engineering best practices, and security-by-design solutions.
In addition, you will partner with development and product teams to enable fast, reliable delivery and contribute to and help organize the 24/7 on-call rotation.
Solid understanding of observability and monitoring tools (Prometheus, Grafana, ELK/OpenSearch, Datadog, etc.)
~ Strong security-first mindset, delivering secure-by-design solutions that meet financial industry standards
~ Excellent collaboration skills, with experience aligning stakeholders across multiple teams
~ Fluent in English (minimum EU C1 level)
50% hybrid work, flexible hours, and autonomy to manage your time
~ Coaching, training, and tailored development paths for every career stage