Site Reliability Engineer - Monitoring

Giant Swarm GmbH

Your Job
You will be a key member of a tight-knit team responsible for keeping our customers’ managed observability apps operational and healthy. You’ll also have a key role in the development of the offering itself, working together with our Platform Engineers to deliver the best possible service.
Giant Swarm is a fast-growing open-source infrastructure management platform used by modern enterprises. Our vision is to empower developers around the world to ship great products. We are a diverse, fully remote (since 2014) and experienced team that is growing and spread across Europe and beyond - with a headquarters in Cologne.
  • Experiment by prototyping new observability solutions and adding their components to our app catalogs.
  • Proactively investigate new open-source tools.
  • Implement robust testing of new and existing applications to provide the best possible customer experience.
  • Apply SRE best practices to monitor application performance and ensure that SLAs are adhered to.
  • Assist in debugging managed customer observability applications when something goes wrong.
  • Assist in debugging our own internal observability applications (we dogfood as much as possible).
  • Participate in the on-call support schedule.
  • Act as a go-to person when our developers need advice on monitoring infrastructure.
Requirements
  • You are fluent in Cloud Native observability tooling. This includes projects such as Prometheus, Prometheus Operator, Thanos/Cortex, Grafana, Loki, etc.
  • You must have deep, hands-on knowledge of Kubernetes.
  • You’re comfortable debugging complex infrastructure and understanding how it interacts.
  • You’re happy troubleshooting a wide variety of issues and you’re not afraid to parse thousands of lines of logs in pursuit of an answer.
  • You have good coding skills (preferably Go, but Python or similar is fine as well).
  • You have experience with maintaining infrastructure with code and you know the pros and cons of various automation tools.
  • You automate all things by writing code. Using Bash scripts makes you sad :)
  • We are very active in the Cloud Native / Kubernetes space. If you are as well - be that by writing content, giving talks, or actively contributing - it will be a perfect addition. Contributing upstream to open source projects that we run is highly appreciated and part of daily business.
About us
Every new team member changes the team. We love to learn from each other and we are looking for people who know things we don’t.
  • Becoming part of Giant Swarm means that, by extension, you also become part of the Cloud Native community. We actively contribute to upstream projects and our quarterly hackathons will give you space to work on out-of-the-box projects. Occasionally, when we, as a team, want to fully focus on one project, we scratch all meetings and routines for a certain time to better focus during our hive-sprints.
  • Continuous learning is important to us - we foster this through bi-yearly personal development talks, a budget for training/certifications/coaching as well as regular feedback talks and workshops. Our teams are cross-functional and collaboration is key.
  • Nothing crazy, but useful basics: we don't count holidays but set a minimum number; you choose your own hard- and software; as a company that has almost as many kids as employees, if not more, family-friendliness is crucial to us and paid parental leave is a no-brainer; we pay monthly perks that cover your costs for working remotely; we meet twice a year as an entire company and (if possible) see conferences as an important place to catch up with team members; we aim to be fully transparent (finance, salaries) unless it hurts people, and we trust you, based on this, to make the best decisions.
We have not managed to describe precisely how we approach important company elements that are usually summed up with ‘buzzwords’ such as agile mindset, cross-functional teams, self-organization, value of the individual or trust & teamwork. However, we truly care about them, we live them and we constantly iterate on them. Some snippets about how we do this are posted in our blog, though by no means all of them.
Important note: We are not hiring job descriptions. We hire humans. :) We welcome applications from everybody, regardless of ethnic or national origin, religion, gender identity, sexual orientation or age.