Technical Lead, Reliability Engineering
About the team
The HubSpot Product team is made up of over 700 engineers, designers, product managers, and researchers. We’re passionate about building tools that help small and medium-sized businesses market, sell, and serve their customers — and ultimately, grow better.
Those tools end up in the HubSpot application platform, which itself is made up of thousands of services, workers, and jobs spanning over 170 teams and thousands of repos. Our teams work autonomously to deploy these systems across a common infrastructure, up to 3,000 times a day. As we’ve grown to serve over 75,000 customers in 100 countries, reliability and stability have become just as important as speed and time to market. And as we’ve opened up our APIs, our product has moved to the core of many of our customers’ and partners’ businesses.
In 2019, we built an SRE team to help our product teams focus on delivering highly available and dependable products. This team is off to a great start: evangelizing, building tools, and embedding on product teams. We are looking to grow this team by hiring engineers with an interest in reliability and scale. This is an opportunity to work on hard problems across a variety of domains with an experienced team of software engineers.
What you’ll do
- Help product and infrastructure teams hold retrospective root cause analysis meetings that focus on identifying remediations through a blameless process similar to the 5 Whys method
- Embed on product and infrastructure teams directly to build more reliable, scalable software
- Conceive, design, and build infrastructure tooling that improves reliability across the entire product surface area, dealing with massive distributed scale
- Evangelize best practices around reliability engineering
- Proactively identify risks and advocate for engineering processes, tooling, or work streams that reduce those risks in a customer-centric way
What we’re looking for
- Experience designing and operating distributed systems at scale
- Experience improving reliability through better automated systems, configuration, chaos testing, and process improvement
- Experience working collaboratively with other engineering teams
- Interest or experience implementing and iterating on process to improve outcomes with minimal disruption to team culture
- Interest or experience working across multiple stakeholders to drive effective change