On-call DevOps/SRE for Big Data Infrastructure

Ahrefs

What We Need

Ahrefs is looking for a Site Reliability Engineer to help take care of its distributed crawler powered by 2,000 servers and ensure all systems are up and running 24/7. If you possess a healthy desire to automate everything while being able to quickly resolve urgent issues manually, then we want you! We strive to keep humans away from doing repetitive jobs that can be done by computers and focus instead on foreseeing problems and defining programmatic means to handle them.

Our system is in large part custom OCaml code, and it also employs third-party technologies: Debian, ELK, Puppet, ClickHouse, and anything else that will solve the task at hand. In this role, be prepared to deal with a 25-petabyte storage cluster, 2,000 bare-metal servers, experimental large-scale deployments, and all kinds of software bugs and hardware deviations on a daily basis.

Basic Requirements:

Deep understanding of operating system and network fundamentals
Practical knowledge of Linux userspace and kernel internals

The ideal candidate is expected to:

Understand the whole technology stack at all levels: from network and user-space code to OS internals and hardware
Independently investigate and resolve infrastructure issues on live production systems, including handling hardware problems and interacting with datacenters
Develop internal automation - monitoring, setup, statistics
Foresee potential problems and prevent them from happening; apply first aid to infrastructure failures when necessary
Help developers with deployment and integration
Participate in on-call rotation
Make well-reasoned technical choices and take responsibility for them
Approach problems with a practical mindset and suppress perfectionism when time is a priority
Set up automatic systems to control infrastructure
Possess a healthy detestation for complex shell scripts

Category: DevOps

Tags: reliability, engineer, distributed, system, linux, infrastructure
