Site Reliability Engineer

Job Description

Summary

Our client is seeking a Site Reliability Engineer to support a rapidly growing SaaS environment. This role focuses on maintaining system uptime, improving deployment reliability, automating infrastructure operations, and partnering closely with development teams to ensure secure, high-performance software delivery.

Key Responsibilities

  • Maintain service uptime and restore services quickly during incidents
  • Deploy new builds into production environments
  • Monitor application performance and system health
  • Collaborate with internal teams to ensure security, SLA, and performance standards are met
  • Implement infrastructure-as-code (IaC) automation for monitoring, deployment, and system validation
  • Provide on-call support and manage critical issue resolution
  • Partner with internal development and outsourced teams on systems architecture
  • Triage alerts, diagnose issues, and coordinate resolutions
  • Lead capacity planning efforts
  • Develop CI/CD orchestration systems to streamline delivery
  • Define, execute, and evaluate Operational Acceptance Testing
  • Create and maintain runbooks, playbooks, and operational documentation
  • Automate infrastructure, testing, failover, and failure mitigation processes

Required Qualifications

  • 5+ years of Site Reliability Engineering experience
  • Strong debugging and troubleshooting abilities
  • Experience with observability and APM tools (e.g., Datadog)
  • Cloud experience (AWS preferred)
  • Strong scripting skills (Python, shell scripting, etc.)
  • Hands-on experience with Git and CI/CD platforms (GitHub, Jenkins, etc.)
  • Experience with SOA and microservices architectures
  • Background with scalable, high-performance enterprise systems

Tech Stack

  • APM/Observability (Datadog)
  • AWS
  • Python / Shell
  • Git, Jenkins, GitHub
  • Microservices / SOA
  • Virtualization & Containers (VMware, Kubernetes)
  • IaC Tools (Terraform, Ansible, SaltStack)

Compensation & Benefits

Competitive compensation aligned with market standards.
Health benefits, PTO, and remote flexibility (varies by client).

Work Schedule

Monday–Friday, flexible hours, with occasional on-call duty.

About the Client

Our client is a growing technology-focused organization committed to building scalable digital solutions.

Why Join the Team?

  • Work with cutting-edge cloud and automation technologies
  • Contribute to high-impact reliability and performance initiatives
  • Collaborate with engineering teams in a fast-growing SaaS environment
  • Opportunity to drive meaningful improvements in system resilience

How to Apply

Submit your resume to people@ignitetalentpartners.com. Shortlisted candidates will be contacted.