Job Description
Summary
Our client is seeking a Site Reliability Engineer to support a rapidly growing SaaS environment. This role focuses on maintaining system uptime, improving deployment reliability, automating infrastructure operations, and partnering closely with development teams to ensure secure, high-performance software delivery.
Key Responsibilities
- Maintain service uptime and restore services quickly during incidents
- Deploy new builds into production environments
- Monitor application performance and system health
- Collaborate with internal teams to ensure security, SLA, and performance standards are met
- Implement automation, including infrastructure as code (IaC), for monitoring, deployment, and system validation
- Provide on-call support and manage critical issue resolution
- Partner with internal development and outsourced teams on systems architecture
- Triage alerts, diagnose issues, and coordinate resolutions
- Lead capacity planning efforts
- Develop CI/CD orchestration systems to streamline delivery
- Define, execute, and evaluate Operational Acceptance Testing
- Create and maintain runbooks, playbooks, and operational documentation
- Automate infrastructure, testing, failover, and failure mitigation processes
Required Qualifications
- 5+ years of Site Reliability Engineering experience
- Strong debugging and troubleshooting abilities
- Experience with observability and APM tools (e.g., Datadog)
- Cloud experience (AWS preferred)
- Strong scripting skills (Python, shell scripting, etc.)
- Hands-on experience with Git and CI/CD platforms (GitHub, Jenkins, etc.)
- Experience with service-oriented architecture (SOA) and microservices
- Background with scalable, high-performance enterprise systems
Tech Stack
- APM/Observability (Datadog)
- AWS
- Python / Shell
- Git, Jenkins, GitHub
- Microservices / SOA
- Virtualization & Containers (VMware, Kubernetes)
- IaC Tools (Terraform, Ansible, SaltStack)
Compensation & Benefits
Competitive compensation aligned with market standards.
Health benefits, PTO, and remote flexibility (varies by client).
Work Schedule
Monday–Friday with flexible hours and occasional on-call duty.
About the Client
Our client is a growing technology-focused organization committed to building scalable digital solutions.
Why Join the Team?
- Work with cutting-edge cloud and automation technologies
- Contribute to high-impact reliability and performance initiatives
- Collaborate with engineering teams in a fast-growing SaaS environment
- Opportunity to drive meaningful improvements in system resilience
How to Apply
Submit your resume to people@ignitetalentpartners.com. Shortlisted candidates will be contacted.