Senior Engineer - Site Reliability Engineering

Date: Oct 13, 2025
Location: Bangalore, KA, IN, 560100

Employment Type: 

SRE Staff Engineer: 



  • Experience Level: 5-8 years 
  • Location: Onsite Client (Bangalore) 
  • Nature of work: Hybrid  
  • Shift: 24x5 Rotational Shifts 

 

About the Role: 

We are seeking an experienced and motivated Senior Engineer who can work as an Individual Contributor to drive the performance of the Observability Team (PMG). The ideal candidate will have a strong technical background in Site Reliability Engineering (SRE) with expertise in observability, alerting frameworks, incident management, and automation.

 

Key Responsibilities: 

1. Payment Monitoring and Alert Triage: 



  • Monitor payment-flow-based alerts across multiple applications in rotational 24x7 shifts and identify issues proactively.
  • Triage alerts by analysing trends across the affected dimensions of the payment flow, correlate them with other services' metrics, logs, and traces to find the root cause, and document the triage.
  • Ensure timely escalation and closure of reported issues while working with the engineering teams of the payment services.

2. Observability Development: 



  • Design and implement alerting frameworks using tools like Datadog, Grafana, Kibana, Splunk, and Prometheus.
  • Set up custom dashboards and streamline alerting to reduce noise while ensuring critical issues are addressed. 
  • Drive the adoption of SLO-based alerting, burn rate metrics, and anomaly detection techniques. 

3. Incident Management: 



  • Lead incident management efforts from identification to resolution. 
  • Conduct post-incident reviews and implement preventive measures to avoid recurring issues. 
  • Maintain detailed documentation and performance reports on incident trends and team efficiency. 

4. Automation and Optimization: 



  • Automate repetitive processes using programming languages like Python or Java. 
  • Develop and refine scripts to manage and fine-tune alerts. 
  • Collaborate with engineering teams to implement solutions that reduce manual effort and operational toil. 

 

Required Skills and Qualifications: 



  • Proven expertise in SRE observability concepts and monitoring architecture design.
  • Extensive experience with alerting frameworks like Prometheus, Grafana, Kibana, Splunk, and Datadog. 
  • Hands-on experience with alert noise reduction and advanced alerting techniques such as anomaly detection and burn rate alerting. 
  • Strong proficiency in incident management, including analysis, root cause identification, and preventive measures. 
  • Familiarity with payment monitoring systems and operational requirements. 
  • Proficient in automation tools and scripting languages like Python or Java. 
  • Excellent collaboration and communication skills to interact with cross-functional teams. 
  • Flexibility to work in rotational 24x7 shifts from the office. 

 



 


 

 

Education and Experience Required:

BE/B.Tech, NA