Senior Engineer - Site Reliability Engineering

Date: Oct 13, 2025

Location:

Bangalore, KA, IN, 560100

Employment Type:

SRE Staff Engineer:

Experience Level: 5-8 years
Location: Onsite Client (Bangalore)
Nature of work: Hybrid
Shift: 24x5 Rotational Shifts

About the Role:

We are seeking an experienced and motivated Senior Engineer who can work as an Individual Contributor to drive the performance of the Observability Team (PMG). The ideal candidate will have a strong technical background in Site Reliability Engineering (SRE), with expertise in observability, alerting frameworks, incident management, and automation.

Key Responsibilities:

1. Payment Monitoring and Alert Triage:

Monitor payment flow-based alerts across multiple applications in rotational 24x7 shifts and identify issues proactively.
Triage alerts by analysing trends on the affected dimensions of the payment flow, and correlate them with other services' metrics, logs, and traces to find the root cause; document each triage.
Ensure timely escalation and closure of reported issues, working with the engineering teams of the payment services.

2. Observability Development:

Design and implement alerting frameworks using tools like Datadog, Grafana, Kibana, Splunk, and Prometheus.
Set up custom dashboards and streamline alerting to reduce noise while ensuring critical issues are addressed.
Drive the adoption of SLO-based alerting, burn rate metrics, and anomaly detection techniques.

3. Incident Management:

Lead incident management efforts from identification to resolution.
Conduct post-incident reviews and implement preventive measures to avoid recurring issues.
Maintain detailed documentation and performance reports on incident trends and team efficiency.

4. Automation and Optimization:

Automate repetitive processes using programming languages like Python or Java.
Develop and refine scripts to manage and fine-tune alerts.
Collaborate with engineering teams to implement solutions that reduce manual effort and operational toil.

Required Skills and Qualifications:

Proven expertise in SRE observability concepts and monitoring architecture design.
Extensive experience with alerting frameworks like Prometheus, Grafana, Kibana, Splunk, and Datadog.
Hands-on experience with alert noise reduction and advanced alerting techniques such as anomaly detection and burn rate alerting.
Strong proficiency in incident management, including analysis, root cause identification, and preventive measures.
Familiarity with payment monitoring systems and operational requirements.
Proficient in automation tools and scripting languages like Python or Java.
Excellent collaboration and communication skills to interact with cross-functional teams.
Flexibility to work in rotational 24x7 shifts from the office.

Education and Experience Required:

BE/B.Tech