American Express – SRE-Devops

P4 – 8–12 years of experience

29 LPA (Max 32 LPA)

Gurugram / Bangalore – 5 PM slot available every day

SRE JD:
•SRE Strategy and Leadership: Develop and implement a comprehensive SRE strategy aligned with the company's goals and objectives. Lead a team of SRE professionals to drive the reliability, performance, and scalability of GRC technology solutions.
•Observability and Monitoring: Establish observability practices that provide real-time insight into system performance, availability, and customer experience. Implement monitoring tools, metrics, and dashboards to proactively identify and address potential issues.
•Production Support Optimization: Lead all aspects of the end-to-end production support process, including incident management, problem resolution, and service-level agreement (SLA) compliance. Drive continuous improvement initiatives to enhance operational effectiveness and reduce mean time to resolution (MTTR).
•GRC Customer Journeys: Collaborate with cross-functional teams to enhance customer journeys through seamless and reliable technology experiences.
•Reliability Engineering Best Practices: Promote and implement industry-standard methodologies, including error budgeting, chaos engineering, and disaster recovery planning. Foster a culture of resilience and reliability across the technology organization.
•Automation and Efficiency: Champion automation initiatives to streamline operational workflows, deployment processes, and incident response tasks. Leverage automation tools and orchestration to improve reliability and reduce manual intervention.
Qualifications:
•Degree or equivalent experience in Computer Science, Information Technology, or a related field. Advanced certifications in SRE or a related discipline are a plus.
•Deep understanding of observability tools and methodologies, including experience with logging, monitoring, tracing, and performance analysis platforms.
•Strong leadership and people management skills, with the ability to inspire and empower successful SRE teams.
Preferred Skills:
•Knowledge of cloud-based SRE practices and experience with public cloud platforms such as Google Cloud.
•Familiarity with containerization and container orchestration technologies (e.g., Docker, Kubernetes) and microservices architecture.
•Demonstrated expertise in driving culture change, DevOps practices, and continuous improvement in SRE and production support functions.
Skills:
•DevOps / Site Reliability Engineering