Mandatory
6 years of overall experience, including at least 4 years of strong, relevant experience in Site Reliability Engineering
Experience running production environments, monitoring availability and taking a holistic view of system health
Experience building software and systems to manage platform infrastructure and applications
Experience improving the reliability, quality, and time-to-market of our suite of software solutions
Experience measuring and optimizing system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
Experience providing primary operational support and engineering for multiple large-scale distributed software applications
Experience gathering and analyzing metrics from operating systems and applications to assist in performance tuning and fault finding
Experience partnering with development teams to improve services through rigorous testing and release procedures
Experience in system design consulting, platform management, and capacity planning
Experience creating sustainable systems and services through automation and uplifts
Experience balancing feature development speed and reliability with well-defined service-level objectives (SLOs)
Experience performing root cause analysis of production errors and resolving technical issues
Experience setting up the entire observability process and its automation by working with client stakeholders
Experience with monitoring and automation in Datadog, Dynatrace, Prometheus/Grafana, AppDynamics, New Relic, or any other observability tool
Experience building executive dashboards for production monitoring
Experience transitioning from reactive monitoring to proactive monitoring and s
Experience automating ticket creation in ServiceNow (or any ITSM tool) from ing monitoring system
Must have scripting knowledge in any language (Python, Shell, Perl), preferably Python
Experience programming with Java and/or the .NET framework is preferred
Good to have: infrastructure administration and troubleshooting experience
Requires in-depth knowledge of the following DevOps tools and technologies:
Planning: Jira
SCM tools: Git, GitHub, GitLab, Bitbucket
Build tools: Maven, Gradle, MSBuild
Testing integrations: JUnit, Selenium, JMeter
CI/CD tools: Jenkins, TeamCity, GitHub, GitLab
Code quality: SonarQube
Security tools: Fortify, Black Duck, Checkmarx, Anchore, Clair
Binary repositories: Nexus, JFrog Artifactory
Scripting: Python (preferred), Shell, Groovy, YAML
Configuration management: Ansible, Chef, Puppet
Infrastructure provisioning: Terraform, Azure Resource Manager, AWS CloudFormation templates
Containers: Docker, Kubernetes, OpenShift
Cloud: AWS, Azure, GCP
Monitoring: Prometheus, Grafana, Splunk, AppDynamics, Cloudflare, CyberArk, Datadog, Dynatrace, New Relic, Site24x7, Sumo Logic
Keywords for TA Hiring Team: SRE, Site Reliability, Reliability Engineering, Observability, Chaos Engineering, Auto Scaling, AppDynamics, Datadog, Dynatrace, New Relic, Instana, Splunk, Prometheus, Grafana, Site24x7, Cloudflare, CyberArk, Sumo Logic, CI/CD, Cloud Platforms (Azure, AWS, GCP), AWS CloudWatch, Azure Monitor, Python/Bash, Java/.NET, ELK, Terraform, Ansible
Skills
DEVOPS-SITE-RELIABILITY-ENGINEERING
AddRec Solutions Pvt. Ltd. © 2024 | All Rights Reserved
MANAGED BY INFIEGRITY SOLUTIONS