System Engineer - CCC

Views: 111

Last updated: 06-05-2024

Category: Architecture / Interior Design, High Technology, Mechanical / Technical, Pharmaceutical / Chemical / Biotechnology, Maintenance, Management, IT - Hardware / Networking, Information Technology, Service / Cleaning / Housekeeping, PG / PB / Reception, Events

Job description

As a Site Reliability Operations Engineer within the Global Technical Engineering Operations (GTEO) SRC team, you will work with other SRC, TDO, SRE, DevOps and Engineering practitioners to proactively maintain the mission-critical infrastructure, cloud platforms, micro-services, tools, and processes that ensure the highest levels of availability and reliability for all our websites.

You’re right for the job if you are comfortable contributing to major incident response in a technical team of engineers laser-focused on restoring service across complex distributed architectures. You’ll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization and organization. You will work directly with our SRE, Engineering and DevOps teams to support our next-generation “always up” cloud-based e-commerce platform.

The SRC Site Reliability Operations Engineer is responsible for proactively monitoring, detecting and resolving site issues before they affect customers and availability. Technically, you will understand the full end-to-end stack and use this knowledge to detect errors and failures and take corrective action to mitigate them. During a major incident, you will draw on your technical skills and knowledge to triage, differentiating between symptom and cause, to help resolve service-impacting issues. Your ability to continuously challenge yourself and develop a strong network within your peer group will see you excel in this role. Our goal is to protect the customer experience and deliver outstanding levels of availability. To do so, you will need strong skills in the following areas:

Understanding of incident management processes and procedures.

Calm under pressure when participating in major incident response.

Technical understanding of core infrastructure, cloud services, platforms and micro-services.

Ability to understand and capture key data from logs.

Ability to understand traffic flows and key dependencies between services.

Ability to triage effectively: detect and distinguish symptom from cause.

Detect and quantify impact.

Analyze trends to proactively prevent incidents.

Focus on immediate restoration vs root cause.

Research and recommend alternative actions for incident resolution.

Create and maintain procedural documentation.

Participate in and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).

Absorb knowledge and understand complex distributed systems, and share this knowledge with your peer group.

Help build tools to improve visibility, proactively detect issues and restore system availability.

Help develop automation and self-healing with DevOps, Engineering and SRE partners.

Strong focus on collecting and inferring metrics.

Clear communication skills.

Additional responsibilities may include:

Actively provide data for and participate in root cause analysis.

Adhere to SRC onboarding process when accepting new systems into service.

Share knowledge globally between SRC teams.

Analyze systems and make recommendations to prevent possible incidents.

Strive for continuous improvement and make recommendations based on SRC process.

Other duties and responsibilities as assigned.

Qualifications:

2+ years in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.

Bachelor’s Degree in Computer Science or a related field, or relevant work experience.

Strong incident management skills with relevant exposure in an enterprise organization.

Experience and exposure working in a 24/7 operations support environment.

Methodical and systematic problem-solving approach.

Networking knowledge and understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.

Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell.

Experience administering Unix/Linux in a production environment.

Experience working with enterprise monitoring/tooling solutions like Grafana, Kibana, Splunk, Graphite, Nagios, New Relic, Dynatrace.

Experience with cloud technologies such as AWS, Azure, and OpenStack.

Knowledge of Docker and Kubernetes.

Application deadline: 20-06-2024

JobIndian.in

Walmart Global Tech India
