Lead Site Reliability Engineer Job in Kenya

Description:

As the Systems Site Reliability Engineer Manager, you will lead and support a team of site reliability engineers who identify challenges, analyze root causes, and apply corrective action to ensure that our systems are reliable, scalable, and performant in line with agreed service level objectives.

Reporting to the Principal SRE (Site Reliability Engineering) Lead, you will be part of the team responsible for supporting 24×7 uptime and availability of mission-critical production services within the Bank. You will help create more consistent, automated environments across all applications and services; proactively test and tune all aspects of the platforms; streamline CI/CD processes; monitor and respond to system notifications and alerts; and continually work to optimize the performance, security, and reliability of our systems.

Responsibilities:


Lead SRE (Site Reliability Engineering) initiatives in your areas of focus

Mentor and support the members of the team to achieve high levels of performance

Lead the identification and establishment of service level indicators (SLIs) to support service level objectives (SLOs)

Take ownership of system and service availability, stability, resilience, and health

Provide technical leadership in initiatives to improve the availability, stability, and resilience of our services

Lead incident response activities to restore services

Collaborate with Dev teams to improve services through rigorous testing and release procedures

Participate in architecture design, platform management, and capacity planning exercises

Create sustainable systems and services through automation and uplifts

Required Skills and Qualifications:

Bachelor’s degree in computer science or equivalent

5+ years’ experience as an SRE/DevOps Lead

Experience in managing SRE/DevOps/Software engineers

Strong oral and written communication skills

Attention to detail and strong troubleshooting skills

Demonstrable experience in containerization (Docker) and orchestration (Kubernetes)

Demonstrable experience in CI/CD tools such as Azure DevOps, CircleCI, and Jenkins

Good understanding of Infrastructure as Code (Terraform, CloudFormation, Ansible)

Familiarity with Linux and UNIX systems and with command-line system administration tools such as Bash, Vim, and SSH (Secure Shell)

Basic scripting skills (preferably Golang, Bash, or other shell scripting)

Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools such as Dynatrace, Azure Application Insights, Prometheus, and SolarWinds

Good understanding of networking concepts (e.g., routing, load balancing, and network protocols), including basic knowledge of TCP/IP and an understanding of HTTP and DNS

Experience in programming (structured and OOP) with one or more high-level languages, such as Python, Java, .NET, or JavaScript

Knowledge of and proven hands-on experience with large-scale databases and distributed technologies, such as Kafka and Redis, will be an added advantage
