Director of Site Reliability Engineering Job in Kenya

Director of Site Reliability Engineering

Summary

We are strengthening the team and looking for a Director of SRE to lead our staff and ensure teams achieve our goals. Wikimedia Foundation’s Site Reliability Engineering team is responsible for keeping our global top-10 website and other public-facing services healthy, and for further developing their infrastructure, platform, and services in support of Wikimedia Foundation’s mission. The SRE team comprises over 45 creative and talented staff members who are globally distributed and organized into seven teams, each with its own scope and focus area.

Responsibilities:

Your priority: Lead multiple SRE teams in keeping Wikimedia’s sites and services (including Wikipedia) running responsively, reliably, and securely. This includes protecting against outages, data loss, and breaches, and accommodating and implementing Wikimedia’s Movement Strategy (including “Infrastructure for Open”).

Your second priority: Partner with engineering teams at Wikimedia to set direction and build platforms enabling transformative changes to Wikimedia’s user experience while ensuring appropriate operational review and support along the way.

Your foundation: An amazing Site Reliability Engineering team that’s taken us to more than half a billion users a month with passion, ingenuity, solid engineering practices, and duct tape. Nurturing, growing, trusting, and developing this team and its leaders is your path to success in this role.

Your values: You care about free and open information, and are committed to finding solutions to engineering problems in line with our guiding principles. You share our values and work by them.

Qualifications:

8+ years of experience in site reliability engineering, technical operations, or infrastructure engineering roles

4+ years of experience managing infrastructure teams at high-traffic websites or online services at scale

Track record of managing, inspiring, and mentoring multiple managers and engineers, and aligning them across the organization and in the community

Experience in managing large-scale projects with technical deep dives into code, networking, and operating systems

Experience developing and tracking department and project budgets

Experience in globally distributed, multi-site, high-traffic environments, preferably with both on-premises bare-metal and cloud-based infrastructure

Familiarity with open-source development and community practices. Experience adopting and integrating open-source solutions. A track record of upstream contributions (whether personal or through a team) is a huge plus

Familiarity with engineering team practices and experience interfacing SRE with other design, product, and engineering teams tasked with the continuous delivery of functionality

Familiarity with large website application architectures, including caching layers, storage scaling concepts, network infrastructure, monitoring systems, etc.

Experience with highly geographically distributed teams and follow-the-sun operations is a major plus. Personal cross-cultural experience (having lived or worked internationally) helps as well

Pluses:

A track record of modeling and shaping the best community, open source, and development practices

Experience with negotiation and RFPs for data center service contracts, equipment purchases, peering agreements, etc.
