Director of Site Reliability Engineering
Summary
We are strengthening the team and looking for a Director of SRE to lead our
staff and ensure teams achieve our goals. Wikimedia Foundation’s Site
Reliability Engineering team is responsible for keeping our global top-10
website and other public-facing services healthy, and for further developing
their infrastructure, platform, and services in support of Wikimedia
Foundation’s mission. The SRE team comprises over 45 creative and talented
staff members who are globally distributed and organized into seven teams,
each with its own scope and focus area.
Responsibilities:
Your priority: Lead multiple SRE teams in keeping Wikimedia’s sites and services (including Wikipedia) running responsively, reliably, and securely. This covers protecting against outages, data loss, and breaches, as well as accommodating and implementing Wikimedia’s Movement Strategy (including “Infrastructure for Open”).
Your second priority: Partner with engineering teams at Wikimedia to set direction and build platforms enabling transformative changes to Wikimedia’s user experience while ensuring appropriate operational review and support along the way.
Your foundation: An amazing Site Reliability Engineering team that’s taken us to more than half a billion users a month with passion, ingenuity, solid engineering practices, and duct tape. Nurturing, growing, trusting, and developing this team and its leaders is your path to success in this role.
Your values: You care about free and open information, and are committed to finding solutions to engineering problems in line with our guiding principles. You share our values and work by them.
Qualifications:
8+ years of experience in site reliability engineering, technical operations, or infrastructure engineering roles
4+ years of experience managing infrastructure teams at high-traffic websites or online services at scale
Track record of managing, inspiring, and mentoring multiple managers and engineers, and aligning them across the organization and in the community
Experience in managing large-scale projects with technical deep dives into code, networking, and operating systems
Experience developing and tracking department and project budgets
Experience in globally distributed, multi-site, high-traffic environments, preferably with both on-premises bare-metal and cloud-based infrastructure
Familiarity with open-source development and community practices. Experience adopting and integrating open-source solutions. A track record of upstream contributions (whether personal or through a team) is a huge plus
Familiarity with engineering team practices and experience interfacing SRE with other design, product, and engineering teams tasked with the continuous delivery of functionality
Familiarity with large website application architectures, including caching layers, storage scaling concepts, network infrastructure, monitoring systems, etc.
Experience with highly geographically distributed teams and follow-the-sun operations is a major plus. Personal cross-cultural experience (having lived or worked internationally) helps as well
Pluses:
A track record of modeling and shaping best practices in community, open-source, and development work
Experience in negotiations and RFPs for data center service contracts, equipment purchases, peering agreements, etc.
How To Apply