Principal Site Reliability Engineer
Company: Akamai Technologies
Location: Elkins Park
Posted on: December 3, 2025
Job Description:
Our team designs, develops, and manages applications and
infrastructure that support Akamai's Compute products and services.
We do this while keeping Akamai's mission at the forefront of
what we do: make life better for billions of people, billions of
times a day.

Partner with the best
As a Principal Site Reliability Engineer in the Virtualization &
Host Platforms (VHP) team, you will be at the forefront of Akamai
Cloud and Delivery fleet platform hardware and software host
technologies. Our team is responsible for all physical and virtual
Linux platform software, works closely with hardware teams to define
and enable new server builds, and investigates Linux performance
issues company-wide.

As a Principal Site Reliability Engineer, you will be responsible for:
• Architecting, developing, testing, and distributing changes to the software, services, and tools the VHP team is responsible for
• Designing and implementing enhancements to VHP observability infrastructure to identify and correct problems before they impact our customers
• Developing subject matter expertise in VHP components and mentoring the team
• Identifying and implementing automation best practices for existing products and processes
• Collaborating with our support, operations, and engineering teams to investigate and troubleshoot complex problems
• Participating in on-call rotations, guiding restoration and repair of service-impacting issues

Do what you love
To be successful in this role you will:
• Have 12 years of relevant experience and a Bachelor's degree in Computer Science or its equivalent
• Possess expertise in Linux internals and a deep understanding of hardware and of best practices for enabling hardware features in Linux
• Possess advanced-level experience with the Linux kernel, OS, and optimization of their configurations for KVM/QEMU virtualization
• Possess expert-level experience with designing, developing, and deploying software and infrastructure at scale
• Have expertise in a DevOps, Development, or SysAdmin role, working with large-scale distributed systems
• Have experience with tools like SaltStack and Ansible for managing infrastructure at scale
• Have excellent communication and interpersonal skills

FlexBase,
Akamai's Global Flexible Working Program, is based on the principles
that are helping us create the best workplace in the world. When
our colleagues said that flexible working was important to them, we
listened. We also know flexible working is important to many of the
incredible people considering joining Akamai. FlexBase gives 95%
of employees the choice to work from their home, their office, or
both (in the country advertised). This permanent workplace
flexibility program is consistent and fair globally, to help us
find incredible talent, virtually anywhere. We are happy to discuss
working options for this role and encourage you to speak with your
recruiter in more detail when you apply.

We power and protect life
online, by solving the toughest challenges, together. At Akamai,
we're curious, innovative, collaborative and tenacious. We celebrate
diversity of thought and we hold an unwavering belief that we can
make a meaningful difference. Our teams use their global
perspectives to put customers at the forefront of everything they
do, so if you are people-centric, you'll thrive here.

Working for you
At Akamai, we will provide you with opportunities to grow,
flourish, and achieve great things. Our benefit options are
designed to meet your individual needs for today and in the future.
We provide benefits surrounding all aspects of your life:
• Your health
• Your finances
• Your family
• Your time at work
• Your time pursuing other endeavors
Keywords: Akamai Technologies, Philadelphia, Principal Site Reliability Engineer, IT / Software / Systems, Elkins Park, Pennsylvania