PW – Sr. SRE B. – Job3730

Summary

We are looking for a seasoned Site Reliability Engineer (SRE) to join our team and support our strategy of driving products and technology to accelerate business growth. As an SRE, you will work alongside a team of problem solvers, helping to solve complex business issues from strategy to execution.

Responsibilities

Defining reliability and resilience standards for infrastructure and application components.
Proactively optimizing redundancies, monitoring practices, and alerting patterns.
Developing resilient and highly available distributed systems.
Building infrastructure as code tools for cloud environments.
Monitoring systems and services, and providing incident response to triage and resolve system or client issues.
Managing the application ecosystem and improving platform infrastructure and applications for high reliability, resiliency, performance, and quality.
Creating documentation, knowledge articles, and runbooks.
Designing and implementing SRE patterns that adhere to our client’s security guidelines and policies.

Requirements

Bachelor’s degree in Computer Science or related field (or equivalent work experience).
At least 4 years of relevant working experience as a Site Reliability Engineer or similar role.
Advanced Kubernetes expertise – Strong skills in Kubernetes at scale using AKS, EKS, or GKE. Experience with kubectl and Helm. Familiarity with tools like Lens or Rancher.
Observability – Experience setting up tools like Datadog and Splunk for actionable insights on microservice environments, including synthetics, application performance monitoring, logging, and alerting (PagerDuty/OpsGenie integrations).
Good CI/CD expertise – Experience using Azure DevOps and GitHub Actions for continuous integration and continuous deployment.
SCM proficiency – Working with tools like GitHub for source code management, along with experience in branching strategies like GitFlow or trunk-based development.
Strong troubleshooting skills – Ability to dive deep into code-level analysis to provide development teams with a head start on resolving application issues. Effective contribution to root cause analysis exercises.
Good communication skills – Active listening, verbal and non-verbal communication, clarity, concision, confidence, open-mindedness, and respect.
Good documentation skills – Ability to document automation and technical work clearly so that solutions are easy to adopt and adapt.
Collaboration skills – Ability to work effectively with Scrum/Dev teams using a push/pull philosophy, managing expectations and contributing to the stability and improvement of the platform.

Nice to Have

Infrastructure as Code tools (Terraform, Pulumi), preferably with experience developing modules rather than only using them.
Security practices, including encryption at rest and in transit, with tools like Azure Key Vault, HashiCorp Vault, and Google Cloud KMS.
Containerization experience deploying Java (Spring Boot) microservices in Docker environments.
Automation – Ability to identify toil and opportunities to reduce it within the team.
Authentication/Authorization – Familiarity with AuthN/AuthZ schemes like OpenID, OAuth 2.0, and SAML.
Scripting and Programming – Experience with Python, PowerShell, Java, or Node.js.
Familiarity with event-driven and event-sourcing architectures using platforms like Kafka, Azure Event Hubs, and RabbitMQ, and with patterns like CQRS.
