Site Reliability Engineer
4 days ago
At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.
The Position
The role requires the candidate to be available for on-call duty, responding promptly to urgent issues and emergencies outside regular working hours so that critical situations are addressed in a timely and effective manner.
Who We Are
At Roche, we are passionate about transforming patients' lives, and we are bold in both decision and action - we believe that good business means a better world. That is why we come to work every single day. We commit ourselves to scientific rigor, unassailable ethics, and access to medical innovations for all. We do this today to build a better tomorrow.
Roche is strongly committed to a diverse and inclusive workplace. We strive to build teams that represent a range of backgrounds, perspectives, and skills. Embracing diversity enables us to create a great place to work and to innovate for patients.
Roche is building a global site reliability engineering (SRE) team that will support commercial and internal solutions. This team will have the mindset of building and creating engineering solutions to solve a broad spectrum of problems.
Step into the Future of IT Infrastructure with Roche
As a seasoned Site Reliability Engineer (SRE) at Roche, you'll leverage your deep software engineering expertise to propel our IT infrastructure to new heights of robustness, scalability, and reliability. This isn't just a role—it's an invitation to shape the backbone of critical infrastructures and drive our technological innovations forward.
Your Mission
Design and maintain cutting-edge tools, scripts, and frameworks that automate repetitive tasks, streamline software deployment, and manage expansive systems with unparalleled efficiency.
Partner closely with forward-thinking development teams to architect and implement high-performance solutions that elevate system efficiency, optimize resource utilization, and enhance deployment processes for superior uptime and user satisfaction.
Your Impact
Lead the charge in incident management and response. Detect system anomalies, troubleshoot swiftly, and conduct thorough root cause analyses to prevent recurring issues.
Champion continuous improvement by refining monitoring and alerting mechanisms, conducting insightful post-incident reviews, and embedding best practices in software lifecycle management. Your strategic foresight and meticulous planning will ensure our systems are not only reliable but also highly performant.
By joining our elite team, you will play a pivotal role in delivering seamless experiences to our end-users, exceeding business and customer demands, and solidifying Roche's reputation as a leader in IT innovation.
Your Core Responsibilities
Reliability Mastery: Proactively monitor and maintain system reliability using advanced tools like DataDog, VictorOps, ELK, Grafana, and Prometheus. Become a key player in ensuring system stability and performance.
Uptime Guardian: Ensure optimal uptime and performance by swiftly identifying issues and responding to alerts with precision.
Technical Troubleshooter: Apply a basic understanding of architecture and design to dive deep into complex technical issues and to troubleshoot, investigate, and resolve them. Collaborate seamlessly with engineering teams to enable timely and effective resolutions.
Service Excellence: Maintain and consistently achieve defined SLAs, SLIs, and SLOs, ensuring service levels are consistently met or exceeded.
Automation Innovator: Develop and deploy automation scripts (using Python or other scripting languages) to streamline operations, enhance system efficiencies, and reduce manual tasks.
Cloud Steward: Manage and maintain robust infrastructure across AWS and Azure environments, implementing best practices to ensure peak performance and reliability of cloud-based applications. Drive cost optimization through best-practice implementation and continuous vigilance.
Cross-functional Collaborator: Work closely with engineering, DevOps, security and operations teams to drive continuous improvement and foster a culture of reliability and inclusion.
Incident Responder: Handle requests and incidents through JIRA and ServiceNow, documenting troubleshooting procedures, solutions, and lessons learned to fuel ongoing improvements.
Flexible Scheduling: Work on call outside normal working hours and on weekends, as scheduled, to ensure continuous support.
Team Builder: Actively contribute to the growth and development of the SRE team's capabilities, nurturing a stronger, more inclusive, and resilient team.
Who You Are
Educational Background: Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience. An MBA or PhD is a plus, but not required.
Certifications: Relevant industry certifications (AWS/Azure) to showcase your expertise.
Experience: Approximately 5 years of experience in site reliability engineering, IT operations, DevOps, or related fields, or equivalent skills and experience.
Cloud Expertise: Solid experience with AWS and/or Azure, including setting up, monitoring, and maintaining cloud resources (including knowledge of Kubernetes, EKS, AKS, GKE, etc.). Experience with, or at least a basic understanding of, Infrastructure as Code tools such as Terraform.
Tool Proficiency: Proficiency with monitoring and logging tools such as DataDog, Splunk On-Call, the ELK stack, Grafana, and Prometheus. Knowledge of Loki, Mimir, and Tempo is a plus.
Hands-On Skills: Hands-on experience with JIRA and ServiceNow for tracking incidents, requests, and documentation.
Scripting Knowledge: Proficiency in Python or similar scripting languages for automation purposes.
Incident Response: Understanding of core SRE principles, along with an in-depth understanding of incident prioritization, escalation processes, and service level management (SLA/SLO/SLI).
Troubleshooting: Strong troubleshooting skills, especially in cloud and distributed-system environments.
Communication and Teamwork: Excellent communication, teamwork, and documentation skills, with a proactive and self-motivated approach to improving system reliability and operational efficiencies.
Diversity and Inclusion: We value and encourage candidates from diverse backgrounds and experiences, believing that diverse perspectives drive innovation and success.
Language requirements: Excellent spoken and written English.
Why Join Us?
By joining our team, you will be part of a dynamic environment where your contributions will directly impact the resilience and reliability of our services. You will have opportunities for professional growth and the ability to collaborate with industry leaders. Let's drive the future of IT stability together, ensuring an exceptional experience for our customers.
Ready to make a difference? Apply now to be our next SRE Incident Manager and help us build a more reliable future.
Who we are
A healthier future drives us to innovate. Together, more than 100'000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.
Let's build a healthier future, together.
Roche is an Equal Opportunity Employer.
-
OpenShift Site Reliability Engineer
1 week ago
Warsaw, Mazovia, Poland In4Ge Full-time 60 000 zł - 120 000 zł per year
As a Site Reliability Engineer (SRE), you will be responsible for the reliability and performance of our OpenShift container platform. You will work with development and operations teams, automate processes, monitor systems, and respond to incidents to ensure service continuity.
-
Senior Site Reliability Engineer
1 week ago
Warsaw, Mazovia, Poland inhire Full-time 80 000 zł - 120 000 zł per year
For our client DataArt Poland, we are looking for a Senior Site Reliability Engineer (SRE) - AWS & GCP.
Client
Our client is revolutionizing the retail direct store delivery model by addressing key challenges like communication gaps, out-of-stocks, invoicing errors, and price inconsistencies. Through innovative technology and strong...
-
Site Reliability Engineer
1 hour ago
Warsaw, Mazovia, Poland Ververica Full-time
About Ververica
Ververica, founded by the original creators of Apache Flink, empowers businesses to unlock the full potential of real-time data processing and analytics. Our platform provides cutting-edge stream processing and event-driven applications, enabling companies worldwide to build scalable and reliable data-driven solutions.
Role Overview
As a Site...
-
Site Reliability Engineer
1 hour ago
Warsaw, Mazovia, Poland Infinity Quest Full-time
As a Senior DevOps Engineer/SRE Consultant, you will be part of our captivating journey in delivering Swiss Re's smart and reliable orchestration solution, fully applying your cloud native, microservice-focused skills. Committed to delivering value and creative solutions together with our clients and partners in the enterprise IT space, together with our...
-
Site Reliability Engineer
1 week ago
Warsaw, Mazovia, Poland Paymentology Full-time 60 000 USD - 140 000 USD per year
Paymentology is the first truly global issuer-processor, giving banks and fintechs the technology, team and experience to rapidly issue and process Mastercard, Visa and UnionPay cards across more than 60 countries, at scale. Our advanced, multi-cloud platform, offering both shared and dedicated processing instances, vast global presence and richer,...
-
Site Reliability Engineer
1 week ago
Warsaw, Mazovia, Poland Evertz Full-time 45 000 zł - 115 684 zł per year
We're looking for highly motivated, passionate site reliability engineers to join our growing team. At , our teams are building services that are used by the biggest names in the exciting broadcast and media industry. Our services are hosted in AWS, with a Serverless First mindset.
As part of this role you will work with our talented teams to help harden our...
-
Site Reliability Engineer
6 days ago
Warsaw, Mazovia, Poland EOS Full-time 80 000 zł - 120 000 zł per year
WHO WE ARE:
EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world's largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities,...
-
Senior Site Reliability Engineer
1 week ago
Warsaw, Mazovia, Poland N-iX Full-time 60 000 zł - 120 000 zł per year
About Us:
Our Client is defining the future of cybersecurity through a platform that automatically prevents, detects, and responds to threats in real time. Singularity XDR ingests data and leverages our patented AI models to deliver autonomous protection. With the Client, organizations gain full transparency into everything happening across the network at...
-
Junior Site Reliability Engineer
1 week ago
Warsaw, Mazovia, Poland Moon Active Full-time 40 000 zł - 80 000 zł per year
Moon Active is a company driven by the mission to become a global leader in mobile gaming. Founded in 2011, our passion for creativity, cutting-edge technology, and delivering exceptional player experiences has resulted in games enjoyed by millions worldwide.
We're looking for a Site Reliability Engineer to join our talented community of professionals in our...
-
Site Reliability Engineer
1 week ago
Warsaw, Mazovia, Poland Nord Security Full-time 120 000 zł - 180 000 zł per year
The world's most advanced VPN, and a whole lot more.
If you're a curious problem-solver who carves their own path, join the team behind Threat Protection Pro, the NordLynx protocol, and the fastest VPN on the planet: tools that put privacy, security, and control back in people's hands.
Your impact?
Helping millions take back control of their online security,...