Principal Site Reliability Engineer

3 weeks ago


Warząchewka Polska Lumen Technologies Full-time

We are looking for a Principal SRE to work directly on our proprietary, in-house product in a fast-paced and dynamic environment.

Must-have requirements:

  • 9 years of commercial experience as an SRE in a full-time position
  • 1 year of experience in a Tech Lead position
  • University degree in Computer Science (or similar)
  • Acceptance of working hours: 10:00-18:00 (Central European Time)

This is a hands-on position.

The Role

We are looking for a Principal Site Reliability Engineer (SRE) / Platform Engineer / DevOps Engineer with deep expertise in Kubernetes to design, implement, and manage high-availability, scalable systems primarily on AWS EKS. In this role, you will leverage tools like Terraform, ArgoCD, and GitHub Actions to automate infrastructure and workflows while implementing progressive deployment practices (e.g., blue-green, canary, or feature flagging). This position requires someone who can troubleshoot complex systems, implement robust monitoring and guardrails for databases and applications, and maintain a focus on optimising performance, reliability, and cost-efficiency.

Main Responsibilities

Kubernetes Management & Troubleshooting: Design and manage Kubernetes clusters (AWS EKS) with a focus on networking, scalability, security, and reliability. Troubleshoot complex, cross-system issues involving Kubernetes, databases, networking, and cloud infrastructure. Implement and maintain guardrails to ensure consistent and secure operation of Kubernetes workloads.

Infrastructure Design & Automation: Architect, build, and maintain highly available, fault-tolerant systems using AWS services. Use Terraform to define infrastructure as code, enabling scalable, repeatable, and secure deployments. Automate provisioning, configuration, and updates for cloud infrastructure with a focus on GitOps principles using ArgoCD and GitHub Actions.

System Guardrails & Application Monitoring: Set up and enforce guardrails for databases, infrastructure, and applications, ensuring consistency and adherence to best practices. Implement robust application and infrastructure monitoring using tools like Prometheus, Grafana, and potentially Datadog. Ensure proactive alerting and predictive monitoring to detect issues before they impact users.

Progressive Deployment & CI/CD: Design and implement deployment strategies like blue-green deployments, canary releases, and feature-flag-based rollouts. Develop and maintain CI/CD pipelines to streamline application delivery, testing, and deployment.

Collaboration & Best Practices: Partner with development teams to embed reliability and security best practices into the application lifecycle. Drive a culture of operational excellence, ensuring teams build for reliability, scalability, and security from the ground up.

Resilience & Continuous Improvement: Conduct post-incident reviews to identify root causes and prevent future incidents. Implement practices like chaos engineering to test and enhance system resilience.

Networking & Security: Design and manage secure networking solutions, including AWS VPCs, Kubernetes networking, and firewalls. Ensure compliance with security best practices and industry standards.

What We Look For in a Candidate

Required Qualifications

9 years of related experience in software development, systems engineering, and/or networking.

Kubernetes Expertise: Deep hands-on experience managing Kubernetes clusters (AWS EKS or similar) with a focus on networking, scaling, and security. Strong troubleshooting skills across Kubernetes workloads, infrastructure, and networking.

Infrastructure as Code & Automation: Expertise in Terraform for infrastructure as code. Proven experience with ArgoCD and GitHub Actions for GitOps workflows and CI/CD pipelines.

Monitoring & Observability: Proficiency in Prometheus, Grafana, and incident management workflows. Experience implementing application-level monitoring and tracing to identify performance bottlenecks.

Guardrails & System Security: Demonstrated ability to set up guardrails for databases, Kubernetes clusters, and applications to ensure reliable and secure operations.

Cloud Expertise: Advanced knowledge of AWS services, including EKS, EC2, CloudWatch, Route53, Aurora, and S3. Familiarity with auto-scaling, load balancing, and cloud cost optimisation.

Programming & Scripting Skills: Strong proficiency in Python, Go, or Bash for scripting and automation tasks.

Systems Troubleshooting: Proven ability to troubleshoot complex, distributed systems across cloud infrastructure, databases, and networking.

Preferred Qualifications

  • Experience with other cloud platforms such as GCP or Azure.
  • Familiarity with logging and observability tools like ELK, Loki, or Graylog.
  • Exposure to chaos engineering and resilience testing.
  • Knowledge of HashiCorp Vault, SOPS, and secrets management best practices.
  • Expertise in database systems, including setup, scaling, and optimisation.
  • Strong listening and communication skills.
  • Strong coaching and mentorship capabilities.



  • Warząchewka Polska NextChallenge Full-time

    Our partner is seeking a Site Reliability Engineer to join their high-stakes iGaming platform, ensuring rock-solid uptime, blazing performance, and a secure infrastructure for thousands of real-time bets across casino games, sports, and payments. Role Overview: Help power a platform that handles thousands of bets per second, across casino tables, sports...


  • Warząchewka Polska ICEO - Venture Builder Full-time

    As a Senior Site Reliability Engineer/DevOps, you'll have a direct and influential role in shaping our organisation's reliability strategy and infrastructure. You'll proactively create robust solutions, implement best practices, and drive infrastructure excellence across all teams. Join us remotely; you can be located anywhere in Europe within the CET/CEST...


  • Warząchewka Polska Haleon Full-time

    Welcome to Haleon. We're a purpose-driven, world-class consumer company putting everyday health in the hands of millions. In just three years since our launch, we've grown, evolved and are now entering an exciting new chapter – one filled with bold ambitions and enormous opportunity. Our trusted portfolio of brands – including Sensodyne, Panadol, Advil,...

  • Principal Java Developer

    3 weeks ago


    Warząchewka Polska DCG Full-time

    As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we undertake for our partners, we are currently looking for: Principal Java...

  • Principal Golang Engineer

    3 weeks ago


    Warząchewka Polska Lumen Technologies Full-time

    We are looking for a Principal GoLang Engineer to work directly on our proprietary, in-house product in a fast-paced and dynamic environment. Must-have requirements: 9 years of commercial experience as a Software Engineer in a full-time position (if you have less than 9 years of experience please review...


  • Warząchewka Polska Superdevs Full-time

    Does building awesome, innovative products that add up to something meaningful sound like a dream come true? Come join us and make that vision a reality! About: Superdevs connects top developers with fast-growing product-based companies. We currently have over 100 Software Engineers across Poland working for tech pioneers in industries that include...


  • Warząchewka Polska Solventum Full-time

    Thank you for your interest in joining Solventum. Solventum is a new healthcare company with a long legacy of solving big challenges that improve lives and help healthcare professionals perform at their best. At Solventum, people are at the heart of every innovation we pursue. Guided by empathy, insight, and clinical intelligence, we collaborate with the...

  • Senior BizOps Engineer

    2 weeks ago


    Warząchewka Polska Mastercard Full-time

    Our Purpose: Mastercard powers economies and empowers people in 200 countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships...


  • Warząchewka Polska SIX Full-time

    Do you enjoy working in a dynamic team and an agile environment, and do you bring a passion for high-performance backend systems and automated infrastructure? Then become a Senior Application Engineer at Terravis and actively help shape how we develop intelligent, scalable solutions. As a leading digital platform, Terravis sets standards in the...


  • Warząchewka Polska Bluware Inc. Full-time

    About Bluware: Bluware, a Computer Modelling Group (CMG) company, is at the forefront of revolutionizing the energy industry through its innovative software solutions in subsurface data management and interpretation. Our roots are in the software gaming industry during the pioneering days of 3D video games. Now our focus is on leveraging advanced cloud and AI...