Site Reliability Engineer
1 day ago
Location: Poland only, fully remote
Job Type: B2B, full time
Overview
Hard Rock Digital is a team focused on becoming the best online sportsbook, casino, and social gaming company in the world. We care about each customer's interaction, experience, behaviour, and insight and strive to ensure we’re always acting authentically. Rooted in the kindred spirits of the Seminole Tribe of Florida, the new Hard Rock Digital taps a brand known all over the world as the leader in gaming, entertainment, and hospitality. We’re taking that foundation of success and bringing it to the digital space.
What’s the position?
We are looking for a skilled Site Reliability Engineer (SRE) to maintain and improve the reliability, scalability, and performance of our Java-based application. You will be responsible for managing and monitoring the applications and infrastructure, using the Grafana stack (Grafana, Loki, Prometheus) to ensure a high level of observability, and implementing robust monitoring, alerting, and logging solutions.
Key Responsibilities:
Application Reliability & Performance:
• Ensure the availability, reliability, and performance of a high-traffic Java-based application in a distributed environment.
• Troubleshoot and resolve complex issues in production and non-production environments.
• Participate in both pre- and post-deployment performance testing and monitoring efforts to improve application performance.
• Optimize Java application performance, ensuring efficient resource utilization and scaling.
Monitoring & Observability:
• Deploy and manage the Grafana stack (Grafana, Prometheus, Loki) to provide real-time monitoring, logging, and alerting.
• Implement and refine observability strategies to enhance application and infrastructure visibility.
• Create and maintain dashboards, alerts, and logs for comprehensive monitoring of system health and performance.
Incident Management & Root Cause Analysis:
• Support the operations team’s incident response efforts, participate in post-mortems, and identify root causes of issues to prevent recurrence.
• Document and share lessons learned from incidents, contributing to a culture of continuous improvement.
Collaboration & Cross-functional Support:
• Work closely with developers, architects, and other engineers to design and implement solutions that improve application reliability.
• Collaborate closely with DevOps and NOC teams to support the application platform.
• Communicate SRE practices and principles to technical and non-technical stakeholders.
• Provide feedback and insights on application performance, potential improvements, and observability metrics.
Requirements
What are we looking for?
The ideal candidate will have:
• Degree in computer science or a related field, or equivalent work experience
• 2-3 years in SRE, DevOps, or similar infrastructure roles
• Experience managing large-scale, high-availability production systems
• Track record of incident response and post-mortem processes
• Experience with capacity planning and performance optimization
• 1+ years hands-on experience managing production Kubernetes clusters
• Deep understanding of k8s architecture, networking, storage, and security
• Experience with cluster scaling (Karpenter), upgrades, and multi-cluster management
• Proficiency with kubectl, Helm, and Kubernetes operators
• Container orchestration and troubleshooting knowledge
• Expertise with the Grafana stack for dashboards, alerting, and visualization
• Hands-on experience with Grafana Alloy for telemetry data collection
• Proficiency in PromQL
• Experience with Loki for log aggregation and analysis
• Experience building comprehensive monitoring and alerting strategies
• Hands-on experience managing Java-based applications in large-scale, distributed environments, with a focus on JVM tuning and application optimization
• Cloud platform expertise (AWS, GCP, or Azure)
• Familiarity with infrastructure as code (IaC) tools like Terraform/Terragrunt or Ansible
• ArgoCD proficiency for GitOps workflows and continuous deployment
• Scripting abilities in Bash, Python, or Go
• Experience with CI/CD pipelines and automation tools
• Configuration management and deployment automation
• Strong troubleshooting skills, with a proactive approach to diagnosing and resolving performance bottlenecks
• Proven experience in on-call rotations, incident response, and root cause analysis
• Strong communication skills (both written and verbal), positive attitude, and ability to receive constructive feedback
-
Site Reliability Engineer
1 day ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: SIX
Job Type: Full time
Do you want to work in a highly dynamic environment? We seek a Site Reliability Engineer (SRE) to join our SRE Department in Warsaw, responsible for driving automation and reliability, owning CI/CD pipelines, and managing incident response. Be part of a team that values collaboration, trust, innovation, and continuous improvement.
What You Will Do
Drive...
-
SRE (Site Reliability Engineer)
2 weeks ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: Connectis
Job Type: Full time
Workplace: Warsaw
Technologies we use
Required: Prometheus, Grafana, Microsoft Azure, CI/CD, Java, Spring Framework, SQL, NoSQL
About the project
Together with our Partner, a global network of fuel stations and grocery stores, we are looking for an SRE (Site Reliability Engineer) responsible for developing the platform Next...
-
Senior Site Reliability Engineer
1 day ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: TQLO SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ
Job Type: Full time
Listed rate: 30 zł - 240 zł
Our Client is an international organization developing a modern, highly available digital platform used by millions of users. The project focuses on building and maintaining scalable cloud infrastructure, automating processes, improving reliability, and implementing Site Reliability Engineering (SRE) best practices.
We are looking for an experienced Senior...
-
Site Reliability Engineer
1 day ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: Sii Sp. z o.o.
Job Type: Full time
Site Reliability Engineer (f/m/x)
Workplace: Warsaw
Technologies we use
Expected: Microservices, Microsoft Azure
About the project
We are looking for a Site Reliability Engineer to work on a large-scale, next-generation retail platform used globally by millions of customers every day. This is a modern, cloud-based ecosystem designed to operate at real enterprise...
-
Senior Site Reliability Engineer
1 day ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: DCG
Job Type: Full time
Listed rate: 26 zł
As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we undertake for our partners, we are currently looking for: Senior Site...
-
Site Reliability Engineer SRE cyber security
3 weeks ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: Grid Dynamics Poland
Job Type: Full time
We are seeking a Site Reliability Engineer with a strong and specialized skill set. The ideal candidate works with complex automation, drives effective incident management, leads system optimization efforts, and helps ensure the reliability, scalability, and performance of our services.
Responsibilities:
Handle incidents, run post-incident reviews, and...
-
Site Reliability Engineer
1 day ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: Poland
Job Type: Full time
Listed rate: 29 zł - 400 zł
At ITLT, we help our partner companies turn ambitious ideas into digital reality. With a focus on challenges, curiosity about technology, and agility, we co-create exceptional IT solutions. We are currently looking for candidates for the position: Site Reliability Engineer - SRE
Details:
Rate: 175 - 190 PLN/h, invoiced
Workplace/remote work:...
-
Software Engineer II
1 day ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: Google
Job Type: Full time
Software Engineer II - Site Reliability Engineering
Workplace: Warsaw
Technologies we use
Operating system: Linux
About the project
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and...
-
SRE Site Reliability Engineer
1 day ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: Connectis
Job Type: Full time
Listed rate: 28 zł
Together with our Partner, a global network of fuel stations and grocery stores, we are looking for an SRE (Site Reliability Engineer) responsible for developing the Next Generation Retail platform. Our Partner focuses on developing web and mobile applications related to electrification in the broad sense, covering cars...
-
Staff Site Reliability Engineer
2 weeks ago
Location: Warsaw, Masovian Voivodeship, Poland
Company: Visa Technology Europe sp. z o.o.
Job Type: Full time
Staff Site Reliability Engineer - (Hadoop)
Workplace: Warsaw
Technologies we use
Expected: Kubernetes, AWS
Operating system: Linux
Your responsibilities
Hadoop/Big-Data:
• Sound knowledge of managing large-scale Hadoop platforms, including monitoring the platform, debugging issues, and tuning the performance of the cluster.
• In-depth knowledge of the Hadoop...