Senior LLM-MLOps Engineer @ Square One Resources

2 days ago


Gdynia, Poland · Square One Resources · Full-time

The Role:

As a Senior MLOps/LLMOps Engineer, you will be at the forefront of building and scaling our AI/ML infrastructure, bridging the gap between cutting-edge large language models and production-ready systems. You will play a pivotal role in designing, deploying, and operating the platforms that power our AI-driven products, working at the intersection of DevOps, MLOps, and emerging LLM technologies. In this role, you'll architect robust, scalable infrastructure for deploying and monitoring large language models (LLMs) such as GPT and Claude-family models in AWS Bedrock and Azure AI Foundry, while ensuring security, observability, and reliability across multi-tenant ML workloads. You will collaborate closely with data scientists, ML engineers, platform teams, and product stakeholders to create seamless, self-serve experiences that accelerate AI innovation across the organization. This is a hands-on leadership role that blends strategic thinking with deep technical execution. You'll own the end-to-end ML platform lifecycle, from infrastructure provisioning and CI/CD automation to model deployment, monitoring, and cost optimization. As a senior technical leader, you'll champion best practices, mentor team members, and drive a culture of continuous improvement, experimentation, and operational excellence.

Essential Experience:

• 8+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering, with at least 2 years focused on MLOps/LLMOps
• Deep hands-on expertise with AWS services, including Bedrock, S3, EC2, EKS, RDS/PostgreSQL, ECR, IAM, Lambda, Step Functions, and CloudWatch
• Production experience managing Kubernetes workloads in EKS, including GPU workloads, autoscaling, resource quotas, and multi-tenant configurations
• Proficiency in container orchestration (Docker, Kubernetes), secrets management, and GitOps-style deployments using Jenkins, ArgoCD, FluxCD, or similar tools
• Practical understanding of deploying and scaling LLMs (e.g., GPT and Claude-family models), including prompt engineering, latency/performance trade-offs, and model evaluation
• Strong programming skills in Python (FastAPI, Django, Pydantic, boto3, pandas, NumPy) with solid computer science fundamentals (performance, concurrency, data structures)
• Working knowledge of machine learning techniques and frameworks (e.g., scikit-learn, TensorFlow, PyTorch)
• Experience building and operating data pipelines with principles of idempotency, retries, backfills, and reproducibility (see the illustrative sketch after the Nice to Have list)
• Expertise in Infrastructure as Code (IaC) using Terraform, CloudFormation, and Helm
• Proven track record designing and maintaining CI/CD pipelines with GitLab CI, Jenkins, or similar tools
• Observability experience with Prometheus/Grafana, Splunk, Datadog, Loki/Promtail, OpenTelemetry, and Sentry, including implementing sensible alerting strategies
• Strong grasp of networking, security concepts, and Linux systems administration
• Excellent communication skills and the ability to collaborate across development, QA, operations, and product teams
• Self-motivated and proactive, with a strong sense of ownership and a passion for removing friction and improving developer experience

Nice to Have:

• Experience with distributed compute frameworks such as Dask, Spark, or Ray
• Familiarity with NVIDIA Triton, TorchServe, or other inference servers
• Experience with ML experiment tracking platforms like Weights & Biases, MLflow, or Kubeflow
• FinOps best practices and cost attribution strategies for multi-tenant ML infrastructure
• Exposure to multi-region and multi-cloud designs, including dataset replication strategies, compute placement, and latency optimization
• Experience with LakeFS, Apache Iceberg, or Delta Lake for data versioning and lakehouse architectures
• Knowledge of data transformation tools such as dbt
• Experience with data pipeline orchestration tools like Airflow or Prefect
• Familiarity with Snowflake or other cloud data warehouses
• Understanding of responsible AI practices, model governance, and compliance frameworks
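To make the data-pipeline expectation concrete, here is a minimal sketch of an idempotent, retryable pipeline step in Python, assuming an S3-backed data lake; the bucket name, marker-key layout, and helper functions are illustrative assumptions, not part of the posting.

    # Minimal sketch: an idempotent, retryable partition step (bucket/key names are hypothetical).
    import time
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    BUCKET = "example-ml-datasets"  # hypothetical bucket

    def already_processed(partition_date: str) -> bool:
        # A marker object lets reruns and backfills skip partitions that already finished.
        try:
            s3.head_object(Bucket=BUCKET, Key=f"markers/{partition_date}.done")
            return True
        except ClientError:
            return False

    def process_partition(partition_date: str, max_retries: int = 3) -> None:
        if already_processed(partition_date):
            return  # idempotent: safe to re-run or backfill
        for attempt in range(1, max_retries + 1):
            try:
                # ... transform the partition and write its outputs here ...
                s3.put_object(Bucket=BUCKET, Key=f"markers/{partition_date}.done", Body=b"")
                return
            except ClientError:
                if attempt == max_retries:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff between retries

Writing the completion marker only after the outputs succeed is what keeps retries and backfills reproducible in this sketch.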
Key Responsibilities:

• Run and evolve our ML/LLM compute infrastructure on Kubernetes/EKS (CPU/GPU) for multi-tenant workloads, ensuring portability across AWS and Azure AI Foundry regions with region-aware scheduling, cross-region data access, and artifact management
• Engage with platform and infrastructure teams to provision and maintain access to cloud environments (AWS, Azure), ensuring seamless integration with existing systems
• Set up and maintain deployment workflows for LLM-powered applications, handling environment-specific configurations across development, staging/UAT, and production
• Build and operate GitOps-native delivery pipelines using GitLab CI, Jenkins, ArgoCD, Helm, and FluxCD to enable fast, safe rollouts and automated rollbacks
• Deploy, scale, and optimize large language models (GPT, Claude, and similar) with deep consideration for prompt engineering, latency/performance trade-offs, and cost efficiency (see the Bedrock sketch after this list)
• Operate and maintain Argo Workflows as a reliable, self-serve orchestration platform for data preparation, model training, evaluation, and large-scale batch compute
• Implement and evaluate models using AI observability frameworks to track model performance, drift, and quality in production
• Design and maintain robust CI/CD pipelines with isolated development, staging, and production environments to support safe iteration, reproducibility, and full lifecycle observability
• Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, and Helm to automate provisioning, configuration, and scaling of cloud resources
• Manage container orchestration, secrets management (e.g., AWS Secrets Manager), and secure deployment practices across all environments
• Set up and analyze comprehensive observability stacks using Prometheus/Grafana and Splunk to monitor model health, infrastructure performance, and system reliability (a metrics sketch closes this posting)

Requirements: AWS, DevOps, MLOps, AWS S3, AWS EC2, Amazon EKS, Amazon RDS, PostgreSQL, IAM, AWS Lambda, CloudWatch, Kubernetes, GPU, Autoscaling, Docker, Jenkins, ArgoCD, Python, FastAPI, Django, pandas, NumPy, Machine Learning, scikit-learn, TensorFlow, PyTorch, Data pipelines, Infrastructure as Code, Terraform, CloudFormation, Helm, GitLab CI, Prometheus, Grafana, Splunk, Datadog, Security, Linux
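As a rough illustration of invoking a Claude-family model through Amazon Bedrock from Python, the sketch below uses the boto3 Converse API; the region and model ID are placeholder assumptions and depend on which models the account has enabled.

    # Minimal sketch: calling a Claude-family model via Amazon Bedrock with boto3.
    import boto3

    # Region and model ID are placeholders; use whatever your account has enabled.
    bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": "Summarise this deployment log in one sentence."}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    print(response["output"]["message"]["content"][0]["text"])

In production such a call would typically sit behind a service and the observability stack described above, so latency and cost can be tracked per tenant.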
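Finally, for the observability responsibility, here is a minimal sketch of exposing Prometheus metrics from a FastAPI inference service using the prometheus_client library; the endpoint paths and metric names are illustrative assumptions.

    # Minimal sketch: Prometheus metrics from a FastAPI inference service (names are illustrative).
    import time
    from fastapi import FastAPI
    from prometheus_client import Counter, Histogram, make_asgi_app

    REQUESTS = Counter("llm_requests_total", "Total LLM inference requests", ["model"])
    LATENCY = Histogram("llm_request_latency_seconds", "End-to-end request latency", ["model"])

    app = FastAPI()
    app.mount("/metrics", make_asgi_app())  # scraped by Prometheus, dashboarded in Grafana

    @app.post("/generate")
    async def generate(prompt: str, model: str = "claude"):
        REQUESTS.labels(model=model).inc()
        start = time.perf_counter()
        # ... call the model here (e.g., the Bedrock sketch above) ...
        LATENCY.labels(model=model).observe(time.perf_counter() - start)
        return {"completion": "..."}

Alerting rules on these series (latency percentiles, error ratios) are what would turn such a stack into the sensible alerting strategy the posting asks for.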


  • Senior MLOps

    2 days ago


    Gdynia, Poland · Square One Resources · Full-time

    A leading tech firm is seeking a Senior MLOps/LLMOps Engineer to build and scale AI/ML infrastructure. This hands-on leadership role involves architecting robust platforms for deploying large language models, ensuring security and reliability across multi-tenant ML workloads. Ideal candidates have over 8 years of experience in DevOps and deep AWS expertise....

  • Senior Data Engineer ML

    1 week ago


    Gdynia, Poland · Lumicode Sp. z o.o. (Pentacomp Group) · Full-time

    Who are we? Lumicode Sp. z o.o. belongs to the Pentacomp Group, a producer of software solutions and a provider of professional IT services for large enterprises and the public sector. As Pentacomp, we create IT solutions that combine innovation with years of experience - and we have quite a lot of it. We have been on the market for almost 30 years and...


  • Gdynia, Poland · Lumicode Sp. z o.o. · Full-time

    Who We Are Lumicode Sp. z o.o. is part of the Pentacomp Group, a provider of IT solutions and professional IT services for large enterprises and the public sector. As Pentacomp, we create IT solutions that combine innovation with years of experience - and we have quite a lot of it. We've been in the market for nearly 30 years and have successfully delivered...

  • Senior Cloud

    4 days ago


    Gdynia, Pomerania, Poland · Nordea · Full-time · 100 000 zł - 160 000 zł per year

    Job ID: 31721. Welcome to Core Technology, where we pride ourselves on engineering solutions and platforms with a direct impact on Nordea's 2030 strategy goals to modernize our data technology state and accelerate AI across the organization. We are looking for a Senior Cloud & AI Platform Engineer to join our Data-AI-Integration team. You will join a small,...

  • Senior Engineer

    1 week ago


    Gdynia, Poland · Devopsbay · Full-time

    We're Devopsbay - MLOps, DevOps and AI specialists. We know how nodes work, how to make the cloud cheaper, and how to adapt AI to boost any area that companies need (and many more). We support our clients with strong engineers on a project basis and are always on the lookout for stellar performers. Currently, we're working with a client specialising in AI...


  • Gdynia, Poland · emagine Polska · Full-time

    Information about the project: Industry: banking. Rate: up to 180 PLN/h net + VAT (B2B). Type of contract: B2B. Location: hybrid model (2x/week in the office), Warsaw/Łódź/Gdańsk/Gdynia. Summary: The primary objective of this role is to manage the complete lifecycle of machine learning models, ensuring their optimal performance, reliability, and integration in...

  • Senior QA Engineer

    2 days ago


    Gdynia, Pomerania, Poland · B2B S.A · Full-time · 60 000 zł - 120 000 zł per year

    We are looking for an experienced Senior QA Engineer (non-functional testing) to join our team. In this role, you will be responsible for optimizing the performance and scalability of our KYC system, ensuring a smooth and efficient user experience. Senior QA Engineer (non-functional testing): your scope of...

  • Senior DevOps Engineer

    1 week ago


    Gdynia, Poland · emagine Polska · Full-time

    Project information: Banking. Gdańsk/Gdynia - 2x per week in the office. ASAP / to be determined. B2B, up to 165 zł/h. Long-term. Introduction & Summary: We are seeking a Senior DevOps Engineer to join our team, dedicated to enhancing our API Management platform. The ideal candidate will possess strong skills in Docker, Kubernetes, Terraform, AWS, Ansible, and...


  • Gdynia, Pomerania, Poland · B2B S.A · Full-time · 70 000 zł - 120 000 zł per year

    We are looking for an experienced Senior Non-Functional Testing Engineer to join a team delivering a project in the KYC (Know Your Customer) area. The person in this role will be responsible for performance and security testing of web applications, analyzing metrics, and collaborating with development and architecture teams in order to...

  • Senior DevOps Engineer

    3 weeks ago


    Gdynia, Trójmiasto, Poland · emagine Polska · Full-time · 27 zł - 720 zł

    Project information: Banking. Gdańsk/Gdynia - 2x per week in the office. ASAP / to be determined. B2B, up to 165 zł/h. Long-term. Introduction & Summary: We are seeking a Senior DevOps Engineer to join our team, dedicated to enhancing our API Management platform. The ideal candidate will possess strong skills in Docker, Kubernetes, Terraform, AWS, Ansible, and monitoring...