Freelance Agent Evaluation Analyst
1 day ago
Company Intro
At Toloka AI we create data that powers leading GenAI models and innovations. We work with frontier labs, big tech, renowned AI startups, enterprises and non-profit research organizations worldwide. We use a combination of Experts + Crowd + Tech Platform to teach AI models to reason and evaluate their efficacy and safety. We have experts in more than 50 different domains—from doctors and lawyers to physicists and engineers—and boast one of the most diverse global crowds, representing over 100 countries and speaking 40+ languages. We are a well-funded startup with an enviable portfolio of clients including Anthropic, Amazon, Microsoft, poolside, Recraft, and Shopify.
Recently, we secured strategic investment led by Bezos Expeditions with participation from Mikhail Parakhin, CTO of Shopify and board advisor to leading GenAI companies, who now serves as our Chairman of the Board. Our remote-first team is distributed across the USA, the UK, the Netherlands, Israel, the Czech Republic, Serbia, and more. We are headquartered in Amsterdam.
About the Role
We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking - it's about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale.
You'll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you'll be expected to understand the "why" behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements.
This is a flexible, impact-driven role where you'll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.
This role is especially well-suited for:
- Analysts, researchers, or consultants with strong structuring and reasoning skills
- Junior product managers or strategists curious about AI and evaluation work
- Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases
You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.
What you'll be doing
- Fully own the QA pipeline for agent evaluation tasks;
- Review and validate tasks and golden paths created by scenario writers and experts;
- Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions;
- Provide structured feedback and ensure quality alignment across contributors;
- Train, onboard, and mentor new QA team members;
- Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage;
- Maintain and improve QA checklists, SOPs, and review guidelines;
- Contribute to test planning, prioritization, and quality benchmarks;
- Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis.
What you should know / be able to do
- Strong analytical and critical thinking skills;
- Attention to detail and reliability - your work can be trusted without double-checking;
- Experience in manual QA, scenario validation, or similar analytical work;
- Comfortable working with structured formats (JSON/YAML);
- Clear written communication and documentation skills;
- Ability to give constructive feedback and coach others;
- Capable of working with a wide range of stakeholders: from engineers to directors/VPs.
Nice to have
- Background in scenario-based testing, test design, or annotation workflows;
- Experience with AI/LLM evaluation, prompt validation, or agent behavior testing;
- Some technical independence (e.g., Python skills);
- Familiarity with MCP / tool-based task execution;
- Experience working in cross-functional teams across product, delivery, and engineering.
Who you are
- Detail-obsessed but also able to see the bigger picture;
- Proactive and independent, taking true ownership of your work;
- Strong communicator who can turn complex findings into actionable insights;
- Flexible and motivated to contribute across a variety of tasks and projects;
- Believe quality is not just checking work, but making the whole product better.
What we can offer
- Freelance full-time contract (B2B);
- Flexible payment based on the results of work;
- Flexibility: this is a freelance collaboration, and you will design a workday that works best for you together with your manager;
- Hourly rate EUR per hour;
- Friendly community.
[Important Notice] Scam Alert Regarding Fake Job Postings
It has come to our attention that an individual or group is fraudulently impersonating Toloka to post fake jobs and solicit personal information from applicants. Please be aware:
- Official Communication: Our recruiting team will only contact you from an official "" email address. We will NEVER use Gmail, Yahoo, Tolokainc, , or other personal or seemingly business email accounts.
- Our Process: We will never ask for your bank account details, credit card number, or any fees as part of the application or interview process.
- Official Listings: All legitimate job openings are posted on our official careers page:
Thank you for your vigilance.