
QA Engineer – AI Systems & Fine-Tuning (MCP/Gravitee)

Remote
We are seeking a QA Engineer with a strong background in API testing and LLM fine-tuning/evaluation. You will be responsible for the quality assurance of our "Agent Mesh" infrastructure, ensuring that the MCP servers built in C# and Python correctly translate enterprise business logic into machine-readable actions. Your goal is to ensure that AI agents interact with our Gravitee-managed APIs reliably, securely, and without "hallucinating" tool calls.
Key Responsibilities
  • AI Tool Validation: Test the accuracy of MCP Tool Servers by verifying that LLMs correctly interpret OpenAPI specifications and trigger the right C#/.NET backend logic.
  • Fine-Tuning Data Preparation: Curate and clean high-quality datasets (JSON/JSONL) in Python to fine-tune models for specific domain tasks and tool-calling accuracy.
  • Prompt Regression Testing: Develop automated test suites to ensure that updates to underlying APIs or MCP servers do not break the "reasoning" or "planning" capabilities of the AI agents.
  • Security & Auth QA: Validate that MCP Authentication policies in Gravitee correctly enforce OAuth 2.1 and OpenFGA, preventing unauthorized data leakage through agent conversations.
  • Performance Testing: Use Gravitee Observability tools to measure latency in the agent-to-API loop and identify bottlenecks in MCP server responses.
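For candidates unfamiliar with the fine-tuning data work above: a typical task is validating that every record in a JSONL training file is well-formed before it reaches the fine-tuning pipeline. A minimal sketch in Python, assuming a common chat-format schema ("messages" with "role"/"content" keys) — the field names and sample record are illustrative, not our actual schema:

```python
import json

# Illustrative JSONL: one chat record whose assistant turn encodes a tool call.
# The schema ("messages", "role", "content") is an assumption based on common
# fine-tuning formats; adapt to the provider's actual spec.
SAMPLE_JSONL = """\
{"messages": [{"role": "user", "content": "Cancel order 1042"}, {"role": "assistant", "content": "{\\"tool\\": \\"cancel_order\\", \\"arguments\\": {\\"order_id\\": 1042}}"}]}
"""

def validate_line(line: str) -> bool:
    """Return True if a JSONL line parses and has the expected structure."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    return all("role" in m and "content" in m for m in messages)

def validate_dataset(text: str) -> list[int]:
    """Return the 1-based line numbers of invalid records (empty = clean)."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if line.strip() and not validate_line(line)]

print(validate_dataset(SAMPLE_JSONL))  # [] means every record is well-formed
```

Checks like this typically run in CI so that malformed records are caught before an expensive fine-tuning job starts.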
Technical Qualifications
  • API Testing Mastery: Expert knowledge of REST, OpenAPI, and tools like Postman or Insomnia.
  • Scripting: Proficiency in Python (for data processing and eval frameworks) and familiarity with C# (to understand backend MCP implementation).
  • LLM Evaluation: Experience with frameworks like DeepEval, Ragas, or LangSmith to measure model performance (faithfulness, relevancy, and tool-call precision).
  • API Management: Hands-on experience with Gravitee APIM or similar gateways to monitor and intercept traffic.
  • Model Context Protocol: Understanding of MCP architecture and how it standardizes the way LLMs access external data.
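To make "tool-call precision" concrete: evaluation frameworks such as DeepEval or LangSmith report, among other metrics, the fraction of tool calls an agent made that were actually expected. A minimal sketch of that metric, assuming tool calls are compared by name only (real evaluations also compare arguments); the example tool names are hypothetical:

```python
def tool_call_precision(expected: list[str], actual: list[str]) -> float:
    """Fraction of the agent's actual tool calls that were expected.

    Order-insensitive, name-only comparison; a production metric would
    also match call arguments.
    """
    if not actual:
        # No calls made: perfect if none were expected, zero otherwise.
        return 1.0 if not expected else 0.0
    expected_set = set(expected)
    hits = sum(1 for call in actual if call in expected_set)
    return hits / len(actual)

# Example: the agent made one correct call and one hallucinated one.
print(tool_call_precision(["get_order"], ["get_order", "delete_account"]))  # 0.5
```

In a regression suite, an assertion on this metric (e.g. precision == 1.0 for a golden set of prompts) is what catches an MCP server or API update that starts inducing hallucinated tool calls.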
Preferred Skills
  • Experience with Red Teaming AI agents to identify prompt injection vulnerabilities.
  • Knowledge of Vector Databases and how RAG (Retrieval-Augmented Generation) interacts with live API tools.
  • Familiarity with GitHub Actions for CI/CD integration of AI evaluation pipelines.
