RAGCheck

RAG System Evaluation Tool

I built RAGCheck, a Python 3.11 tool that automates testing of retrieval-augmented generation (RAG) systems. It generates Q&A pairs from your documents, runs them against multiple LLMs, and shows the results in an interactive web dashboard.

Automated Test Generation

I parse each document and run a generation harness that creates question–answer pairs from its content automatically.
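
A minimal sketch of how the document-to-prompt step could look. The chunk size, the overlap, the prompt wording, and the names chunk_text and build_qa_prompt are all assumptions made for illustration, not RAGCheck's actual code. The call that sends the prompt to an LLM is left out.

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + max_chars].strip()
        if chunk:
            chunks.append(chunk)
        # Stop once this window has reached the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


def build_qa_prompt(chunk: str) -> str:
    """Build an instruction asking a model for one Q&A pair (assumed wording)."""
    return (
        "Read the passage below and write one question that it answers, "
        "followed by the answer.\n\n"
        f"Passage:\n{chunk}\n\n"
        "Format:\nQ: <question>\nA: <answer>"
    )
```

Each chunk would yield one prompt, and the model's reply would be parsed into a question and an expected answer. The overlap keeps a sentence that straddles a chunk boundary visible in at least one window.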

Interactive Dashboard

I built a web UI to visualize metrics, compare models, and track performance in real time.

Tech Stack

Python
OpenAI, Mistral & Gemini APIs
Pandas & NumPy
Streamlit & Plotly

Features I Built

Automated Testing:

Generate Q&A pairs from docs for end-to-end RAG checks.

Multi-LLM Support:

Run tests on GPT-4o-mini, Ministral 3B, Gemini 1.5, and more.

Interactive Dashboard:

Visualize scores, compare models, and track improvements.

Batch Processing:

Process hundreds of test cases concurrently to keep evaluation runs fast.

Document Support:

Handle PDFs, DOCX, TXT, and other formats out of the box.

Evaluation Metrics:

Binary scoring with detailed pass/fail explanations.

Streamlined Workflow:

Run the whole pipeline, from ingestion to visualization, with one command.

Python-Based:

Built entirely in Python 3.11 for easy customization.
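
The multi-LLM, batch-processing, and binary-scoring features above could fit together roughly as in the sketch below. Everything in it is an assumption made for illustration: the names QACase, EvalResult, run_suite, and pass_rates are hypothetical, each model is represented by a plain callable rather than a real API client, and the substring check stands in for whatever pass/fail judgment RAGCheck actually applies.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a model is any callable mapping a question to an answer.
ModelFn = Callable[[str], str]


@dataclass
class QACase:
    question: str
    expected: str


@dataclass
class EvalResult:
    model: str
    question: str
    passed: bool
    explanation: str


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def score(model: str, case: QACase, answer: str) -> EvalResult:
    """Binary check: pass if the expected answer appears in the response."""
    expected = _normalize(case.expected)
    if expected and expected in _normalize(answer):
        return EvalResult(model, case.question, True,
                          "Expected answer found in the response.")
    return EvalResult(model, case.question, False,
                      f"Expected {case.expected!r} but got {answer!r}.")


def run_suite(models: dict[str, ModelFn], cases: list[QACase],
              max_workers: int = 8) -> list[EvalResult]:
    """Run every case against every model on a thread pool and score it."""
    jobs = [(name, fn, case) for name, fn in models.items() for case in cases]

    def run_one(job: tuple[str, ModelFn, QACase]) -> EvalResult:
        name, fn, case = job
        return score(name, case, fn(case.question))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves job order, so results stay grouped by model.
        return list(pool.map(run_one, jobs))


def pass_rates(results: list[EvalResult]) -> dict[str, float]:
    """Fraction of passed cases per model, e.g. for a comparison chart."""
    totals: dict[str, int] = {}
    passes: dict[str, int] = {}
    for r in results:
        totals[r.model] = totals.get(r.model, 0) + 1
        passes[r.model] = passes.get(r.model, 0) + int(r.passed)
    return {model: passes[model] / totals[model] for model in totals}
```

Threads suit this workload because each test case mostly waits on a network response. In a real run, each entry in the models dictionary would wrap a provider client, and the per-model pass rates would feed the dashboard.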