I built RAGCheck, a Python 3.11 tool that automates testing of retrieval-augmented generation (RAG) systems. It generates Q&A pairs from your documents, runs them against multiple LLMs, and shows the results in an interactive web dashboard.
Automated Test Generation
I parse each document and run a harness that generates question–answer pairs from it automatically.
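
A minimal sketch of what such a generation harness could look like, not RAGCheck's actual code. It assumes the document is already plain text and that the LLM is passed in as a plain callable (`llm`, hypothetical) that takes a prompt and returns a JSON string. The names `QAPair`, `split_into_chunks`, `generate_qa_pairs`, and the prompt wording are all illustrative.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str
    source_chunk: int  # index of the chunk the pair was generated from


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


# Hypothetical prompt; doubled braces are literal braces for str.format.
PROMPT = (
    "Write one question that the passage below answers, plus the answer.\n"
    'Reply with JSON only: {{"question": "...", "answer": "..."}}\n\n'
    "Passage:\n{chunk}"
)


def generate_qa_pairs(
    text: str, llm: Callable[[str], str], max_chars: int = 500
) -> List[QAPair]:
    """Ask the LLM for one Q&A pair per chunk, skipping malformed replies."""
    pairs = []
    for i, chunk in enumerate(split_into_chunks(text, max_chars)):
        raw = llm(PROMPT.format(chunk=chunk))
        try:
            data = json.loads(raw)
            pairs.append(QAPair(data["question"], data["answer"], i))
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # reply was not the JSON object we asked for
    return pairs
```

Passing the model in as a callable keeps the harness independent of any one provider's SDK, so the same loop can be driven by a stub in tests.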
Interactive Dashboard
I built a web UI to visualize metrics, compare models, and track performance in real time.
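
To compare models, a dashboard needs one summary row per model. The sketch below shows one way to build those rows from raw results, assuming each result is a `(model, passed)` tuple. The function name `pass_rates` and the row layout are illustrative, not taken from RAGCheck.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pass_rates(results: Iterable[Tuple[str, bool]]) -> List[Dict[str, object]]:
    """Turn (model, passed) records into one summary row per model."""
    totals = defaultdict(lambda: [0, 0])  # model -> [passed, total]
    for model, passed in results:
        totals[model][0] += int(passed)
        totals[model][1] += 1
    rows = [
        {"model": m, "passed": p, "total": t, "pass_rate": p / t}
        for m, (p, t) in totals.items()
    ]
    # Best model first; ties are broken by name so the order is stable.
    return sorted(rows, key=lambda r: (-r["pass_rate"], r["model"]))
```

A list of dicts like this can be handed straight to `pandas.DataFrame(rows)` and plotted as a bar chart of `pass_rate` by `model`.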
Tech Stack
- Python
- OpenAI, Mistral & Gemini APIs
- Pandas & NumPy
- Streamlit & Plotly
Features I Built
- Generate Q&A pairs from your documents for end-to-end RAG checks.
- Run tests on GPT-4o-mini, Mistral-3B, Gemini 1.5, and more.
- Visualize scores, compare models, and track improvements.
- Process hundreds of test cases concurrently for fast runs.
- Handle PDF, DOCX, TXT, and other formats out of the box.
- Score each answer as pass or fail, with a detailed explanation.
- Run the whole pipeline, from ingestion to visualization, with one command.
- Customize it easily: everything is written in Python 3.11.
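
The multi-model, concurrent, pass/fail features above can be sketched together as follows. This is an illustration under stated assumptions, not RAGCheck's implementation: each model is a plain callable (hypothetical) that maps a question to a reply, and the judge is a simple case-insensitive substring check standing in for whatever scoring the real tool uses. The names `CaseResult`, `judge`, `run_case`, and `run_suite` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CaseResult:
    model: str
    question: str
    passed: bool
    explanation: str


def judge(expected: str, actual: str) -> Tuple[bool, str]:
    """Binary check (assumed rule): pass if the expected answer appears in the reply."""
    if expected.strip().lower() in actual.lower():
        return True, "expected answer found in the model reply"
    return False, f"expected {expected!r} but got {actual!r}"


def run_case(
    model: str, ask: Callable[[str], str], question: str, expected: str
) -> CaseResult:
    """Run one test case; a failed model call counts as a failed test."""
    try:
        passed, why = judge(expected, ask(question))
    except Exception as exc:
        passed, why = False, f"model call failed: {exc}"
    return CaseResult(model, question, passed, why)


def run_suite(
    models: Dict[str, Callable[[str], str]],
    cases: List[Tuple[str, str]],
    max_workers: int = 8,
) -> List[CaseResult]:
    """Run every (model, case) combination on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_case, name, ask, question, expected)
            for name, ask in models.items()
            for question, expected in cases
        ]
        # Collecting in submission order keeps results grouped by model.
        return [f.result() for f in futures]
```

Threads are a reasonable fit here because calls to hosted LLM APIs spend most of their time waiting on the network rather than using the CPU.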