Welcome to TAIL!

TAIL
Automatic, Easy and Realistic tool for LLM Evaluation

Features

Easy to customize

TAIL helps you generate benchmarks from your own documents (patents, papers, financial reports, or anything else you are interested in). It lets you create test examples of any context length, with questions placed at any depth you desire.
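As a rough illustration of what "context length" and "question depth" mean here, the sketch below assembles one test example by placing the chunk a question was drawn from at a chosen relative position inside a context trimmed to a token budget. This is a minimal sketch of the idea only, not TAIL's actual code; build_example and count_tokens are illustrative names.

```python
# Minimal sketch of the idea, not TAIL's implementation: place the chunk a
# question was generated from at a chosen relative position ("depth") inside
# a context of roughly `target_tokens` tokens.

def build_example(chunks, source_idx, target_tokens, depth, count_tokens):
    """Assemble a context of ~target_tokens tokens in which chunks[source_idx]
    lands at roughly `depth` (0.0 = start of context, 1.0 = end)."""
    source = chunks[source_idx]
    others = chunks[:source_idx] + chunks[source_idx + 1:]

    budget = target_tokens - count_tokens(source)
    before_budget = int(budget * depth)    # tokens to place before the source
    after_budget = budget - before_budget  # tokens to place after it

    before, after = [], []
    for chunk in others:
        n = count_tokens(chunk)
        if n <= before_budget:
            before.append(chunk)
            before_budget -= n
        elif n <= after_budget:
            after.append(chunk)
            after_budget -= n

    return "\n\n".join(before + [source] + after)


# Crude whitespace "tokenizer" just to keep the sketch self-contained; in
# practice you would count tokens with the target model's tokenizer.
count_tokens = lambda text: len(text.split())
```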

Realistic and natural

Unlike the needle-in-a-haystack test, TAIL generates questions based on information already present in your own document rather than inserting a new piece of information, making the benchmark more realistic and natural.

Quality assured

TAIL applies multiple quality assurance measures, including RAG-based filtering and rigorous quality checks, to eliminate subpar QA pairs and deliver a high-caliber benchmark.
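The snippet below is a simplified sketch of what RAG-based filtering can look like, not TAIL's exact pipeline: a generated QA pair is kept only if a retriever, given just the question, pulls back a chunk that actually contains the answer. passes_rag_filter is an illustrative name, and a TF-IDF retriever stands in for whatever retriever or embedding model a real pipeline would use.

```python
# Simplified illustration of RAG-based filtering (not TAIL's exact checks):
# keep a QA pair only if the answer appears in one of the top-k chunks
# retrieved for the question; otherwise the question is likely unsupported.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def passes_rag_filter(question, answer, chunks, top_k=3):
    """Return True if `answer` appears in one of the top_k chunks retrieved
    for `question` (TF-IDF retriever used here for simplicity)."""
    vectorizer = TfidfVectorizer().fit(chunks + [question])
    chunk_vecs = vectorizer.transform(chunks)
    query_vec = vectorizer.transform([question])

    scores = cosine_similarity(query_vec, chunk_vecs)[0]
    top_idx = scores.argsort()[::-1][:top_k]
    return any(answer.lower() in chunks[i].lower() for i in top_idx)


# Example: filter a list of generated QA pairs against the document chunks.
chunks = ["The patent was filed in 2019 by the assignee.", "Unrelated text."]
qa_pairs = [{"question": "What year was the patent filed?", "answer": "2019"}]
kept = [qa for qa in qa_pairs
        if passes_rag_filter(qa["question"], qa["answer"], chunks)]
```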

Ready-to-use

TAIL integrates an out-of-the-box evaluation module that enables users to easily evaluate commercial LLMs via API calls and open-source LLMs via vLLM on the generated benchmarks.
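For context, the two evaluation paths look roughly like this outside of TAIL's wrapper; the model names below are placeholders and the prompt is schematic.

```python
# Rough sketch of the two backends the evaluation module builds on; model
# names are placeholders and the prompt is schematic.
from openai import OpenAI
from vllm import LLM, SamplingParams

prompt = "<long context>\n\nQuestion: <question>\nAnswer:"

# Commercial LLM via API call (OpenAI client shown as one example).
client = OpenAI()  # reads OPENAI_API_KEY from the environment
api_answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Open-source LLM served locally with vLLM.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=64))
local_answer = outputs[0].outputs[0].text
```

In practice, TAIL's evaluation module drives these backends on the generated benchmark for you; the sketch only shows what each one expects.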