Evaluation and Metrics
MMLU (Massive Multitask Language Understanding) Benchmark
A benchmark that measures an LLM's knowledge and comprehension across 57 subjects, ranging from elementary math to US law and history. Every item is a multiple-choice question, so a model's result is reported as the share of questions it answers correctly.
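Scoring a multiple-choice benchmark of this kind comes down to comparing the model's chosen letter with the answer key. The snippet below is a minimal sketch of that idea, not the official evaluation code: the function name `score_multiple_choice`, the subject labels, and the sample records are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def score_multiple_choice(records):
    """Score (subject, predicted_letter, correct_letter) triples.

    Returns per-subject accuracy, overall accuracy over all questions
    (micro average), and the unweighted mean of the per-subject
    accuracies (macro average).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, answer in records:
        total[subject] += 1
        # Compare letters case-insensitively, ignoring stray whitespace.
        if predicted.strip().upper() == answer.strip().upper():
            correct[subject] += 1

    per_subject = {s: correct[s] / total[s] for s in total}
    n_questions = sum(total.values())
    micro = sum(correct.values()) / n_questions if n_questions else 0.0
    macro = sum(per_subject.values()) / len(per_subject) if per_subject else 0.0
    return per_subject, micro, macro


# Made-up records for illustration only; not real benchmark data.
sample = [
    ("elementary_mathematics", "B", "B"),
    ("elementary_mathematics", "C", "A"),
    ("us_history", "d", "D"),
]
per_subject, micro, macro = score_multiple_choice(sample)
print(per_subject, micro, macro)
```

The two averages can differ when subjects contain different numbers of questions, so it helps to state which one is being quoted when comparing models.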