Overview

The STaRK benchmark features three novel retrieval-based question-answering datasets, each containing a synthesized train/val/test split of 9k to 14k queries plus a high-quality, human-generated query set. The queries require integrating relational and textual knowledge and, with their natural-sounding language and flexible formats, closely resemble real-world queries.
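
To make the dataset layout concrete, here is a minimal sketch of loading one dataset and inspecting its splits. It assumes the `stark_qa` Python package with a `load_qa` helper and a `get_idx_split()` method; the dataset name `"amazon"` and these exact names/signatures are assumptions for illustration, not a definitive API reference.

```python
from stark_qa import load_qa  # assumed package and helper name

# Load one of the three QA datasets (dataset name assumed here).
qa_dataset = load_qa("amazon")

# The synthesized queries come with a train/val/test split; a separate
# human-generated query set is provided alongside it for evaluation.
idx_split = qa_dataset.get_idx_split()  # assumed accessor returning split -> query indices
for split, indices in idx_split.items():
    print(f"{split}: {len(indices)} queries")
```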

The datasets are built on three knowledge bases covering product search, academic paper search, and biomedical inquiries. Each knowledge base is semi-structured, combining large-scale relational data among entities with comprehensive textual information about each entity.
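
The sketch below is not the STaRK API; it is a toy illustration of what "semi-structured" means in this context: every entity carries free-text documentation as well as typed relational edges to other entities, and answering a query may require combining both views. All names and data in it are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    text: str  # textual documentation of the entity
    # Typed relational edges: (relation type, target entity id).
    relations: list[tuple[str, int]] = field(default_factory=list)

# A two-entity toy knowledge base (contents invented for illustration).
kb = {
    0: Entity("aspirin", "A drug used to reduce pain, fever, or inflammation.",
              [("interacts_with", 1)]),
    1: Entity("ibuprofen", "A nonsteroidal anti-inflammatory drug."),
}

# A query such as "Which anti-inflammatory drugs interact with aspirin?" needs both
# the relational edge and the neighbour's text to be answered.
for rel, target in kb[0].relations:
    print(rel, "->", kb[target].name, ":", kb[target].text)
```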