Get Started

Installation

Create a conda environment, then install the package with pip:

conda create -n stark python=3.11
conda activate stark
pip install stark-qa

Load SKBs

Execute the following code to automatically download and load the SKB data.

from stark_qa import load_skb
dataset_name = 'amazon' # 'mag' / 'prime' 
skb = load_skb(dataset_name, download_processed=True)

You can now access the SKB data through the skb variable. Some simple examples:

node_types = skb.node_type_lst() # list node types in SKBs
relation_types = skb.rel_type_lst() # list relation types in SKBs

print(skb.num_nodes(), skb.num_edges()) # count nodes and edges in SKBs

print(skb.get_node_ids_by_type(node_types[0])) # list node ids of a specific node type
print(skb.get_node_type_by_id(0)) # get node type by id

print(skb.get_doc_info(0)) # get document info for node
print(skb.get_neighbor_nodes(0, rel_types=relation_types[0])) # list neighbors of a node by relation type

Please see our Doc for detailed usage.
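As a small illustration of combining the accessors above, the sketch below counts nodes per node type. It relies only on node_type_lst() and get_node_ids_by_type() as shown in the examples, and assumes the latter returns a sized collection such as a list. ToySKB is a hypothetical stand-in (not part of stark_qa) so the snippet runs without downloading any data; in practice you would pass the skb object returned by load_skb.

```python
def count_nodes_per_type(skb):
    """Map each node type to the number of nodes of that type."""
    return {
        node_type: len(skb.get_node_ids_by_type(node_type))
        for node_type in skb.node_type_lst()
    }


class ToySKB:
    """Hypothetical stand-in for an SKB; it only mimics the two methods used above."""

    def __init__(self, node_type_by_id):
        self._node_type_by_id = node_type_by_id

    def node_type_lst(self):
        return sorted(set(self._node_type_by_id.values()))

    def get_node_ids_by_type(self, node_type):
        return [i for i, t in self._node_type_by_id.items() if t == node_type]


# Made-up toy data: two "product" nodes and one "brand" node.
toy = ToySKB({0: "product", 1: "product", 2: "brand"})
print(count_nodes_per_type(toy))
```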

Load STaRK QA

The QA files are downloaded to the specified root, or to the default Hugging Face (HF) cache folder if root is None. You can also download them separately from our Datasets page.

from stark_qa import load_qa

dataset_name = 'prime' # 'amazon' / 'mag' 
ROOT = None # directory for the QA files; None uses the default HF cache folder
qa_dataset = load_qa(dataset_name, root=ROOT, human_generated_eval=False)
human_eval_set = load_qa(dataset_name, root=ROOT, human_generated_eval=True)
idx_split = qa_dataset.get_idx_split()
print(qa_dataset[1]) 

The result is a tuple of (query, query_id, answer_ids, metadata).

# ('What drugs target the CYP3A4 enzyme and are used to treat strongyloidiasis?',
# 1,
# [15450],
# None)

Note: We removed all metadata from the released dataset to prevent answer leakage, which is why the metadata field above is None.
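Each item can be unpacked directly into its four fields. The sketch below does this for the example record shown above and scores a made-up ranking against answer_ids with a simple hit@k check. The ranked_node_ids list and the hit_at_k helper are illustrative assumptions, not part of stark_qa or its official evaluation code.

```python
# Example record copied from the output shown above.
record = (
    "What drugs target the CYP3A4 enzyme and are used to treat strongyloidiasis?",
    1,
    [15450],
    None,
)
query, query_id, answer_ids, metadata = record


def hit_at_k(ranked_node_ids, answer_ids, k):
    """Return 1.0 if any ground-truth answer is among the top-k ranked ids, else 0.0."""
    answers = set(answer_ids)
    return float(any(node_id in answers for node_id in ranked_node_ids[:k]))


# Hypothetical retriever output: node ids sorted from most to least relevant.
ranked_node_ids = [20311, 15450, 7]

print(query_id, hit_at_k(ranked_node_ids, answer_ids, k=1))
print(query_id, hit_at_k(ranked_node_ids, answer_ids, k=5))
```

With this made-up ranking the correct node sits in second place, so the check fails at k=1 and succeeds at k=5.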

Cite Us

If you use data or code from STaRK, please cite our benchmark paper below!

@article{wu2024stark, 
	title={STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases},
	author= {
		Shirley Wu and Shiyu Zhao and
		Michihiro Yasunaga and Kexin Huang and 
		Kaidi Cao and Qian Huang and 
		Vassilis N. Ioannidis and Karthik Subbian and 
		James Zou and Jure Leskovec
	},
	eprinttype = {arXiv},
	eprint = {2404.13207},
	year = {2024}
}

Pipeline