STaRK-MAG

STaRK-MAG focuses on precise academic paper search. It features a complex network of entities and relationships centered on paper nodes, with particular emphasis on citation and authorship. Queries combine single-hop or multi-hop relational requirements with textual properties drawn primarily from abstracts, such as a paper's topic and methodology.

Semi-structured Knowledge Base

The MAG SKB comprises four types of node entities (paper, author, institute, and field_of_study) and four types of relations (author_writes_paper, paper_cites_paper, paper_has_field_of_study, author_belongs_to_institute).
The textual data consists of paper titles and abstracts, enriched with additional details like venue, author, and institution names from the Microsoft Academic Graph database (version 2019-03-22).
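As a rough illustration of the schema described above, the sketch below builds a toy graph with the four node types and four relation types and runs one single-hop and one multi-hop lookup. The class `ToySKB`, its method names, the head/tail direction assumed for each relation (inferred from the relation names), and all node IDs are hypothetical; this is not the STaRK loading API.

```python
from collections import defaultdict

# Node and relation types as named in the text. The (head, tail) typing of
# each relation is an assumption inferred from the relation names.
NODE_TYPES = {"paper", "author", "institute", "field_of_study"}
RELATION_TYPES = {
    "author_writes_paper": ("author", "paper"),
    "paper_cites_paper": ("paper", "paper"),
    "paper_has_field_of_study": ("paper", "field_of_study"),
    "author_belongs_to_institute": ("author", "institute"),
}


class ToySKB:
    """Hypothetical in-memory stand-in for a typed knowledge base."""

    def __init__(self):
        self.node_type = {}                # node id -> node type
        self.out_edges = defaultdict(set)  # (relation, head) -> set of tails
        self.in_edges = defaultdict(set)   # (relation, tail) -> set of heads

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.node_type[node_id] = node_type

    def add_edge(self, head, relation, tail):
        head_type, tail_type = RELATION_TYPES[relation]
        if (self.node_type[head], self.node_type[tail]) != (head_type, tail_type):
            raise ValueError(f"{relation} expects {head_type} -> {tail_type}")
        self.out_edges[(relation, head)].add(tail)
        self.in_edges[(relation, tail)].add(head)

    def papers_citing(self, paper):
        """Single hop: papers that cite the given paper."""
        return set(self.in_edges[("paper_cites_paper", paper)])

    def papers_by_same_authors(self, paper):
        """Two hops: paper -> its authors -> their other papers."""
        result = set()
        for author in self.in_edges[("author_writes_paper", paper)]:
            result |= self.out_edges[("author_writes_paper", author)]
        result.discard(paper)
        return result


# Toy data with made-up IDs.
skb = ToySKB()
for node_id, node_type in [
    (0, "author"), (1, "author"), (2, "institute"), (3, "field_of_study"),
    (10, "paper"), (11, "paper"), (12, "paper"),
]:
    skb.add_node(node_id, node_type)

skb.add_edge(0, "author_writes_paper", 10)
skb.add_edge(0, "author_writes_paper", 11)
skb.add_edge(1, "author_writes_paper", 12)
skb.add_edge(12, "paper_cites_paper", 10)
skb.add_edge(0, "author_belongs_to_institute", 2)
skb.add_edge(10, "paper_has_field_of_study", 3)

citing = skb.papers_citing(10)
same_authors = skb.papers_by_same_authors(10)
```

A real query would additionally filter the candidate papers by the textual condition (topic, methodology) using their titles and abstracts; that step is omitted here.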

Data Statistics

MAG Semi-structured Knowledge Base
Number of entity types: 4
Number of relation types: 4
Average degree: 43.5
Number of entities: 1,872,968
Number of relations: 39,802,116
Number of tokens: 212,602,571

STaRK QA Dataset

Data Statistics

Number of queries: 13,323 (synthesized), 84 (human-generated)
Number of queries with multiple answers: 6,872 (synthesized), 34 (human-generated)
Average number of answers: 2.78 (synthesized), 3.26 (human-generated)

License: CC-BY-4.0

Query Dataset Examples

Synthesized

Q: Does the Reva Institute of Technology and Management have any publications on the improvement of nucleate boiling heat transfer performance using nanofluids?

A: 1356012, 1290861, 1437510, 1254191

Q: What are some research papers that reference "Fast resonance decays in nuclear collisions" and also conduct an analysis of hydrodynamics models in nuclear physics?

A: 1743939, 1213289, 1316848, 1210353, 1783795, 1784533, 1809438, 1786266, 1544702

Q: Could you help me find papers from the same authors who contributed to "On the influence of spatial sampling on climate networks"? I'm particularly interested in their exploration of new dimensionality measures and how they applied these to various complex systems.

A: 1591807
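Each answer is a list of node IDs in the knowledge base, so the per-query statistics reported under Data Statistics (number of queries with multiple answers, average number of answers) can be recomputed from records of this shape. The sketch below does so for only the three synthesized examples above; the dictionary keys are shorthand labels invented for illustration, and the resulting figures describe this tiny sample, not the full dataset.

```python
from statistics import mean

# Answer ID lists copied from the three synthesized examples above.
# The keys are hypothetical shorthand labels, not dataset fields.
answers_by_query = {
    "nanofluids_boiling": [1356012, 1290861, 1437510, 1254191],
    "fast_resonance_decays": [
        1743939, 1213289, 1316848, 1210353, 1783795,
        1784533, 1809438, 1786266, 1544702,
    ],
    "climate_networks_authors": [1591807],
}

num_queries = len(answers_by_query)
# A query counts as multi-answer when it has more than one ground-truth node.
num_multi_answer = sum(1 for ids in answers_by_query.values() if len(ids) > 1)
avg_num_answers = mean(len(ids) for ids in answers_by_query.values())

print(f"queries: {num_queries}")
print(f"queries with multiple answers: {num_multi_answer}")
print(f"average number of answers: {avg_num_answers:.2f}")
```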

Human-generated

Q: Find me papers that discuss improving condensers performance authored by Stojan Hrnjak.

A: 1740813, 1740967

Q: What paper in the field of laminar flow and turbulence describes the gas flow of cold atmospheric pressure plasma jets specifically for biomedical applications?

A: 1814164, 1326172

Q: What is a paper on the electro-optic effect that builds on prior work showing the feasibility of reconfigurable ultrafast all-optical NOR and NAND gates and other sequential logic circuits using the Mach-Zehnder interferometer structure?

A: 1174496, 1201153
