BioKGBench

Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu,
Tiannan Guo, Stan Z. Li, Kaicheng Yu

AI agents built upon Large Language Models (LLMs) are designed to save human effort in scientific discovery, and the ability to accurately understand existing research is the most critical capability of such AI scientists. However, systematically evaluating this ability remains largely unexplored.

In this work:

  • We propose a novel agent evaluation benchmark, BioKGBench, to assess agents' capabilities in utilizing external tools and understanding both structured data (knowledge graphs, KGs) and unstructured data (literature). Specifically, we introduce two atomic tasks, knowledge graph question answering (KGQA) and scientific claim verification (SCV), as well as an agent task, knowledge graph checking (KGCheck); a sketch of what instances of each task might look like follows this list.
  • We find that no existing agent can accomplish our tasks without moderate adaptation. We therefore introduce BKGAgent, the first agent framework that interacts with external knowledge graphs as well as research papers, achieving large-scale data understanding and validation beyond human capabilities.
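To make the three task types concrete, here is a hedged sketch of what single instances might look like. All field names, values, and label sets below are hypothetical placeholders for illustration, not the benchmark's actual data schema.

```python
# Illustrative sketch only: field names and values are hypothetical
# placeholders, not the benchmark's actual data schema.

kgqa_example = {
    "task": "KGQA",
    "question": "Which diseases are associated with the protein TP53?",
    "answer": ["..."],  # expected answer: a set of KG entities
}

scv_example = {
    "task": "SCV",
    "claim": "Protein X is upregulated in hepatocellular carcinoma.",
    "evidence": "PMID:0000000",  # literature to check the claim against
    "label": "SUPPORT",          # assumed label set, e.g. SUPPORT / REFUTE
}

kgcheck_example = {
    "task": "KGCheck",
    "triple": ("ProteinX", "ASSOCIATED_WITH", "DiseaseY"),
    "goal": "verify this KG triple against the literature; flag it if unsupported",
}
```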

BioKGBench

Starting from Proteins, the origin of the CKG, we select a sub-graph that contains exactly 12 categories of biological entities.

The resulting sub-graph consists of 484,955 entities (nodes) across 12 biologically defined categories and 18,959,943 relationships (edges) of 18 types, where each type connects a unique pair of entity categories.
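As a minimal sketch, these statistics could be recomputed with the official neo4j Python driver, assuming the sub-graph is loaded into a local Neo4j instance (the CKG is distributed as a Neo4j database); the URI and credentials below are placeholders.

```python
# Count nodes, edges, node labels, and relationship types in the sub-graph.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    n_nodes = session.run("MATCH (n) RETURN count(n) AS c").single()["c"]
    n_edges = session.run("MATCH ()-[r]->() RETURN count(r) AS c").single()["c"]
    labels = session.run("CALL db.labels()").value()
    rel_types = session.run("CALL db.relationshipTypes()").value()

print(f"{n_nodes:,} entities across {len(labels)} categories")
print(f"{n_edges:,} relationships of {len(rel_types)} types")

driver.close()
```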

After the sub-graph is ready, we construct the question set for the Question Answering (QA) database in two steps.

- We first handcraft question templates by selecting biomedical fields and pinpointing entities and relations in the CKG. Natural-language questions are constructed in various formats, with their accuracy ensured through peer review and expert consultation.

- We then expand the dataset with auto-generated questions by matching CKG data to the constructed QA templates, yielding 698 questions across 3 reasoning types and 16 question categories; a sketch of this expansion step follows.
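The expansion step can be pictured as slot-filling. In the hypothetical sketch below, the template text and entity bindings are invented for illustration; in practice the bindings would come from Cypher queries against the CKG sub-graph.

```python
# Hypothetical sketch of template expansion: fill question templates with
# entity bindings drawn from the KG.

templates = {
    # one-hop template over a (protein)-[ASSOCIATED_WITH]->(disease) relation
    "protein_disease": "Which diseases are associated with the protein {protein}?",
}

def expand(template_key, bindings):
    """Fill one question template with every entity binding retrieved from the KG."""
    return [
        {"question": templates[template_key].format(**b), "category": template_key}
        for b in bindings
    ]

print(expand("protein_disease", [{"protein": "TP53"}, {"protein": "EGFR"}]))
```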

Statistics of the three reasoning types in the KGQA dataset

BKGAgent

We propose the biomedical knowledge-graph agent (BKGAgent), a multi-agent framework built on LangGraph that can retrieve information from the knowledge graph and cross-validate its correctness against multiple information sources. Our framework comprises three agents and a tool executor:

- the team leader, responsible for progress control;

- the KG agent, responsible for information retrieval from the KG;

- the validation agent, responsible for checking the correctness of the information retrieved from the KG;

- the tool executor, which solely executes functions and is not LLM-based.

This setup simulates the workflow of a human research team, where a leader supervises the assistants' work and makes the final decision based on their feedback.
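Below is a minimal LangGraph sketch of this supervisor topology. The node bodies are stubs standing in for LLM calls; the prompts, tools, and routing logic of the real BKGAgent are not reproduced here.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class TeamState(TypedDict):
    messages: list    # shared scratchpad of agent outputs
    plan: list        # remaining workers the leader will dispatch (stub)
    next_worker: str  # leader's routing decision

def leader(state: TeamState) -> dict:
    # In practice an LLM decides who acts next; this stub walks a fixed plan.
    nxt = state["plan"][0] if state["plan"] else "end"
    return {"plan": state["plan"][1:], "next_worker": nxt}

def kg_agent(state: TeamState) -> dict:
    # In practice: compose a KG query, then hand off to the tool executor.
    return {"messages": state["messages"] + ["kg_agent: retrieved triples"]}

def validation_agent(state: TeamState) -> dict:
    # In practice: cross-check the retrieved facts against the literature.
    return {"messages": state["messages"] + ["validation_agent: checked claim"]}

def tool_executor(state: TeamState) -> dict:
    # Plain function dispatch; deliberately not LLM-based.
    return {"messages": state["messages"] + ["tool_executor: ran tool"]}

graph = StateGraph(TeamState)
graph.add_node("leader", leader)
graph.add_node("kg_agent", kg_agent)
graph.add_node("validation_agent", validation_agent)
graph.add_node("tool_executor", tool_executor)

graph.set_entry_point("leader")
# The leader routes to a worker or terminates, as a human supervisor would.
graph.add_conditional_edges(
    "leader",
    lambda s: s["next_worker"],
    {"kg_agent": "kg_agent", "validation_agent": "validation_agent", "end": END},
)
# Workers invoke tools via the executor and report back to the leader.
graph.add_edge("kg_agent", "tool_executor")
graph.add_edge("validation_agent", "tool_executor")
graph.add_edge("tool_executor", "leader")

app = graph.compile()
result = app.invoke({"messages": [], "plan": ["kg_agent", "validation_agent"], "next_worker": ""})
print(result["messages"])
```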

Citation:

@misc{lin2024biokgbenchknowledgegraphchecking,
  title={BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science},
  author={Xinna Lin and Siqi Ma and Junjie Shan and Xiaojing Zhang and Shell Xu Hu and Tiannan Guo and Stan Z. Li and Kaicheng Yu},
  year={2024},
  eprint={2407.00466},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.00466},
}