Once the sub-graph is complete, we construct the question set for the Question Answering (QA) database in two steps.
- We first handcraft question templates by selecting biomedical fields and identifying the relevant entities and relations in the CKG. We phrase the natural language questions in various formats and verify their accuracy through peer review and expert consultation.
- We then expand the dataset with automatically generated questions by matching CKG data to these QA templates, yielding 698 questions across 3 reasoning types and 16 question categories.
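The automatic expansion step can be pictured as slot filling: each template is tied to a CKG relation, and every entity that appears at the relevant end of that relation fills the template's slot, with the linked entities as the gold answers. The sketch below is a minimal illustration under stated assumptions. The `Triple` and `Template` structures, the single `{tail}` slot, and the entity, relation, and category names are all hypothetical and not taken from the CKG. It also covers only one-hop questions, not all three reasoning types.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One CKG edge (hypothetical structure): head --relation--> tail."""
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class Template:
    """A handcrafted question template (hypothetical format)."""
    category: str   # illustrative question-category label
    relation: str   # CKG relation the template is built on
    text: str       # question text containing a single {tail} slot


def generate_questions(triples, templates):
    """Instantiate each template once per distinct tail entity of its relation.

    The answer set for a question is every head entity linked to that
    tail entity by the template's relation.
    """
    # Index the graph as relation -> tail -> set of head entities.
    index = defaultdict(lambda: defaultdict(set))
    for t in triples:
        index[t.relation][t.tail].add(t.head)

    questions = []
    for tpl in templates:
        # Sort by tail name so the output order is deterministic.
        for tail, heads in sorted(index[tpl.relation].items(), key=lambda kv: kv[0]):
            questions.append({
                "category": tpl.category,
                "question": tpl.text.format(tail=tail),
                "answers": sorted(heads),
            })
    return questions


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the actual CKG.
    toy_triples = [
        Triple("metformin", "treats", "type 2 diabetes"),
        Triple("insulin", "treats", "type 2 diabetes"),
        Triple("aspirin", "treats", "fever"),
        Triple("TP53", "associated_with", "breast cancer"),
    ]
    toy_templates = [Template("drug_indication", "treats", "Which drugs treat {tail}?")]
    for q in generate_questions(toy_triples, toy_templates):
        print(q)
```

Grouping by the filled slot rather than emitting one question per triple means each generated question carries its complete answer set, which matters when several entities satisfy the same template.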