Building knowledge graphs to speed up queries on genomic information

Authors

DOI:

https://doi.org/10.47187/perspectivas.8.1.250

Keywords:

Genomic information, Knowledge graphs, SPARQL query language, Bioinformatics

Abstract

This article reviews the main formats used to store genomic information and describes how knowledge graphs are built from such data. Information extracted from the repository of the National Center for Biotechnology Information (NCBI) is processed and structured as knowledge graphs, so that complex queries can be run with the SPARQL language. Representing genomic information in a form suited to semantic querying and inference makes it easier to identify complex relationships between genes, proteins, and metabolites. This, in turn, supports the discovery of new bioactive compounds and promotes advances in fields such as medicine and biotechnology. Finally, the article demonstrates how both simple and compound SPARQL queries retrieve explicit and inferred relationships among the triples that make up the knowledge graphs, improving the depth and efficiency of genomic analysis.
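As a rough illustration of the difference between explicit and inferred relationships among triples, the following minimal Python sketch stores a handful of facts as (subject, predicate, object) tuples and answers one simple and one compound pattern query. It is not the article's pipeline and it does not use a SPARQL engine: the `ex:` identifiers, the predicates `ex:encodes`, `ex:catalyzes`, `ex:produces`, and `ex:linkedTo`, and the helper functions `match` and `infer_gene_metabolite` are all hypothetical names chosen for this example. The comments show the SPARQL query each step roughly corresponds to.

```python
# Toy triple store. Every identifier here (ex:geneA, ex:encodes, ...) is a
# hypothetical placeholder, not data from NCBI or from the article.
TRIPLES = {
    ("ex:geneA", "ex:encodes", "ex:proteinA"),
    ("ex:proteinA", "ex:catalyzes", "ex:reaction1"),
    ("ex:reaction1", "ex:produces", "ex:metaboliteX"),
    ("ex:geneB", "ex:encodes", "ex:proteinB"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted for stable output.

    A value of None plays the role of a SPARQL variable (matches anything).
    """
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_gene_metabolite(triples):
    """Derive (gene, ex:linkedTo, metabolite) triples by chaining three edges.

    Roughly analogous to the compound SPARQL pattern:
      SELECT ?gene ?m WHERE {
        ?gene ex:encodes ?p . ?p ex:catalyzes ?r . ?r ex:produces ?m }
    """
    inferred = set()
    for gene, _, protein in match(triples, p="ex:encodes"):
        for _, _, reaction in match(triples, s=protein, p="ex:catalyzes"):
            for _, _, metabolite in match(triples, s=reaction, p="ex:produces"):
                inferred.add((gene, "ex:linkedTo", metabolite))
    return inferred


# Simple query, roughly: SELECT ?gene ?protein WHERE { ?gene ex:encodes ?protein }
explicit = match(TRIPLES, p="ex:encodes")

# Compound query: relationships that no single stored triple states directly.
inferred = infer_gene_metabolite(TRIPLES)

print(explicit)
print(inferred)
```

In this sketch the gene-to-metabolite link is never stored; it only appears once the three explicit edges are joined, which is the kind of relationship a compound query or an inference rule would surface in a real RDF store.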

References

[1] C. Notredame and J.-M. Claverie, Bioinformatics for Dummies, 2nd ed. Wiley Publishing, Inc., 2007.

[2] S. F. Altschul et al., "Basic local alignment search tool," J. Mol. Biol., vol. 215, no. 3, pp. 403-410, 1990, doi: 10.1016/S0022-2836(05)80360-2.

[3] A. Hogan et al., "Knowledge Graphs," ACM Comput. Surv., vol. 54, no. 4, Art. 71, May 2022, doi: 10.1145/3447772.

[4] B. J. Stear et al., "Petagraph: A large-scale unifying knowledge graph framework for integrating biomolecular and biomedical data," Scientific Data, vol. 11, no. 1, 2024, doi: 10.1038/s41597-024-04070-w.

[5] World Wide Web Consortium (W3C), "SPARQL 1.1 Query Language." [Online]. Available: https://www.w3.org/TR/sparql11-query/ (accessed Feb. 9, 2025).

[6] X. Chen, S. Jia, and Y. Xiang, "A review: Knowledge reasoning over knowledge graph," Expert Systems with Applications, vol. 141, Art. 112948, 2020, doi: 10.1016/j.eswa.2019.112948.

[7] F. Feng et al., "GenomicKB: a knowledge graph for the human genome," Nucleic Acids Research, vol. 51, no. D1, pp. D950–D956, Jan. 6, 2023, doi: 10.1093/nar/gkac957.

[8] S. Prasanna, D. Rao, E. J. Simões, and P. Rao, "Scalable Knowledge Graph Construction and Inference on Human Genome Variants," arXiv:2312.04423, 2023.

[9] E. Cavalleri et al., "An ontology-based knowledge graph for representing interactions involving RNA molecules," Scientific Data, vol. 11, 2024, doi: 10.1038/s41597-024-03673-7.

[10] J. Bolleman et al., "A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications."

[11] R. Osuna González, G. De Ita Luna, R. M. Valdovinos Rosas, and Y. Pedraza-Pérez, "Burkholderia Genomic RDF Graph," Mendeley Data, V6, 2025, doi: 10.17632/pt6xn9mgdf.6.

Published

2026-01-28

Issue

Vol. 8, No. 1 (2026)

Section

Peer-reviewed articles

How to Cite

[1] "Building knowledge graphs to speed up queries on genomic information," Perspectivas, vol. 8, no. 1, pp. 24–32, Jan. 2026, doi: 10.47187/perspectivas.8.1.250.