December 5th, 2023
Biocuration is the process of analyzing biological or biomedical articles to organize biological data into data repositories using taxonomies and ontologies. Because the number of articles keeps growing while the number of biocurators remains relatively small, automation is desired to improve the workflow of assessing which articles are worth curating. As figures convey essential information, automatically integrating images may improve curation. In this work, we instantiate and evaluate a first-of-its-kind hybrid image+text document search system for biocuration. The system, MouseScholar, uses an image modality taxonomy derived in collaboration with biocurators, together with figure segmentation and classifier components, as a back end, and a streamlined front-end interface to search and present document results. We formally evaluated the system with ten biocurators on a mouse genome informatics biocuration dataset and collected feedback. The results demonstrate the benefits of blending text and image information when presenting scientific articles for biocuration.
Index Terms - document search, biocuration
Trelles Trabucco, J., Floricel, C., Arighi, C., Shatkay, H., Raciti, D., Ringwald, M., Marai, G.E., MouseScholar: Evaluating an Image+Text Search System for Biocuration, 2023 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2023), Istanbul, Turkey, December 5th, 2023.