Text is the most prevalent medium for conveying research findings and developments within and beyond agriculture, whether as scholarly publications, reports, articles, or posts on websites and social media channels. Mining information from text is essential for the agricultural (research) community to keep track of the most recent advancements and to update ontologies and other structures used to model and formally represent domain-specific knowledge. However, the pace and volume at which texts are currently produced make manual information extraction infeasible. We therefore need to rely on technology-supported, machine learning-based methods capable of mining information from large corpora of unstructured text. In this context, this paper describes a model for the automated identification and extraction of agricultural terms mentioned in texts, built upon spaCy, a free, open-source library for Natural Language Processing in Python. The model has been trained on a carefully selected corpus of agriculture-related texts, manually annotated for mentions of agricultural terms. Its performance has been evaluated using standard metrics and compared with baseline and other similar term recognition approaches. The paper also discusses in detail how the proposed model can be exploited in further research.
Developing a model for the automated identification and extraction of agricultural terms from unstructured text
Published: 14 February 2022 by MDPI in 1st International Online Conference on Agriculture - Advances in Agricultural Science and Technology, session Agricultural Systems and Management
Keywords: Agricultural term extraction; machine learning model; natural language processing; Python; spaCy
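
The abstract states that the model was evaluated against standard metrics but does not name them. As a minimal sketch, assuming the metrics are the span-level precision, recall and F1 commonly reported for term recognition, the comparison between manually annotated and predicted term mentions could look like the following. The `score_spans` helper, the `TERM` label and the example offsets are all hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Set, Tuple

# A term mention as (start_char, end_char, label); this layout is an assumption.
Span = Tuple[int, int, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_spans(gold: Set[Span], predicted: Set[Span]) -> Scores:
    """Exact-match scoring: a prediction counts only if offsets and label agree."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return Scores(precision, recall, f1)


# Hypothetical annotations for one document (not data from the paper).
gold = {(0, 5, "TERM"), (10, 24, "TERM"), (30, 38, "TERM")}
predicted = {(0, 5, "TERM"), (10, 24, "TERM"), (40, 45, "TERM"), (50, 55, "TERM")}

scores = score_spans(gold, predicted)
print(f"precision={scores.precision:.3f} recall={scores.recall:.3f} f1={scores.f1:.3f}")
```

Exact matching is the strictest choice; a partial-overlap criterion would give more lenient scores, and the abstract does not say which one the authors used.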