Developing a model for the automated identification and extraction of agricultural terms from unstructured text

Hercules Panoutsopoulos; Christopher Brewster; Borja Garcia

doi:10.3390/IOCAG2022-12264



Hercules Panoutsopoulos ¹,*, Christopher Brewster ², Borja Espejo Garcia ³

¹ Institute of Data Science, Maastricht University, Maastricht, Netherlands
² Institute of Data Science, Maastricht University, Maastricht, Netherlands & Data Science Group, TNO, Soesterberg, Netherlands
³ Department of Natural Resources and Agricultural Engineering, Agricultural University of Athens, Athens, Greece

Academic Editor: Bin Gao

Published: 14 February 2022 by MDPI in the 1st International Online Conference on Agriculture - Advances in Agricultural Science and Technology, session Agricultural Systems and Management


Abstract:

The most prevalent medium for conveying research findings and developments within and beyond the domain of agriculture is text, whether in the form of scholarly publications, reports, articles, or posts on websites and social media channels. Mining information from text is essential for allowing the agricultural (research) community to keep track of the most recent advances, as well as to update ontologies and other structures used to model and formally represent domain-specific knowledge. However, the pace and volume at which texts are currently produced make manual information extraction infeasible. Therefore, we need to rely on technology-supported, machine learning-based methods capable of mining information from large corpora of unstructured text. Within this context, the aim of this paper is to describe a model for the automated identification and extraction of agricultural terms mentioned in texts, built on spaCy, a free, open-source library for Natural Language Processing (NLP) in Python. The model has been trained on a carefully selected corpus of agriculture-related texts, manually annotated for mentions of agricultural terms. Its performance has been evaluated using standard metrics and compared with that of other similar and baseline term recognition approaches. Finally, the paper discusses in detail how the proposed model can be exploited in further research.

Keywords: Agricultural term extraction; machine learning model; natural language processing; Python; spaCy
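The abstract states that the model was compared with baseline term recognition approaches using standard metrics but does not name either. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a simple dictionary (gazetteer) lookup as the baseline and strict span-level precision, recall and F1 as the metrics. Spans are (start, end, label) character-offset tuples, the same shape used in spaCy's `{"entities": [...]}` annotation format. The function names `dictionary_baseline` and `span_prf`, the label `AGRI`, the term list, and the example sentence are all hypothetical illustrations.

```python
import re
from typing import Iterable, List, Set, Tuple

# A labelled span: (start_char, end_char, label), with end_char exclusive.
Span = Tuple[int, int, str]


def dictionary_baseline(text: str, terms: Iterable[str], label: str = "AGRI") -> List[Span]:
    """Hypothetical baseline: tag case-insensitive, whole-word matches of known terms.

    Longer terms are matched first, so "soil moisture" is preferred over "soil",
    and a shorter term is skipped if it overlaps an already accepted match.
    """
    spans: List[Span] = []
    taken = [False] * len(text)  # marks characters covered by accepted matches
    for term in sorted(set(terms), key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start, end = match.start(), match.end()
            if not any(taken[start:end]):
                spans.append((start, end, label))
                taken[start:end] = [True] * (end - start)
    return sorted(spans)


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Strict span-level precision, recall and F1 (offsets and label must match exactly)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence, term list and gold annotations, for illustration only.
    text = "Drip irrigation reduces nitrogen leaching and improves soil moisture."
    terms = ["drip irrigation", "irrigation", "nitrogen", "soil moisture", "soil"]
    gold = {(0, 15, "AGRI"), (24, 41, "AGRI"), (55, 68, "AGRI")}

    predicted = dictionary_baseline(text, terms)
    p, r, f1 = span_prf(gold, set(predicted))
    print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # prints: P=0.67 R=0.67 F1=0.67
```

Under strict matching, the baseline's prediction "nitrogen" counts both as a false positive and as a miss of the gold term "nitrogen leaching". Lexicon lookups tend to fail in exactly this way on multi-word terms, which is why a trained model is expected to outperform them.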
