Abstract:
Introduction: Early detection of plant stress is critical for sustainable agriculture. Conventional techniques, including threshold-based sensors and image recognition, are limited by environmental interference, high costs, and dependence on visual data. To address these challenges, we propose a novel sensor-to-text approach that applies natural language processing (NLP) to plant health monitoring.
Methods: A dataset of soil properties, environmental variables, and nutrient concentrations was prepared, with each sample labeled as Healthy, Moderate Stress, or High Stress. A rule-based algorithm generated descriptive symptom statements from the sensor readings (e.g., “Soil is too dry, risk of dehydration”). These narratives were then classified using two pipelines: (i) TF-IDF features with Random Forest and Support Vector Machine classifiers, and (ii) a transformer-based DistilBERT model fine-tuned for multi-class classification.
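As a rough illustration of the sensor-to-text step described in Methods, the sketch below maps one set of sensor readings to a short narrative. Only the statement "Soil is too dry, risk of dehydration" comes from the text above. The field names, units, thresholds, other statements, and the `SensorReading` and `describe` names are hypothetical placeholders, not the rules used in the study.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    """One sensor sample; field names and units are illustrative assumptions."""
    soil_moisture_pct: float
    air_temp_c: float
    nitrogen_mg_kg: float


# Hypothetical rules as (predicate, statement) pairs. The thresholds are
# placeholders chosen for the example, not values reported in the study.
RULES = [
    (lambda r: r.soil_moisture_pct < 20.0, "Soil is too dry, risk of dehydration"),
    (lambda r: r.soil_moisture_pct > 80.0, "Soil is waterlogged, risk of root damage"),
    (lambda r: r.air_temp_c > 35.0, "Air temperature is high, risk of heat stress"),
    (lambda r: r.nitrogen_mg_kg < 10.0, "Nitrogen is low, risk of nutrient deficiency"),
]


def describe(reading: SensorReading) -> str:
    """Join the statements of every rule that fires into one narrative."""
    statements = [text for rule, text in RULES if rule(reading)]
    if not statements:
        return "All readings are within normal range."
    return ". ".join(statements) + "."


if __name__ == "__main__":
    print(describe(SensorReading(soil_moisture_pct=12.0, air_temp_c=38.0, nitrogen_mg_kg=25.0)))
```

In a pipeline of this kind, the returned string would presumably serve as the input document for either the TF-IDF vectorizer or the DistilBERT tokenizer, with the stress label as the classification target.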
Results: The baseline models (TF-IDF with traditional classifiers) achieved 85–88% accuracy, with an average F1-score of 0.86. The DistilBERT model outperformed them by 7–10 percentage points, reaching 95% accuracy, with precision 0.94, recall 0.95, and F1-score 0.94. It was particularly effective at distinguishing moderate from high stress conditions.
Conclusions: This study presents a pipeline that converts agricultural sensor data into natural language descriptions and classifies them with a transformer-based NLP model. The results highlight the potential of this method to improve plant stress detection, provide interpretable feedback, and support scalable AI-driven advisory systems for farmers.