Data Mining and Non-Invasive Proximal Sensing for Precision Viticulture

Modern, sustainable viticulture requires objective and fast monitoring of key variables for rational decision making. Data mining strategies can be applied to agricultural data to yield useful, reliable and objective information. This work presents recent applications of machine learning algorithms to grapevine phenotyping, namely varietal discrimination and water status assessment. Support vector machine (SVM) and modified partial least squares (MPLS) models were built using near-infrared (NIR) spectra acquired in the vineyard on grapevine leaves with a portable spectrophotometer working in the spectral range from 1600 to 2500 nm. Spectral measurements were acquired on the adaxial side of 200 individual leaves (20 leaves per cultivar) of ten Vitis vinifera L. varieties. The Sequential Minimal Optimization (SMO) algorithm was used to train an SVM for varietal discrimination, and the classifier's accuracy for the 10 varieties exceeded 94.9%. For water status assessment, the MPLS model built on the reflectance spectra of four cultivars and their first and second derivatives yielded R2 = 0.83 in cross-validation for stem water potential (Ψstem), which is widely recognized as an integrative indicator of whole-vine water status but whose conventional measurement is destructive and very laborious. These results show the power of combining data mining and non-invasive sensing for grapevine phenotyping, and their usefulness for the wine industry.


Introduction
The study of the grapevine phenotype, that is, its physical and biochemical traits resulting from the interaction between genotype and environment, is a key topic in modern viticulture. There are thousands of Vitis vinifera L. grapevine varieties worldwide [1], and their cultivation, wine quality potential and the price paid for their grapes are variety dependent [2]. Likewise, irrigation scheduling is usually based on soil water balance calculations or direct measurements of soil moisture. Nonetheless, these methods may not be fully reliable because of soil heterogeneity, and they are prone to cumulative errors [3].
Current methods for variety discrimination include visual ampelometry [1], wet-chemistry genetic [4] and isoenzyme [5] analyses, and, very recently, hyperspectral imaging under laboratory conditions [6]. However, identifying grapevine varieties under field conditions is of great interest in viticulture, for example to detect vines of cultivars that are not allowed in certain appellation regions worldwide, or to identify unknown vines in older vineyards, where more than one cultivar was often planted.
Regarding plant water status, midday stem water potential (Ψstem) has been widely accepted as a useful and reproducible index and has also been proposed as a more integrative indicator of vine water condition [7]; however, its measurement is still time-consuming and destructive.
Spectroscopy is based on the interaction of electromagnetic radiation with matter at different wavelengths. Local spectroscopy has been applied to plant variety discrimination in controlled indoor environments for several crops, such as bayberry [8] and strawberry [9], using the LOCAL algorithm and partial least squares as discrimination methods, respectively, and outdoors in tomato [10]. Several works have also demonstrated the suitability of leaf reflectance for evaluating the water status of grapevine and other crops [12][13][14][15].
This work discusses the use of data mining applied to non-invasively retrieved data to yield key information for plant phenotyping in the frame of precision viticulture.

Spectra collection
For varietal discrimination, spectral measurements were acquired under field conditions on the adaxial side of 200 individual leaves (20 leaves per cultivar) of ten Vitis vinifera L. varieties: five red (Cabernet Sauvignon, Carmenere, Tempranillo, Pinot Noir and Caladoc) and five white (Viura, Treixadura, White Grenache, Pedro Ximenez and Viognier). For stem water potential estimation, spectral measurements were acquired under field conditions on the adaxial side of 80 individual leaves (20 leaves per cultivar) of four red Vitis vinifera L. varieties: Tempranillo, Grenache, Cabernet Sauvignon and Marselan. Stem water potential was measured after spectra acquisition using a Scholander pressure chamber (Model 600, PMS Instruments Co., Albany, USA). All measurements were taken in a commercial vineyard located in Navarra (Spain) during the ripening period of the 2012 season.
An integrated handheld NIR spectral analyzer (microPHAZIR™, Thermo Fisher Scientific Inc., Waltham, MA, USA) was used, working in reflectance mode (log 1/R) in the range of 1600 to 2400 nm with a non-constant sampling interval of approximately 8.7 nm (pixel resolution 8 nm, optical resolution 12 nm).
Sensor integration time was 600 ms. The device was equipped with quartz protection to prevent dirt accumulation. Five spectral measurements were acquired per leaf, and sample temperature during measurements ranged from 23 to 25 °C. Vinyl gloves were worn at all times when handling the leaves to avoid contaminating them with external pollutants from hand manipulation.
Figure 1 shows a block diagram describing the experimental setup.

Spectra processing for variety discrimination
Because of spectral measurement inaccuracies, one sample from Treixadura and one from White Grenache were removed. For all cultivars, the five spectral measurements per leaf were averaged, and this average was taken as the spectrum of the leaf. To reduce light-scattering effects, the spectra were pre-processed with Standard Normal Variate (SNV) followed by detrending and a first-order Savitzky-Golay filter (window size: 5).
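The averaging and pre-processing steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software actually used: replicate scans are assumed to be stored as an array of shape (leaves, 5, wavelengths), detrending is assumed to remove a quadratic baseline fitted against wavelength, and the first-order Savitzky-Golay filter is interpreted as a degree-1 local polynomial over 5 points. The filter treats the sampling interval as uniform and handles the edges by mirror padding; all function names are hypothetical.

```python
import math

import numpy as np


def average_replicates(scans):
    """Average the replicate scans of each leaf: (leaves, replicates, wl) -> (leaves, wl)."""
    return np.asarray(scans, dtype=float).mean(axis=1)


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def detrend(spectra, wavelengths, degree=2):
    """Subtract a polynomial baseline (quadratic by default, an assumption) from each spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    x = (wl - wl.mean()) / wl.std()               # rescaled wavelengths for a well-conditioned fit
    coeffs = np.polyfit(x, spectra.T, degree)      # one coefficient column per spectrum
    baseline = np.vander(x, degree + 1) @ coeffs   # (wl, spectra), highest power first
    return spectra - baseline.T


def savgol(spectra, window=5, polyorder=1, deriv=0, delta=1.0):
    """Savitzky-Golay filter along each row; assumes uniformly spaced points."""
    spectra = np.asarray(spectra, dtype=float)
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    vander = np.vander(z, polyorder + 1, increasing=True)  # columns z**0 .. z**polyorder
    # Row `deriv` of the pseudo-inverse gives the least-squares coefficient of z**deriv.
    coeffs = np.linalg.pinv(vander)[deriv] * math.factorial(deriv) / delta**deriv
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="reflect")
    return np.stack([np.correlate(row, coeffs, mode="valid") for row in padded])


def preprocess(scans, wavelengths):
    """Leaf average -> SNV -> detrending -> Savitzky-Golay smoothing."""
    return savgol(detrend(snv(average_replicates(scans)), wavelengths))
```

Calling `savgol` directly with `deriv=1` (or `deriv=2` together with `polyorder=2`) returns derivative spectra instead of smoothed ones.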
Support vector machines (SVMs) were used as the learning method for the variety discrimination model. An SVM was trained with the Sequential Minimal Optimization (SMO) algorithm [16], using a polynomial kernel and a complexity parameter C = 3.5.
The accuracy of the trained model was evaluated by cross-validation as the percentage of correctly classified examples.
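A comparable classifier and evaluation can be sketched with scikit-learn. Note that `SVC` wraps LIBSVM, which uses an SMO-type solver rather than Platt's original SMO implementation [16], so this is a stand-in rather than the authors' exact setup. Only the polynomial kernel and C = 3.5 come from the text; the kernel degree, the `coef0` and `gamma` settings, the [0, 1] scaling of each wavelength and the 10-fold stratified split are assumptions. Accuracy is computed as the percentage of leaves correctly classified over the pooled out-of-fold predictions, and the function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def evaluate_variety_svm(X, y, n_folds=10, degree=2, seed=0):
    """Cross-validated accuracy (%) and confusion matrix of a polynomial-kernel SVM.

    X: (n_leaves, n_wavelengths) pre-processed, leaf-averaged spectra.
    y: variety label of each leaf.
    """
    model = make_pipeline(
        MinMaxScaler(),  # scale each wavelength to [0, 1] (assumption)
        SVC(kernel="poly", degree=degree, coef0=1.0, gamma="scale", C=3.5),
    )
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(model, X, y, cv=cv)   # out-of-fold prediction for every leaf
    labels = np.unique(y)
    cm = confusion_matrix(y, y_pred, labels=labels)  # rows: actual, columns: classified as
    accuracy = 100.0 * np.trace(cm) / cm.sum()       # % correctly classified instances
    return accuracy, cm, labels
```

Because the confusion matrix is built from out-of-fold predictions, its diagonal sum divided by the total number of leaves is exactly the cross-validated percentage of correctly classified examples.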

Spectra processing for water stress estimation
For all cultivars, the five spectral measurements per leaf were averaged to obtain one spectrum per leaf. Principal component analysis (PCA) was used to reduce the dimensionality of the data to a smaller number of components, to examine possible groupings and to detect outliers based on the global Mahalanobis (GH) distance [17].
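The PCA-based outlier screening can be sketched with NumPy as below. The exact GH definition used by the chemometrics software is not given in the text; here GH is assumed to be the squared Mahalanobis distance of each sample in principal-component score space divided by the number of components (so its average is close to 1), and the cut-off of 3 and the number of components are likewise assumptions. All function names are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Principal component scores of mean-centred spectra via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def global_h(scores):
    """Squared Mahalanobis distance to the centre of the score space,
    divided by the number of components (assumed GH definition)."""
    centred = scores - scores.mean(axis=0)
    cov = np.atleast_2d(np.cov(centred, rowvar=False))  # also valid for one component
    inv_cov = np.linalg.inv(cov)
    d2 = np.einsum("ij,jk,ik->i", centred, inv_cov, centred)
    return d2 / scores.shape[1]


def flag_outliers(X, n_components=3, limit=3.0):
    """Indices of spectra whose GH exceeds `limit`, plus all GH values."""
    gh = global_h(pca_scores(X, n_components))
    return np.flatnonzero(gh > limit), gh
```

With the sample covariance, the GH values defined this way average exactly (n - 1)/n, which makes a fixed cut-off comparable across data sets of different size and dimensionality.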
As spectral pre-treatments, the Standard Normal Variate (SNV) plus Detrending (DT) procedure was used to remove multiplicative scatter interferences, together with derivative mathematical treatments. Modified Partial Least Squares (MPLS) regression was tested for the prediction of Ψstem. To prevent over-fitting, the calibration model was assessed by cross-validation. Chemometric analysis was performed using the WinISI II software package version 1.50 (Infrasoft International, Port Matilda, PA, USA) and the Unscrambler software package version 9.1 (CAMO ASA, Oslo, Norway).
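The MPLS implementation in WinISI is not reproduced here. As a hedged stand-in, the sketch below uses ordinary PLS1 regression (NIPALS), the method that MPLS modifies, and evaluates it by k-fold cross-validation, reporting the coefficient of determination and the root mean squared error of cross-validation (RMSECV) as an approximation of SECV. The number of components, the number of folds and all function names are assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Ordinary PLS1 regression (NIPALS); returns centring terms and regression vector."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)      # weight vector
        t = E @ w                   # scores
        tt = t @ t
        p = E.T @ t / tt            # X loadings
        c = f @ t / tt              # y loading
        E = E - np.outer(t, p)      # deflate X
        f = f - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def cross_validate_pls(X, y, n_components, n_folds=4, seed=0):
    """Return (R2 in cross-validation, RMSECV) from k-fold out-of-fold predictions."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_hat = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = pls1_fit(X[train_idx], y[train_idx], n_components)
        y_hat[test_idx] = pls1_predict(model, X[test_idx])
    residuals = y - y_hat
    r2_cv = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    rmsecv = np.sqrt(np.mean(residuals**2))
    return r2_cv, rmsecv
```

Each leaf is predicted only by a model that never saw it, which is what makes the cross-validated R2 a guard against over-fitting.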

Varietal discrimination
The results obtained with the Sequential Minimal Optimization algorithm for the discrimination of 10 grapevine varieties are shown in Table 1.

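Table 1. Confusion matrix for the discrimination of the 10 grapevine varieties (rows: actual variety; columns: classified as).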
Overall, the model correctly discriminated 95% of the instances across the whole set of varieties (188 of 198).
Individually, three varieties (Viura, White Grenache and Pinot Noir) reached a perfect score (100% correct discrimination), while the remaining varieties obtained values between 90% and 95%.
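The overall and per-variety percentages are simple ratios, which also explains why the abstract states that accuracy exceeded 94.9% while the results round it to 95%: 188/198 = 94.95%. A minimal sketch with hypothetical function names follows; the per-class examples use leaf counts of 20 (and 19 for the two cultivars with one leaf removed) with one or two errors, as illustrative counts rather than values read from Table 1.

```python
def percent_correct(correct, total):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


def per_class_percent(confusion):
    """Per-class percentage correct: diagonal over row total (rows = actual class)."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


# Overall figure reported in the text: 188 of 198 leaves.
print(f"{percent_correct(188, 198):.2f}%")  # 94.95%

# 20 leaves with one or two errors, and 19 leaves with one error.
print([round(percent_correct(c, n), 1) for c, n in [(19, 20), (18, 20), (18, 19)]])  # [95.0, 90.0, 94.7]
```

With only 19 or 20 leaves per variety, a single misclassified leaf moves the per-variety score by about 5 percentage points, which is why the reported values cluster at 90%, 95% and 100%.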
As can be seen from the confusion matrix in Table 1, two Pedro Ximenez samples were misclassified, as Viognier and White Grenache, both white varieties. Similarly, among the red varieties, Cabernet Sauvignon and Carmenere each had one misclassified sample. One sample of a white variety (Viognier) was incorrectly classified as a red one (Caladoc). Finally, two red samples, one from Tempranillo and one from Caladoc, were misclassified as the white varieties White Grenache and Viognier, respectively.
NIR information has been used for cultivar identification in plum [20] and strawberry [9] using spectra acquired on the fruit. Leaves have also recently been used for grapevine variety discrimination through hyperspectral imaging [6], under laboratory conditions and in the 380 to 1028 nm range.

Assessment of plant water status
Table 2 shows the best results obtained for Ψstem assessment. The coefficient of determination reached 0.83 in cross-validation, higher than that reported in a recent work [15] (SECV = 0.18 and R2 = 0.71). Taken together, these discrimination and regression results show that NIR spectra acquired non-invasively on grapevine leaves in the 1600 to 2500 nm range, combined with different data mining approaches, provide accurate information for discriminating grapevine varieties and assessing vineyard water status.

Conclusions
The remarkable performance of the developed models under field conditions paves the way for combining data mining algorithms with non-invasive sensing tools, such as a portable NIR analyzer, as powerful methods for phenotyping and water stress assessment in viticulture and other crops. These results open the door to fast, non-destructive varietal classification and water status estimation in viticulture and, potentially, in agriculture in general.

Table 2. Calibration and cross-validation results for stem water potential (Ψstem).