A Machine Learning Based Approach to Study Morphological Features of Bees

Abstract: Bees are the major pollinators of agricultural crops, and due to numerous factors the global bee population is declining drastically. Identification and extraction of body features of bees can allow us to understand the population dynamics and bee-hive health of an agricultural area. Morphological key-based bee studies are well-established procedures for these tasks, but they are time consuming and require critical knowledge about different bee species. Recently, numerous machine learning (ML) methods have been applied to insect species, but there is a scarcity of deep learning models for morphological studies of bees. In our current study, we applied ML methods to extract variants of class activation maps that visually display distinguishing morphological features of bees. We sourced an image data set of eleven different species of bumblebee (Bombus sp.), honey bee (Apis sp.) and carpenter bee (Xylocopa sp.) from iNaturalist, curated it, and fine-tuned fifteen state-of-the-art image classification models on it. An accuracy of 93.66% was obtained with a ResNest101e model, and adding data augmentation improved this to the highest accuracy of 94.27%. We also compared the ML-extracted visual features with traditional morphological key-based features and showed that existing unsupervised ML models are error prone in numerous instances due to their focus on overall features, whereas manual methods benefit from focusing only on the main discriminating body features, which indicates scope for improving the existing models. Overall, our model could be applied to bee-morphology-based tasks in apiculture, such as distinguishing between healthy and parasitic bees, and to classification tasks for similar insect species.


Introduction
Bees are an essential element of our ecosystem, contributing as primary pollinators of agricultural crops and promoting the sustainable development of our ecosystem [1,2]. Approximately 25,000 bee species have been reported worldwide [4]. In recent years there has been a clear declining trend in both wild and domesticated bee populations around the world, with threats including loss of habitat, increased use of pesticides, invasive species and climate change [15,16]. This in turn would result in a decline in wild plant and crop populations, with 75% of the latter mostly pollinated by bees [17]. The decline in bee populations is of concern not only to the greater ecosystem but also to industries relying on bee products, such as the honey and wax industries, which would suffer greatly. Studying bee morphology and identifying bees down to the species level would allow us to understand bee-hive health and population dynamics in target areas, so that specific measures can be taken to maintain and promote their livelihood and impact on the greater ecosystem [3].
Traditional bee morphology studies have mainly focused on the systematics of different bee species. Systematics is a prevalent task that involves manual collection of morphometric data, such as wing data, and curation by a domain expert (taxonomist) [19]. Recently, numerous DNA-based [20] and machine learning (ML) based [5][6][7] approaches have been implemented to reduce the workload in bee systematics, but there is still a lack of approaches for bee morphology studies [20]. Recent advances in ML models have paved the way to accurately derive discriminating features from general images, which has the potential to replace manual curation in morphological work. In particular, the advent of Convolutional Neural Networks (CNNs) [8,9] for image analysis has produced models with identification accuracy comparable to or better than that of humans in multiple visual recognition tasks [9][10][11][12]. CNN models have achieved high accuracy (90%+) when differentiating between closely related subordinate-level categories using general image data [14]. The DeepABIS [45] and ABIS [5] models used different CNNs to automatically generate features from wing images of bees and classify them to species and subspecies level, but both models need a heavily curated wing data set. Data acquisition for such models is therefore often infeasible, as these curation procedures are extremely labour intensive and sometimes impossible, depending on the number of bee species. Although wing images of bees are few and specific, there are a number of extensive public data sets of general images of bees with diverse backgrounds. An image-based ML model capable of successfully identifying bees and their distinguishing morphological features would allow us to measure population dynamics and levels of decline more quickly and conveniently, and thus empower us with the information needed to generate conservation strategies.
In this paper, we explore the task of bee morphology study with various deep learning models and qualitatively analyse how effectively class activation maps from the best performing model extract distinguishing features of our chosen eleven bee species. Successful extraction of distinguishing morphological features from the class activation maps would allow us to fine-tune ML models to achieve better results on classification tasks with the same population of bee species. In addition, we can gather valuable data to explore what a machine learning model considers the distinguishing differences between bees in a particular subset of species.

Methodology
In brief, our experimental design was as follows. We collected bee images from the public domain, studied the key morphological features in the existing literature, and compared these with ML-extracted features to determine the discrepancy between traditional and state-of-the-art ML methods and to identify probable ways to enhance model accuracy for studying bee morphology. To obtain the ML-extracted features, we used numerous ML models to classify different bee species, identified the two best performing models, enhanced their performance using data augmentation, and implemented class activation maps (CAMs).

Data collection and pre-processing
The images for the current study were sourced from iNaturalist.org, which contains images of different organisms, of which 92.3% to 97.3% have correct taxonomic annotations or classifications [37]. We focused on three major bee genera found around the world, namely Apis, Xylocopa and Bombus. Bees of the Apis and Bombus genera are known as honey bees and bumble bees respectively, and are well known for producing honey and wax, whereas bees of the Xylocopa genus are called carpenter bees; they do not produce honey, and some are known to be parasitic on wood plants. From these three genera, we chose the top eleven species based on the number of available images: A. mellifera, X. virginica, X. micans, X. sonorina, X. tabaniformis, X. violacea, B. griseocollis, B. impatiens, B. pensylvanicus, B. terrestris, and B. vosnesenskii. Images were downloaded in bulk using an in-house Python script, and the number of images per species was balanced to match X. micans, the species with the fewest available images. The final data set contains a total of 24,695 images of bees, with 2,245 images of each species. For pre-processing, we resized the images to 224 x 224 pixels and normalised them to fit pre-trained ImageNet models. Of this processed data set, 80% was used for training and 20% for testing.
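The balancing and 80/20 split described above can be sketched as follows. This is a minimal illustration, not the in-house script: the function name `balance_and_split`, the per-species (stratified) split, the fixed random seed and the toy file names are all assumptions, since the text only states that classes were balanced to the smallest species and that 80% of the images were used for training.

```python
import random


def balance_and_split(images_by_species, train_fraction=0.8, seed=0):
    """Downsample every species to the smallest class, then split per species.

    images_by_species maps a species name to a list of image paths.
    Returns two lists of (path, species) pairs: train and test.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    # Balance to the species with the fewest images.
    per_class = min(len(paths) for paths in images_by_species.values())
    n_train = round(per_class * train_fraction)

    train, test = [], []
    for species, paths in sorted(images_by_species.items()):
        sample = rng.sample(paths, per_class)  # random subset without replacement
        train += [(path, species) for path in sample[:n_train]]
        test += [(path, species) for path in sample[n_train:]]
    return train, test


# Toy data with hypothetical file names and counts.
toy = {
    "A. mellifera": [f"apis_{i}.jpg" for i in range(15)],
    "X. micans": [f"micans_{i}.jpg" for i in range(10)],  # smallest class
    "B. impatiens": [f"impatiens_{i}.jpg" for i in range(12)],
}
train, test = balance_and_split(toy)
print(len(train), len(test))
```

If the split was indeed made per species, the reported figures (eleven species at 2,245 images each) would correspond to 1,796 training and 449 test images per species.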

Traditional Key Features
We collected traditional key morphological features of the three genera from dichotomous keys and related genus-level studies [34][35][36]. In particular, we focused mainly on distinguishing visual features (e.g. body coloration, body shape, size, wing features) that can be captured in images and that models can use as cues for classification.
These models were executed using either a modified version of Wightman's pytorch-image-models [25] or a direct code implementation. All layers in the models were fine-tuned on our bee data set. The deep networks were trained using a mini-batch stochastic gradient descent optimiser with a batch size of 32. Learning rate, momentum, and weight decay were kept at 0.01, 0.9, and 0.0001 respectively. We also employed a dropout [43] value of 0.2 to prevent overfitting. Each model was trained for a total of 100 epochs on an NVIDIA RTX 2070 GPU with 8 GB of onboard memory. For data augmentation, we applied RandAugment [33] with a 50% probability and a transformation magnitude of 9 when training the two best performing models on our data set. We utilised the gradients within the best performing model to obtain a Gradient-weighted Class Activation Mapping (Grad-CAM [44]), which visually indicates the discriminative region used by the model to classify bee species. We used a modified Grad-CAM++ [38], which outperforms the original Grad-CAM in providing visual explanations for deep learning models.
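To make the class activation map step concrete, the sketch below computes an original Grad-CAM heat map on toy data: each feature-map channel is weighted by the spatial average of the class-score gradient, the weighted channels are summed, and a ReLU keeps only positive evidence. It is an illustration under stated assumptions rather than the modified Grad-CAM++ used in this study, which replaces the uniform spatial average with pixel-wise weights. The function name `grad_cam` and all numeric values are hypothetical, and in practice the activations and gradients would be taken from the last convolutional layer of the trained network.

```python
def grad_cam(activations, gradients):
    """Original Grad-CAM on nested lists.

    activations, gradients: K feature maps each, every map an H x W nested list.
    gradients[k][i][j] is d(class score) / d(activations[k][i][j]).
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weight: spatial average of the gradient for that channel.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]
    # Weighted sum over channels; ReLU keeps only evidence for the class.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Scale to [0, 1] so the map can be overlaid on the image as a heat map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy example: two 2x2 feature maps with hypothetical values.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 supports the class (weight 0.5)
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 counts against it (weight -1.0)
]
heatmap = grad_cam(activations, gradients)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

Only the positions where channel 0 is active remain highlighted; the positions driven by channel 1 are zeroed by the ReLU, which is why such maps show regions that argue for the predicted species rather than against it.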

Results
Our first step was to select and train a deep learning model on our bee data set in order to obtain a machine-based representation of each class. We experimented with eleven different state-of-the-art deep learning models for image classification in order to determine the best architecture for our bee task. The major distinguishing features of the three bee genera are shown in Table 1. Figure 1 illustrates the classification accuracies obtained for the different CNN architecture families. Models of the EfficientNet, ResNet and other families achieved accuracies of 92.13±0.75%, 91.57±1.13% and 89.21±1.22% respectively. Within the ResNet family, ResNest101e, TResNet, CSPResNet50 and ResNetv2 achieved accuracies of 93.66%, 92.25%, 92% and 88.38% respectively. As a family, the EfficientNet models had the highest mean accuracy. TF-EfficientNet achieved an accuracy of 90.86%, which improved to 93.41% with the inclusion of Noisy Student. Overall, among the eleven models, ResNest101e and TF-EfficientNet with Noisy Student achieved the highest accuracies for bee classification. We applied state-of-the-art data augmentation to these top two models to boost their classification performance. The performance of both models improved with RandAugment: ResNest101e and TF-EfficientNet with Noisy Student reached accuracies of 94.27% and 94.07% respectively. For the CAMs, we used the best performing ResNest101e model to retrieve features from the bee images classified with the highest confidence (Figure 2). A qualitative comparison of these images with the traditional morphological features of their genera showed the following. For Apis and Bombus, the ResNest101e model mainly focused on the abdomen and the lower parts of the head, and was also able to capture both the body colouration and stripes. For Xylocopa, on the other hand, the model focused more on the wings and head, with CAMs often following the shape of the wings.
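As a worked check of the family-level figures, the short sketch below recomputes the summary for the four individually reported accuracies (93.66%, 92.25%, 92% and 88.38%). The text does not state what the ± value denotes; reading it as the standard error of the mean is an assumption, made here because it reproduces the reported 91.57±1.13%, whereas the sample standard deviation of the same four values is about 2.25.

```python
from math import sqrt
from statistics import mean, stdev

# The four accuracies (%) reported individually in the text.
accuracies = [93.66, 92.25, 92.00, 88.38]

family_mean = mean(accuracies)
# Assumption: the reported "±" is the standard error of the mean,
# i.e. the sample standard deviation divided by sqrt(n).
family_sem = stdev(accuracies) / sqrt(len(accuracies))

print(f"{family_mean:.2f} ± {family_sem:.2f}")  # 91.57 ± 1.13
```

With only four models per family, an interval of this size means the gap between the two leading family means (92.13% and 91.57%) is smaller than either family's reported spread.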

Discussion
In this research we aimed to produce a modified model capable of producing visual maps that display distinguishing features between bees spanning three genera and eleven species. Using a modified ResNest model and data augmentation, we obtained 94.27% classification accuracy. Further, applying Grad-CAM++ to our model produced CAMs of the best sample images that are indicative of their respective class. From a qualitative analysis and comparison of the CAMs against traditional key morphological features for each genus, we find that the CAMs produced are insufficient for fine-tuning and for discovering key features between classes. ResNest is able to outperform EfficientNet-based models on various object detection and segmentation transfer learning tasks as well as on image classification on ImageNet [39]. Our modified ResNest model is well suited to the fine-grained classification of bee species compared to other state-of-the-art models, as it achieves the highest classification accuracy of 94.27%. In our experiments we also explored the effect of model scaling on classification performance for our task. An analysis of the models leads us to the following conjectures regarding their high performance on the bee data set. First, we can compare the pre-training data used by each model: both NoisyStudent and TensorFlow EfficientNet use JFT-300M and ImageNet, unlike the majority of the other tested models, including ResNest. Naturally, this increases the performance of those models. For ResNest, despite the smaller amount of pre-training data, the split-attention blocks potentially work well to improve learned feature representations for our task, with the scaled depth helping to learn complex features of the bees such as wings. Comparing against the less scaled models (e.g. tf-efficientnet-b0 and ResNest50d), we found that scaling up the image resolution, depth and width provides accuracy boosts.
Scaling all three dimensions provides larger improvements in accuracy than the depth-only adjustment in ResNest. Despite this, the modular split-attention blocks in ResNest appear to be much more effective for a fine-grained classification task. With regard to NoisyStudent, the improvement in accuracy over the base tf-EfficientNet-b3 model suggests that the forced generalisation of the student-teacher model helped with the small feature representations in the bee data set.
Previous works on classifying bees, such as DeepABIS [45] and ABIS [5], have focused on images of bees in controlled environments and mainly on their wing data. The focus on wing data aligns with our findings on traditionally recognised distinguishing features of bees in Table 1, which, along with the controlled lab imaging environments, explains their relatively high classification performance. As it is often difficult to inspect fine features of bee wings in our general image data set, our model has focused more on distinguishing bees by other, more visible body features, and this can be seen in the CAM images. Thenmozhi et al. [40] use a ResNet101 model on a data set of 40 insect species and achieve an accuracy of 93.99%. Given the larger data set and scope of insect species, we attribute the success of that model to the cropped and zoomed-in images of the insects in their data set, as opposed to the general images in our study. Nguyen et al. [41] achieve their highest accuracy of 95.52% with an EfficientNetb3 model on a data set of 4,449 images of 5 different insect families (Ladybird, Mosquito, Grasshopper, Butterfly, Dragonfly). The large number of differences between these insects is a major factor in the success of their model. In comparison, with our data set of eleven different bee species with complex backgrounds, we still achieve similar accuracy with an EfficientNetb3 model (93.136%). From our analysis of traditional morphological features in Table 1, we found that wing features are the most discriminatory. However, as the images in our data set do not have normalised imaging conditions and are often of low quality, our trained models focused mainly on larger features, as can be seen in the CAM images.
In contrast to these works, our study explores the distinguishing visual features that can be extracted from the models; in particular, we analysed the Grad-CAMs from our best performing model, ResNest101e. The CAMs resulting from our study are not visually specific enough to identify fine-grained features as distinguishing features of a bee genus. Nevertheless, our experiments show that more general areas of a bee's body can be successfully distinguished between classes despite the vastly imbalanced nature of user-submitted images. For classes in the Apis and Xylocopa genera this can be more than enough to distinguish two samples. However, for bees with overlapping features, such as species within the Xylocopa and Bombus genera, the visual features extracted by the CAM may prove insufficient. The limited interpretability of the CAMs for our task can be attributed to the deep learning models' lack of feature knowledge. While humans can identify morphological features of bees visually from images, it is difficult for a model to learn and categorise these features from an image carrying only a single class label.

Conclusion
In our study, we explored the morphology of bee species and proposed the use of class activation maps to obtain visual indicators of distinguishing features. We constructed a data set of 24,695 bee images spanning eleven species and three bee genera, evaluated different state-of-the-art deep learning architectures, and obtained the highest accuracy of 94.27% with a ResNest101e model following data augmentation. From an analysis of the samples with the highest model confidence, we conclude that CAMs are sufficient to highlight general areas of bees that are enough to distinguish species such as those in the Apis and Xylocopa genera, but insufficient for species with many overlapping features. A key limitation of our approach is the lack of defined feature annotations for each input image, which limits the deep learning models' ability to learn key features. A lack of computational processing power also meant that we were unable to explore a larger set of classes. Future work could be conducted on a smaller set of annotated bee images outlining the morphological features of each bee in order to obtain more robust classification and visual results. Additional modified CAM approaches could also be considered, such as applying a majority membership alongside annotated feature areas of interest.