DNA Barcoding of 15 Species of African Saturniidae (Lepidoptera) across 12 Genera Reveals High Incidence of Non-Monophyly

Abstract: African Saturniidae (Lepidoptera) have been poorly described despite their nutritional and economic significance in Southern Africa. DNA barcodes for 15 species across 12 genera were generated as a starting point to create a reference sequence library for African Saturniidae. Phylogenetic clustering and genetic divergence estimates were used to assess patterns of genetic diversity in each genus, including all sequences available on BOLD Systems. Cases of polyphyletic and paraphyletic species affected 81% of the dataset. Taxonomic misidentifications, clerical and laboratory errors, introgression, hybridization and incomplete lineage sorting may all contribute to these results. We propose that the dataset available for African Saturniidae does not correctly represent the genetic diversity of a high proportion of the taxa in the family.


Introduction
The family Saturniidae (Lepidoptera) contains 3,454 species distributed in 169 genera worldwide across diverse terrestrial ecosystems [1]. Saturniidae include a large number of Emperor moth species that play roles in human societies [2]. Some species are economically relevant as silk producers, e.g., Samia cynthia (Tribe: Attacini) and Antheraea assamensis (Tribe: Saturniini), distributed and farmed in India, Myanmar, Indonesia, and Sri Lanka [3].
In Africa, Saturniidae include 185 species in 46 genera, and over 80 of these are edible at their larval stage [4][5][6]. Caterpillars are mostly consumed in southern, central, and western Africa, where they provide an important source of protein and vitamins, as well as seasonal income to many rural communities [7]. Bunaea alcinoe and Cirina forda are widely distributed and consumed in sub-Saharan Africa, whereas Gonimbrasia belina and Gynanisa maja are more prevalent in southern and central Africa [7]. The increasing demand for favored species has become a threat to their sustainability due to overharvesting and habitat erosion.
Despite their importance, African Saturniidae have been poorly described, and genetic data that could assist in understanding patterns of biodiversity in this group are limited, except for DNA barcodes available on BOLD Systems (https://v4.boldsystems.org/) for a small proportion of species. Additionally, a recent study reported the complete mitochondrial genomes of two economically relevant species (Gonimbrasia belina and Gynanisa maja) and identified the most informative regions for use in future population studies [8]. Genetic information is a prerequisite for understanding the distribution of the diversity of populations and for managing the sustainable use of wild resources. Baseline genetic data on African Saturniidae are currently scarce, and the taxonomic coverage across the rich diversity of species on the continent is incomplete. DNA barcoding is a convenient tool for cataloguing biodiversity by sequencing a short standardized region of a well-defined gene [9]. A fragment of the mitochondrial cytochrome oxidase I (COI) gene is widely applied as a species identification marker in insects because of its highly conserved nature, low intraspecific variation, and high interspecific variation. DNA barcoding can also link the different life stages of an organism, thus allowing for species identification when well-described taxonomic characters are lacking or when specimens are incomplete. DNA barcoding relies on the use of reference barcodes acquired from expertly identified specimens and deposited on publicly available online platforms (e.g., BOLD Systems, http://www.boldsystems.org/).
BOLD Systems can assist in assigning newly encountered specimens to their species [10][11], as the platform provides a species identification tool that retrieves an identification when the query sequence has less than 3.00% divergence from a reference sequence. However, the reliability of BOLD Systems has recently been questioned because of a high incidence of non-monophyly across many animal phyla, presumed to result from taxonomic misidentifications [12,13]. Therefore, it is indispensable to combine expertise in classical taxonomy and DNA barcoding to build useful databases [10][11][12][13].
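The threshold rule described above can be illustrated with a minimal Python sketch. It assumes sequences that are already aligned to the same length and uses an uncorrected p-distance. The function names `p_distance` and `identify`, the toy sequences, and the species labels are hypothetical, and the actual BOLD Systems identification engine is not reproduced here.

```python
from typing import Iterable, Optional, Tuple

VALID_BASES = frozenset("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in VALID_BASES and y in VALID_BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def identify(query: str,
             references: Iterable[Tuple[str, str]],
             threshold: float = 0.03) -> Tuple[Optional[str], Optional[float]]:
    """Return (species, divergence) for the closest reference.

    The species is None when the closest reference is not strictly below
    the threshold (3.00% by default, as in the text).
    """
    best_species, best_d = None, None
    for species, seq in references:
        d = p_distance(query, seq)
        if best_d is None or d < best_d:
            best_species, best_d = species, d
    if best_d is not None and best_d < threshold:
        return best_species, best_d
    return None, best_d


# Toy 100 bp "barcodes" (hypothetical, not real COI data).
ref_a = "ACGT" * 25
ref_b = "GGGGG" + ref_a[5:]
library = [("Species A", ref_a), ("Species B", ref_b)]

query = "TT" + ref_a[2:]          # 2 mismatches to ref_a, 4 to ref_b
species, d = identify(query, library)
print(species, f"{d:.2%}")        # Species A 2.00%
```

A query whose closest reference is 3.00% or more away returns `None` as the species, which is the case where a curated reference library matters most.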
This study aimed to contribute to cataloging the biodiversity of African Saturniidae using DNA barcoding. The main objective was to sequence the standard COI barcoding region of 15 species found in South Africa, as a starting point for creating a highly curated library of DNA barcodes for the family.

Specimen collection, DNA extraction, PCR and sequencing
Adults and caterpillars were collected from various locations in Namibia and South Africa from March 2018 to January 2021. All specimens were identified based on currently available taxonomic keys. Total DNA was extracted from adult and caterpillar legs using a standard phenol-chloroform protocol [15]. A total of 198 specimens were sequenced for the standard COI barcoding region (702 bp) using a variety of PCR primers (Table 1). PCR amplifications were performed using the QIAGEN Multiplex PCR Kit (QIAGEN), according to the manufacturer's protocol. Sanger sequencing was performed at the Central Analytical Facilities (CAF) of Stellenbosch University, South Africa.

Table 1. Primers used for the amplification of the standard COI barcoding region of 15 African Saturniidae species.

Primer; Sequence (5′-3′)
*Primer used for Sanger sequencing; **Primers designed in this study.

DNA analyses
Genetic clustering and genetic divergences were calculated using the new sequences generated in this study and DNA barcodes taxonomically assigned to the genera under study and available on BOLD Systems as of September 2020. The final dataset included a total of 1,868 sequences representing 191 species across 12 genera. Multiple sequence alignments were performed using the MAFFT algorithm [16] in Geneious Prime v2021.1 (https://www.geneious.com). Genetic clustering of sequences was assessed using the Neighbour-joining (NJ) method in MEGA X [17], under the Kimura 2-parameter model (K2P) [18].
Genetic divergences were estimated as maximum pairwise distances (max p-distance, %) in MEGA X, under the K2P model. Intraspecific max p-distances were calculated for species (i.e., sequences were grouped according to species names), and intragroup max p-distances were calculated for groups of sequences according to the clusters recovered on the NJ tree, disregarding the names of the sequences.
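The two grouping schemes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the MEGA X implementation: the function names `k2p_distance` and `max_distance_by_group`, the handling of gaps and ambiguous bases (skipped pairwise), and the toy sequences are all hypothetical. The distance uses the standard Kimura 2-parameter formula, with P and Q the proportions of transitions and transversions.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among comparable sites.
    """
    transitions = transversions = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in VALID or y not in VALID:
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    # math.sqrt / math.log raise ValueError when a pair is too divergent
    # for the model (non-positive argument).
    d = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return d if d > 0 else 0.0  # avoid returning -0.0 for identical sequences


def max_distance_by_group(sequences: Dict[str, str],
                          labels: Dict[str, str]) -> Dict[str, float]:
    """Maximum pairwise K2P distance (%) within each label.

    `labels` maps sequence id -> group: pass species names for intraspecific
    values, or NJ cluster ids for intragroup values. Singletons are omitted.
    """
    members = defaultdict(list)
    for seq_id, label in labels.items():
        members[label].append(seq_id)
    result = {}
    for label, ids in members.items():
        if len(ids) < 2:
            continue
        result[label] = 100 * max(
            k2p_distance(sequences[i], sequences[j])
            for i, j in combinations(ids, 2))
    return result


# Toy data (hypothetical ids and sequences).
seqs = {"s1": "A" * 100, "s2": "G" * 10 + "A" * 90,
        "s3": "A" * 100, "s4": "C" * 100}
by_species = {"s1": "sp. X", "s2": "sp. X", "s3": "sp. X", "s4": "sp. Y"}
print(max_distance_by_group(seqs, by_species))
```

Calling the same function once with species names and once with cluster ids gives the two sets of values that are compared throughout the Results.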

Results and Discussion
This study investigated patterns of genetic diversity amongst 15 African Saturniidae species in 12 genera to assess the congruence between morphological and molecular identification, in the context of all data available on BOLD Systems for these genera. The new sequences represent only a small fraction (8%) of the Saturniidae species found on the continent. DNA barcodes were generated for 198 specimens for assessment of genetic clustering and estimates of genetic diversity. Intraspecific max p-distances and genetic clustering of sequences on the NJ trees showed that 83% (10/12) of the genera had cases of non-monophyletic species, representing 81% of the total sequence dataset (Table 2).

Bunaea
The NJ tree for the genus Bunaea recovered three main clusters, of which cluster 2 (max p-distance = 2.58%) and cluster 3 (max p-distance = 1.38%) were monophyletic. Cluster 1 had a high level of incongruity between species names and genetic groups (12 novel B. alcinoe, all Bunaea alcinoe caffraria and five publicly available Bunaea alcinoe sequences), with a max intraspecific p-distance of 1.70% despite the different taxonomic designations (Figure 1a). However, the max intraspecific p-distance for B. alcinoe was 5.68%, which is above the range commonly accepted for conspecific individuals.

Vegetia
The clusters on the NJ tree for the genus Vegetia were largely congruent with species, except for the sequence of Vegetia ducalis 4. The intraspecific variation of V. ducalis as a species was 4.08%, suggestive of incorrect identification of Vegetia ducalis 4 (Figure 2b). Based on sequence similarity, Vegetia ducalis 4 could represent a different species; however, there were no other sequences with high similarity to confirm this hypothesis.

Eochroa
Prior to this study, only two sequences were available on BOLD Systems for the genus Eochroa. A total of 10 new sequences were generated for Eochroa trimenii. Both novel and public sequences formed a single cluster with an intraspecific max p-distance of 2.58%, supporting the morphological identification of the new specimens.

Epiphora
The NJ tree of the genus Epiphora included 303 sequences in 39 species and recovered 21 clusters, of which 12 were monophyletic. The new E. bauhiniae sequences clustered with mixed sequences from seven other species with an intraspecific distance of 2.50%, suggesting conspecificity. Groups of sequences with identical names had p-distances ranging between 3.42% and 7.97%, suggesting that some specimens may have been misidentified.

Gonimbrasia
The genetic clustering analysis of the genus Gonimbrasia included 539 sequences available on BOLD Systems, distributed amongst 57 species, and 13 new sequences for Gonimbrasia tyrrhea. A complex tree topology was recovered, with 17 monophyletic species, including G. tyrrhea (max p-distance = 0.15%). The remaining 40 species were distributed across non-monophyletic clusters.

Gynanisa
The genus Gynanisa had nine clusters and a high level of incongruity between the names of the sequences and the clusters on the NJ tree. Intraspecific distances of sequences from BOLD Systems designated as conspecific were high, ranging between 3.20% and 5.90%. The new sequences of G. maja had an intraspecific p-distance of 3.04% and formed two distinct groups, which clustered with sequences of G. maja, G. nigra, and G. ata.

Holocerina
The NJ tree for the genus Holocerina showed 15 clusters, of which 10 were monophyletic with conspecific individuals and intraspecific distances below 3.00%. The new H. smilax sequence clustered with some of the other H. smilax sequences with an intraspecific distance of 1.51%, supporting conspecificity of specimens.

Ludia
The genus Ludia showed a total of 12 clusters with a high level of incongruity between species and genetic clusters. The new sequence for L. delegorguei clustered with sequences of L. delegorguei and L. goniata with an intraspecific distance of 2.01%.

Nudaurelia
The NJ tree for Nudaurelia consisted entirely of novel sequences for N. cytherea and N. wahlbergi, and both species had low intraspecific max p-distances (0.00% and 0.47%, respectively), congruent with the morphological identification of the specimens.

Heniocha
The NJ tree for the genus Heniocha showed seven clusters, of which three were monophyletic. The intraspecific divergence among H. dyops from BOLD Systems and all novel sequences of H. dyops was 3.71%. When H. dyops was divided into two groups, the intragroup divergences were low (1.76% for the novel H. dyops sequences and 2.87% for the BOLD Systems H. dyops sequences), suggesting that the H. dyops dataset may represent two species or two diverged populations of the same species. The new H. apollonia sequence clustered with other sequences of H. apollonia with an intraspecific distance of 2.17%, supporting conspecificity.

Pselaphelia
The NJ tree of the genus Pselaphelia recovered 10 clusters, of which seven were monophyletic and composed of conspecific individuals. The new P. flavivitta sequence clustered with other P. flavivitta sequences with an intraspecific distance of 1.51%, supporting conspecificity.

Usta
The NJ tree for the genus Usta recovered 10 clusters, of which seven were monophyletic. The remaining three had a high level of incongruity between species names and genetic clusters; however, the intragroup distances within each of the three clusters were below 3.00%, suggesting that each cluster is composed of conspecific individuals. The new sequences of U. terpsichore clustered with sequences of U. terpsichore, U. subangulata, and U. angulata with an intraspecific distance of 2.10%, suggesting that this cluster represents a single species.
A large proportion of the dataset showed similar patterns of paraphyly. In these cases, more than one species clustered together on the NJ tree with intraspecific max p-distances below 3.00% (e.g., Figures 2a and 2b). Most of the sequences on the NJ trees belonged to clusters that were not congruent with their species designations, and genetic divergences calculated for groups of sequences with the same name were higher (>3.00%) than the range expected among conspecific individuals.
These patterns of paraphyly could be explained by misidentification of specimens due to lack of expertise and by databasing mistakes. However, they may also indicate that species names fail to represent the genetic boundaries that delimit species into separate evolutionary entities, i.e., the evolutionary nature of species leads to taxonomic uncertainties, which are reflected in the high level of paraphyly observed in this study. The phenotypic variation on which taxonomic names are based may persist despite gene flow as a result of genetic or ecological selection, which makes it difficult to distinguish and classify specimens.
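The two recurring patterns discussed in this section (several names in one low-divergence cluster, and one name spanning high divergence) can be summarized with a small Python sketch. The function name `describe_group` and its output labels are hypothetical, and treating a value of exactly 3.00% as "at or above threshold" is an assumption. The two example values are those quoted in the text for the Usta cluster and for Bunaea alcinoe.

```python
from typing import Iterable

THRESHOLD_PCT = 3.00  # divergence threshold used throughout the text


def describe_group(names: Iterable[str], max_pdist_pct: float,
                   threshold_pct: float = THRESHOLD_PCT) -> str:
    """Label a group of sequences by name congruence and divergence."""
    distinct = set(names)
    if not distinct:
        raise ValueError("at least one name is required")
    several_names = len(distinct) > 1
    below = max_pdist_pct < threshold_pct
    if several_names and below:
        return "several names, divergence below threshold"
    if several_names:
        return "several names, divergence at or above threshold"
    if below:
        return "one name, divergence below threshold"
    return "one name, divergence at or above threshold"


# Values quoted in the text.
usta_cluster = ["U. terpsichore", "U. subangulata", "U. angulata"]
print(describe_group(usta_cluster, 2.10))
print(describe_group(["B. alcinoe"], 5.68))
```

Only the "one name, divergence below threshold" outcome is congruent with a clean reference library; the other three outcomes call for taxonomic review of the specimens involved.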

Conclusion
The reliability of DNA barcodes for identifying unknown specimens by comparison with publicly available sequences depends on the quality of the data that constitutes the reference library. Our detailed analyses of patterns of genetic diversity in 12 genera of African Saturniidae suggest that a large proportion of the data presently available on BOLD Systems is not in agreement with the notion that conspecific individuals have a maximum p-distance < 3%. Therefore, we recommend caution when using the currently available public sequences to infer the species identity of unknown specimens.

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Table 2. Incidence of species-level non-monophyly (NM) across 12 genera of African Saturniidae. NM was calculated as the proportion of non-monophyletic species in each genus.
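The NM proportion defined in the caption above can be sketched in Python under a simplifying assumption: a species is treated as monophyletic when all of its sequences fall in a single NJ cluster and that cluster contains no other species name. This uses cluster labels as a proxy for tree topology. The function name `non_monophyly` and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def non_monophyly(records: Iterable[Tuple[str, str]]) -> Tuple[float, Set[str]]:
    """Return (proportion, set) of non-monophyletic species.

    `records` holds one (species_name, cluster_id) pair per sequence.
    """
    clusters_of = defaultdict(set)   # species -> clusters it occurs in
    species_in = defaultdict(set)    # cluster -> species it contains
    for species, cluster in records:
        clusters_of[species].add(cluster)
        species_in[cluster].add(species)
    if not clusters_of:
        raise ValueError("no records")
    flagged = {
        sp for sp, clusters in clusters_of.items()
        if len(clusters) > 1 or any(len(species_in[c]) > 1 for c in clusters)
    }
    return len(flagged) / len(clusters_of), flagged


# Hypothetical genus with four named species.
toy = [("sp. a", "c1"), ("sp. a", "c1"),
       ("sp. b", "c2"), ("sp. c", "c2"),   # two names share one cluster
       ("sp. d", "c3"), ("sp. d", "c4")]   # one name split over two clusters
nm, flagged = non_monophyly(toy)
print(f"NM = {nm:.0%}", sorted(flagged))   # NM = 75% ['sp. b', 'sp. c', 'sp. d']
```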