DNA barcode of Eurycoma longifolia Jack (Simaroubaceae) from Sumatra, Indonesia based on the trnL-F plastid sequence

Eurycoma longifolia (Simaroubaceae) is a popular medicinal plant from the South-East Asian rainforest that is often used as an aphrodisiac and antimalarial. The main supplies of


Introduction
Eurycoma longifolia Jack is one of the best-known medicinal plants in South-East Asia, popularly known as 'Tongkat Ali' or 'Pasak Bumi'. The species is distributed in southern Myanmar, Vietnam, Laos, Cambodia, Thailand, Peninsular Malaysia, Singapore, Sumatra, Borneo and the Philippines [11], and may occur in Bangladesh based on herbarium records in GBIF (Figure 1) [12]. The roots of this plant contain several active compounds, including canthin and β-carboline alkaloids, squalene derivatives, tirucallane triterpenes, biphenylneolignans and quassinoids [1], with antimicrobial [2], antimalarial [3], antidiabetic, antiulcer and anticancer [4][5] activities, although the plant is mainly used as an aphrodisiac [6][7]. Currently, products derived from this plant are found on the market as raw materials or as packaged herbal products such as coffee, tea, capsules and tablets [8], as well as sweets [9]. The species is widely traded both domestically and internationally but is not included in the CITES appendices [13]. Increasing market demand, combined with a high price on the international market but a low price paid to farmers, has made this species a target for illegal export from Indonesia. In addition, several herbal medicine industries that use this plant are prone to counterfeiting, substitution, contamination and the use of fillers [10]. The plants are usually taken from the wild by destructive harvesting, in which the roots are pulled out. If this continues, the sustainable use of the species cannot be guaranteed. In anticipation of this, it is important to develop a system for tracking and tracing the source plants. One approach is to develop a DNA barcode as a specific identity for E. longifolia from Sumatra.
DNA barcoding is a standard tool for species identification that uses a fragment of DNA sequence from certain genes or regions [14][15]. The term DNA barcode was first introduced in 2003 [16] and has gained worldwide attention in the scientific community since then [17]. Recently, DNA barcodes have been applied to a wide range of studies, from the discovery of new species and the discrimination of cryptic species to population diversity, food safety and conservation [18][19][20]. The Consortium for the Barcode of Life (CBOL) proposed portions of two coding regions from the plastid (chloroplast) genome, rbcL and matK, as a "core barcode" for plants, to be supplemented with additional regions as required [20]. In this study we used trnL-F as the marker of choice for the DNA barcode of E. longifolia. The trnL-F region is widely used in molecular systematics because it carries sufficient mutations to detect variation at the specific and infraspecific levels [21]. This region is one of the regions recommended for DNA barcoding analysis [21][22], as it meets the requirements identified by the CBOL for a suitable DNA barcode: the locus is routinely retrievable with a single primer pair, bidirectional sequence reads are easy to obtain, and it provides maximal discrimination among plant species [14].
The present study aimed to develop a DNA barcode for E. longifolia from Sumatra using the trnL-F region. We expected to discover nucleotide variations, some of which would be specific to samples from Sumatra. The results of this study are expected to assist the identification of herbal medicines containing E. longifolia from Sumatra, Indonesia.

Samples
Twenty-four leaf samples were collected from mainland Sumatra and the Riau Islands (Table 1) and dried in silica gel for further use.

Isolation of DNA, amplification and sequencing of trnL-F
Total genomic DNA was isolated using the Genomic DNA Mini Kit (Plant) from GeneAid following the manufacturer's protocol.
The trnL-F region was amplified by PCR using the universal primer pair 'c' (forward, 5'-CGAAATCGGTAGACGCTACG-3') and 'f' (reverse, 5'-ATTTGAACTGGTGACACGAG-3') [26]. The PCR mixture, in a total volume of 12.5 µL, contained 5 pmol each of the forward and reverse primers and 10 ng/µL of DNA template. The PCR was performed in a Takara thermal cycler under the following optimum conditions: pre-denaturation at 94°C for 2 min; 30 cycles of denaturation at 94°C for 30 s, annealing at 52°C for 3 s and extension at 72°C for 1 min; and a final extension at 72°C for 10 min. The reaction was repeated in 40 cycles. The amplified bands were separated on a 1.5% agarose gel stained with GelRed. Electrophoresis was run at 50 V for 60 min in 1x TBE buffer. The target trnL-F bands were visualised under UV light using an Atto Bioinstrument device. The amplicons were then sequenced by Sanger sequencing at First Base.
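As a rough check on the primer pair quoted above, the following Python sketch computes the length, GC content and an approximate melting temperature of each primer. The melting temperature uses the simple Wallace rule (2°C per A/T, 4°C per G/C), which is only a rule of thumb for short oligonucleotides and is an assumption of this sketch, not the method used in the study. The function names are illustrative.

```python
# Primer sequences as quoted in the text (5' -> 3').
PRIMERS = {
    "c (forward)": "CGAAATCGGTAGACGCTACG",
    "f (reverse)": "ATTTGAACTGGTGACACGAG",
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); a rough estimate for short oligos only.
    """
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, "
              f"GC = {gc_percent(seq):.0f}%, "
              f"Wallace Tm ~ {wallace_tm(seq)} deg C")
```

Annealing temperatures are commonly set a few degrees below the lower of the two primer melting temperatures, so an estimate of this kind can help when re-optimising the protocol for other thermal cyclers.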

Data analysis
The trnL-F sequences were assembled using the contig editor of the ATGC software package version 4.3.5 (Genetyx Co., Japan). The forward and reverse reads of each sequence were inspected carefully to ensure that there were no mismatches in the resulting consensus sequence. The nucleotide composition of the trnL-F region was evaluated using MEGA 7.0 [27]. The homology and identity of the samples were examined using nucleotide BLAST against GenBank (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Reference data were downloaded from GenBank in FASTA format. The data from this study and from GenBank were combined in MEGA [27] and aligned using MUSCLE as implemented in MEGA7 [27]. Genetic distances were estimated as pairwise distances under the Kimura 2-parameter (K2P) model [28], also in MEGA7. The phylogenetic tree was reconstructed using the Maximum Likelihood method with 1000 bootstrap replicates [29] under the K2P model in MEGA7.
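The pairwise distance under the Kimura 2-parameter model can be illustrated with a short Python sketch. With P the proportion of sites showing a transition and Q the proportion showing a transversion, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The function below is a minimal illustration for two pre-aligned sequences and is not the MEGA7 implementation; skipping sites with gaps or ambiguous bases (pairwise deletion) is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (pairwise deletion)
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For closely related sequences such as those in this study, where only a handful of sites differ across several hundred base pairs, the K2P distance is very close to the raw proportion of differing sites.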

The sequence homology and identity of E. longifolia from Sumatra
The amplicon of the trnL-F chloroplast region from the 24 samples in this study was 961 bp long, consisting of 469 bp of trnL (its exons and the majority of the intron) and 492 bp comprising the trnF gene and the intergenic spacer between trnL and trnF. Of the 469 bp trnL region, 417 bp were homologous to many sequences in GenBank, 18 of which had a similarity of more than 99% (data not shown). Eight reference species were further used to build a phylogenetic tree based on distance analyses (Figure 2) to assess the phylogenetic position of the E. longifolia samples from Sumatra and thus determine their identity. One of the reference sequences was derived from the complete chloroplast genome sequence of E. longifolia (GenBank accession MH751519), which originated from Kuala Lumpur, Malaysia [22]. In the nucleotide BLAST search, the query used was accession P1EL, which was 100% similar to this reference sequence, indicating that the query (P1EL) has the same haplotype as the reference (MH751519). Meanwhile, the samples of E. longifolia had more than 99% similarity to the eight reference taxa (Table 2).
When a distance-based phylogenetic tree analysis was performed using the query sequence P1EL (Table 1), it grouped together with the other reference E. longifolia accessions (Figure 2). This confirmed the identity of P1EL, which represents all the samples used in this study.
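The similarity figures reported above can be understood as the percentage of identical columns in a pairwise alignment. The Python sketch below shows that calculation for two pre-aligned sequences. Counting every alignment column, including gapped ones, in the denominator is an assumption of this sketch, and the function name is illustrative.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns in which both sequences carry
    the same nucleotide. Gap columns count towards the alignment length
    but never as matches."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_query:
        raise ValueError("empty alignment")

    matches = sum(
        q == s and q != "-"
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

On this measure, a single mismatch across a 417 bp alignment still leaves an identity above 99% (416/417), which illustrates why the point mutations have to be examined position by position rather than through overall similarity.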
A total of five single-base substitutions (point mutations) were found in the 961 bp trnL-F region of the Sumatra samples, located at positions 52, 55, 135, 421 and 742, while mutations at positions 161 and 371 were found only in reference taxa (Table 3). There was also one indel event involving a repeat of one to two A residues, but this was excluded from the discussion due to its ambiguity. The nucleotides shared among the samples used in this study ran from position 53 to 469, containing the trnL intron region. Thus the first and the last point mutations (at positions 52 and 742) refer exclusively to samples from Sumatra. The first point mutation, at position 52, was a transition of A --> G, found mainly in samples from North Sumatra, a few from West Sumatra and one from Riau. The second, at position 55, was a transition of C --> T found in samples from Sanglap, Riau. The third point mutation was another transition, T --> C, observed at position 135 in three populations from Riau (AS, R and ES). The fourth was a transversion of G --> T at position 161 that seemed to be an autapomorphic nucleotide for Simaba monophylla. Another transition, C --> T, was found at position 371 in taxa outside the genus Eurycoma. Another transversion, G --> T, at position 421 was recorded in five samples (06EL to 10EL, Table 3) from West Sumatra. The last, a transversion of C --> G, was observed in the intergenic spacer between the trnL and trnF genes (position 742), and this mutation was observed in the same five samples from West Sumatra.
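Whether a substitution is a transition or a transversion follows directly from the two bases involved: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any change between the two classes is a transversion. The Python sketch below applies this rule to the base changes listed in this section; the list is transcribed from the text and the function name is illustrative.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    if ref not in "ACGT" or alt not in "ACGT":
        raise ValueError("only unambiguous bases A, C, G, T are supported")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# (alignment position, original base, substituted base), as listed in the text.
SUBSTITUTIONS = [
    (52, "A", "G"),
    (55, "C", "T"),
    (135, "T", "C"),
    (161, "G", "T"),
    (371, "C", "T"),
    (421, "G", "T"),
    (742, "C", "G"),
]

if __name__ == "__main__":
    for position, ref, alt in SUBSTITUTIONS:
        kind = classify_substitution(ref, alt)
        print(f"position {position}: {ref} --> {alt} ({kind})")
```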

Phylogenetic tree reconstruction
The phylogenetic tree reconstructed using Maximum Likelihood showed an unresolved topology with relatively low bootstrap support (BS) (Figure 2). Five non-Eurycoma species (A) were separated into a different lineage with 66% BS. The remaining E. longifolia samples from Sumatra, as well as the reference E. longifolia, had unresolved positions in the topology, except for the samples from Riau (B) and West Sumatra (C), which were supported by bootstrap values of 64% and 61%, respectively.
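Bootstrap support values such as those reported here come from resampling the alignment columns with replacement, rebuilding a tree from each pseudo-replicate, and recording the percentage of replicates in which a given group is recovered. The Python sketch below illustrates only the resampling and the percentage step; tree building is left out, the toy sequences and sample names are hypothetical, and this is not the MEGA7 implementation.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap pseudo-replicate of an alignment.

    Columns are sampled with replacement until the replicate has the
    same number of columns as the original alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in picked)
        for name, seq in alignment.items()
    }


def bootstrap_support(replicates_with_group: int, n_replicates: int) -> float:
    """Percentage of replicates in which a group was recovered."""
    return 100.0 * replicates_with_group / n_replicates


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from this study.
    toy = {"sample_1": "ACGTAC", "sample_2": "ACGTAT", "sample_3": "ACCTAT"}
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    print(bootstrap_alignment(toy, rng))
    # A group recovered in 640 of 1000 replicates has 64% support.
    print(bootstrap_support(640, 1000))
```

Because a group defined by only one or two variable sites is recovered only when those columns happen to be drawn in a replicate, modest support values are expected for alignments with few informative positions.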

Discussion
The partial sequence of the trnL-F region obtained in this study is A/T-rich, particularly in the trnL intron, and has confirmed the identity of the samples as E. longifolia. The recently published complete cp genome helped this confirmation, as the reference sequence originating from Kuala Lumpur, Malaysia, has a trnL intron identical to that of the query sample from North Sumatra. The relatively short geographic distance between Kuala Lumpur and North Sumatra may be consistent with the identical trnL intron sequences. In addition, the sequences from North Sumatra did not possess unique substitutions, hence their sequence composition has more in common with the reference than that of the populations from West Sumatra and Riau. This idea is supported by the unresolved phylogenetic position of the North Sumatran samples and the reference sample from Kuala Lumpur. Examination of nucleotide variation revealed five single-base substitutions in the samples from Sumatra, two of which may be unique to Riau. Even though this deduction was derived from only one region (trnL-F), the pattern seemed to be consistent across the 24 samples, particularly for the point mutations recorded only from Riau.

Conclusions
In the 961 bp trnL-F sequence used in this study, we discovered five point mutations specific to samples of E. longifolia from West Sumatra and Riau. Of the five nucleotides, four were found in the trnL intron and one in the intergenic spacer between the trnL and trnF genes. Two point mutations were observed in some samples from West Sumatra and in all samples from Riau, respectively. Thus, the Riau clades are the only groups of E. longifolia that were resolved in the phylogenetic analysis. It is suggested that the trnL intron can be used as one of the potential markers for establishing a DNA barcode for E. longifolia from Indonesia. It is recommended to use additional DNA barcode markers with mutation rates similar to that of the trnL intron to complement the present results.
Author Contributions: All authors contributed equally to the preparation of the research and the manuscript.

Figure 1. A distance-based phylogenetic tree built with Fast Minimum Evolution, with the maximum sequence difference set to 0.75, showing that the query sequence (highlighted in yellow) was identified as E. longifolia, as it was placed in the same group as the other E. longifolia accessions.

Figure 2. A molecular phylogenetic tree inferred by the Maximum Likelihood method using 32 trnL intron sequences. Branch support was inferred using 1000 bootstrap replicates. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site.

Table 1. Sources of samples of E. longifolia from Sumatra, Indonesia, and reference taxa used for the phylogenetic analysis.

Table 2. The results of the homology search of the trnL-F sequence.

Table 3. Nucleotide composition and variation found in the trnL-F sequence of Eurycoma longifolia from Sumatra and the reference accessions. Note: *: inferred from 417 bp of the trnL intron region; **: inferred from the 960 bp complete trnL-F region; the aligned trnL intron shared with all samples starts at position 53.