In Silico Analysis of Short Linear Motifs Present in Snake Venom Phospholipases

Abstract: Phospholipases A2 (PLA2s) are important constituents of snake venom that, depending on their amino acid composition, possess several toxic properties, chiefly neurotoxicity, myotoxicity and impairment of haemostasis. They are proteins of about 120 amino acids, with a structure conserved since basal metazoa and very similar to that of mammalian secretory PLA2s. Some snake venom PLA2s are heteromultimers; others are monomers or homodimers. In this work we analyze the sequence alignment of monomeric or homodimeric snake venom PLA2s, grouped according to their myotoxic and neurotoxic properties, and compare this alignment with that of the most similar mammalian secretory PLA2s. We found short linear motifs, present in three regions of secretory PLA2s, that may play a role in their toxic and physiological functions. This work suggests important molecular interactions of secretory PLA2s that can focus and shorten the experimental characterization of the mechanism of action of these proteins.

More than 400 different snake venom PLA2 toxin sequences are reported in the UniProtKB database, with an overall conserved tertiary structure [4]. Given this high conservation, all of these sPLA2 toxins represent optimal models for the study of sPLA2 function and mechanism of action. Based on their homology to mammalian sPLA2s, snake venom toxins are subdivided into two groups. Toxins derived from Elapidae and Hydrophiidae snakes belong to group I and are more similar to the mammalian pancreatic phospholipase (PLA2G1B), whereas toxins derived from Viperidae and Crotalidae belong to group II, given their homology with the mammalian phospholipase enriched in synovial fluid and tears (PLA2G2A) [5]. sPLA2s of both groups share a highly conserved amino acid sequence and a well-conserved tertiary structure, and although the majority of snake venom PLA2s exist as monomers, some of them have acquired a quaternary structure through non-covalent interaction between two or more PLA2s [1,5-7].
PLA2 snake toxins are also subdivided into three categories based on their toxic effects: myotoxins, which target skeletal muscle; neurotoxins, which act on the pre- or post-synaptic element of the neuromuscular junction; and hemostasis-impairing toxins [5,8]. Unfortunately, a correlation between the toxic effects of different snake venom PLA2s and their primary structure is still unknown. Moreover, many snake venom PLA2s with high myotoxic or neurotoxic activity are catalytically inactive owing to a missense mutation of the D49 residue (most frequently substituted with lysine) [5,9]. Therefore, the emerging idea is that the toxic activities of sPLA2s depend not only on their enzymatic function, but also on their interaction with target proteins that may mediate their entry into cells, activating different signaling cascades, and on their intracellular activity.
Here we present an in silico analysis of the primary and 3D structures of snake PLA2 toxins, compared with their mammalian counterparts, aimed at unveiling exposed sites that may represent candidates for modification or for interaction with other proteins. This analysis was made possible by the "animal toxin annotation project" of the Swiss-Prot database, in which venom protein sequences are systematically curated to the standards of UniProtKB/Swiss-Prot [10]. Thanks also to the Eukaryotic Linear Motif (ELM) resource [11,12], it was possible to predict not yet discovered sites of post-translational modification (e.g., phosphorylation) or of interaction with other proteins in sPLA2s, helping to elucidate the mechanism of action of snake toxins and mammalian sPLA2s.

Sequence Alignment Comparison
To unveil similarities or critical differences between snake PLA2 toxins and mammalian sPLA2s belonging to groups I and II, we aligned their sequences with Clustal Omega, displayed the alignments with the SnapGene software, and compared them in figure 1 A and B. Secretory PLA2s appear to have three regions of higher variability: the region between the first α-helix and the calcium-binding loop; the region between the second and third α-helices, which comprises the β-sheets; and the C-terminal region.

Amino-acid Composition Analysis of the Central and C-terminal Regions
To identify the amino acids most abundant in the central and C-terminal regions, which may be important for the biological roles of sPLA2s, we evaluated the mean abundance of different categories of amino acids, grouped by their chemical properties.
As shown in figure 2 A, the central region of Group I sPLA2 neuro-myotoxins and neurotoxins (red and yellow bars) is characterized by a lower mean abundance of phosphorylatable Ser/Thr residues and of negatively charged amino acids (D/E) than mammalian PLA2G1B (green bars). In the C-terminal region, toxins show a higher amount of N/Q and a lower amount of charged amino acids (D/E and K/R) than mammalian PLA2G1B (figure 2 B).
The central region of Group II sPLA2s shows a pattern less distinctive for each family, with a mild increase of D/E and of phosphorylatable Y in toxins with respect to mammalian PLA2G2A (green bars, figure 2 C). Conversely, more differences can be noticed in the C-terminal region of Group II sPLA2s. In particular, D/E are totally absent in mammalian PLA2G2A, while N/Q are less abundant in toxins and S/T are less abundant in myotoxins and neuro-myotoxins (not-D49) (figure 2 D).

SLiMs Conserved in Toxins and not in Mammalian PLA2s or Vice Versa
We searched the ELM database [12] for SLiMs present and conserved in the different groups of sPLA2s. We included motifs assigned to any cellular compartment, because PLA2s have been shown to be internalized in cells and localized in vesicles, in the cytoplasm, and in mitochondria and the nucleus [13-17]. Of all the motifs identified, we selected those conserved in toxins but not in mammalian PLA2s, or vice versa. Furthermore, because such motifs can be false positives, we considered only those present in the more flexible and exposed areas of the proteins. The main motifs we identified are reported in figure 3.
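The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes that motif hits have already been exported per sequence as (motif name, start, end) tuples in a shared alignment numbering, that "conserved" means present in at least a chosen fraction of a group (the text gives no threshold), and that the exposed regions are supplied as coordinate ranges. The motif names and coordinates in the usage note are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, int, int]      # (motif name, start, end), 1-based inclusive
Region = Tuple[float, float]    # (first, last) position of an exposed region


def motifs_in_regions(hits: List[Hit], regions: List[Region]) -> Set[str]:
    """Names of motifs with at least one hit lying entirely inside a region."""
    return {
        name
        for name, start, end in hits
        if any(lo <= start and end <= hi for lo, hi in regions)
    }


def conserved(group: Dict[str, List[Hit]], regions: List[Region],
              min_fraction: float) -> Set[str]:
    """Motifs found in exposed regions of >= min_fraction of the sequences."""
    if not group:
        return set()
    counts: Counter = Counter()
    for hits in group.values():
        counts.update(motifs_in_regions(hits, regions))
    return {m for m, c in counts.items() if c / len(group) >= min_fraction}


def differential_motifs(toxins: Dict[str, List[Hit]],
                        mammals: Dict[str, List[Hit]],
                        regions: List[Region],
                        min_fraction: float = 1.0) -> Dict[str, Set[str]]:
    """Motifs conserved in one group but not in the other."""
    tox = conserved(toxins, regions, min_fraction)
    mam = conserved(mammals, regions, min_fraction)
    return {"toxin_only": tox - mam, "mammal_only": mam - tox}
```

For example, with placeholder hits `toxins = {"t1": [("MOTIF_A", 60, 66), ("MOTIF_B", 10, 14)], "t2": [("MOTIF_A", 61, 67)]}`, `mammals = {"m1": [("MOTIF_C", 70, 77)], "m2": [("MOTIF_C", 70, 77), ("MOTIF_A", 5, 9)]}` and `regions = [(55, 85), (103, float("inf"))]`, the function reports `MOTIF_A` as toxin-only and `MOTIF_C` as mammal-only, because hits outside the supplied regions are ignored.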
Figure 3. Main SLiMs that differentiate snake venom PLA2s from their mammalian homologues. The plasma membrane is represented by a double red line. A) 3D structures of the snake venom neuro-myotoxin notexin and of human PLA2G1B, and their superimposition, with the side chains of S/T residues in the central region shown in yellow. The S/T stretch present in mammalian PLA2G1B contains four superimposed GSK3 phosphorylation motifs. B) 3D structures of the snake venom myotoxin bothropstoxin-I and of human PLA2G2A, and their superimposition. The S/T residue (in yellow), when phosphorylated, allows isomerisation of the adjacent proline by Pin1. The loop following the second α-helix in the toxin structure contains a PKA phosphorylation site and an SH2 binding site, neither of which is present in PLA2G2A.

Discussion and Conclusions
The greatest differences in amino acid composition between mammalian and snake venom sPLA2s concern the regions that, in the 3D structure, lie on the sides of the proteins, perpendicular to the plasma membrane. These regions are enriched in amino acids typical of prion-like domains (N, Y, S, G), and may or may not contain charged amino acids, which can influence the propensity of these proteins to form oligomers on the plasma membrane and condensates with other proteins such as melittin, HSP70 and nucleolin. Such condensates can modulate the enzymatic activity of sPLA2s or regulate the activity of other enzymes [13,18,19].
Finally, the main finding of this analysis is that sPLA2s may be subject to post-translational modifications that could modulate their function and toxicity. In particular, PLA2G1B has several motifs, not conserved in group I toxins, that can be modified both by GSK3, which generally has an inhibitory function [20,21], and by prolyl isomerase 1 (Pin1), a prolyl isomerase regulated by [ST] phosphorylation. Structural modifications by Pin1 can remove sPLA2s from the complexes in which they participate and redirect them towards the degradation system, as happens for many proteins [22,23]. Hence, the toxicity of myotoxins may be partly due to the lack of this control by Pin1, so that the proline always remains in the trans conformation; this may explain why the removal of proline in bothropstoxin-I greatly reduces its myotoxic activity [24]. Moreover, Group II PLA2 snake toxins possess motifs that interact with SH2 domains and PDZ domains.
These interaction motifs and post-translational modification motifs may have a role in the formation and breakdown of functional molecular condensates on the plasma membrane, which can modulate the enzymatic activity of sPLA2s and carry out their non-enzymatic activities. Further studies will be necessary to unravel all the activities triggered by sPLA2s and all their molecular partners.

Sequence Collection and Alignment
The sequences of the monomeric or homomeric snake venom phospholipases A2 were collected from the UniProt database using the string (family: "phospholipase a2 family Group I subfamily" fragment:no keyword: "Toxin [KW-0800]" keyword: "Myotoxin [KW-0959]" AND keyword: "Neurotoxin [KW-0528]" taxonomy: "Serpentes (snakes) [8570]" not (annotation:(type:subunit heterodimer) OR annotation:(type:subunit heterotrimer) OR annotation:(type:subunit heterohexamer))), substituting 'Group I' with 'Group II' to search for phospholipases A2 of the second group, and keeping or removing the keywords indicating the type of toxin (myo- or neuro-) to collect the proteins according to their site of action. Only curated entries were considered. To distinguish the catalytically active subfamily, the string (family: "D49 sub-subfamily") was added with the operator AND or NOT. The sequences of mammalian phospholipases were collected from the UniProt database by entering the gene names PLA2G1B and PLA2G2A, considering only the ten curated entries for PLA2G1B, and the five curated entries plus five other entries in the case of PLA2G2A. The sequence alignment was performed with Clustal Omega and visualized with the SnapGene Viewer software (version 4.3.11). Pre- and pro-peptides, when present, were removed from the sequences.
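The query variants described above (Group I or II, with or without the myotoxin and neurotoxin keywords, with or without the D49 sub-subfamily) can be generated programmatically. The sketch below only assembles text: the field names and keyword identifiers are copied from the string quoted above, while the function name, its parameters and the explicit AND/NOT joining are assumptions made for illustration. Whether the current UniProt interface accepts this syntax unchanged is not verified here, and the restriction to curated entries is not encoded.

```python
from typing import Optional, Sequence

# Keyword clauses copied from the query string quoted in the text.
KEYWORDS = {
    "myotoxin": 'keyword:"Myotoxin [KW-0959]"',
    "neurotoxin": 'keyword:"Neurotoxin [KW-0528]"',
}
# Subunit annotations excluded so that only monomers/homomers remain.
HETEROMERS = ("heterodimer", "heterotrimer", "heterohexamer")


def build_query(group: str,
                toxin_types: Sequence[str] = (),
                d49: Optional[bool] = None) -> str:
    """Assemble one query variant (hypothetical helper, text only).

    group        "I" or "II"
    toxin_types  any of "myotoxin", "neurotoxin"
    d49          True: require the D49 sub-subfamily; False: exclude it;
                 None: do not filter on it
    """
    if group not in ("I", "II"):
        raise ValueError("group must be 'I' or 'II'")
    parts = [
        f'family:"phospholipase a2 family Group {group} subfamily"',
        "fragment:no",
        'keyword:"Toxin [KW-0800]"',
    ]
    parts += [KEYWORDS[t] for t in toxin_types]
    parts.append('taxonomy:"Serpentes (snakes) [8570]"')
    query = " AND ".join(parts)
    if d49 is not None:
        operator = "AND" if d49 else "NOT"
        query += f' {operator} family:"D49 sub-subfamily"'
    excluded = " OR ".join(f"annotation:(type:subunit {h})" for h in HETEROMERS)
    return f"{query} NOT ({excluded})"
```

Calling `build_query("II", ("myotoxin",), d49=False)` yields the Group II myotoxin variant that excludes the D49 sub-subfamily.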

Amino-acid Composition Analysis of the β-Sheet-Containing Region and C-terminal Stretch
The amino acid composition analysis of the central region (located between the second and third α-helices and including the two β-sheet secondary structures) and of the C-terminal stretch (next to the third α-helix) of PLA2 proteins was performed using the ProtParam tool of Expasy (https://web.expasy.org/protparam/). Amino acids were grouped according to their chemical and biochemical properties as follows: S,T; F,W; D,E; K,R; N,Q; Y; and G,A,V,M,L,I,C,H (referred to as "others"). The frequency of each amino acid category in the analysed sequence was obtained by dividing its abundance by the total length of the analysed sequence and expressing the result as a percentage. Finally, the mean frequency of each amino acid category was calculated for each family of PLA2 toxins and for their mammalian counterparts.
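The arithmetic described above (category count divided by region length, expressed as a percentage, then averaged over a family) can be sketched in a few lines of Python. This is a stand-in for illustration and does not call ProtParam. The function names are hypothetical, the input is assumed to be the already excised central or C-terminal region of each sequence, and "Others" is computed here as everything outside the six named categories.

```python
from statistics import mean
from typing import Dict, List

# Residue categories named in the text; "Others" is the remainder.
CATEGORIES = {
    "S/T": "ST",
    "F/W": "FW",
    "D/E": "DE",
    "K/R": "KR",
    "N/Q": "NQ",
    "Y": "Y",
}


def category_frequencies(region: str) -> Dict[str, float]:
    """Percentage of each residue category in one region sequence."""
    region = region.upper()
    if not region:
        raise ValueError("region sequence is empty")
    freqs = {
        name: 100.0 * sum(region.count(aa) for aa in residues) / len(region)
        for name, residues in CATEGORIES.items()
    }
    freqs["Others"] = 100.0 - sum(freqs.values())
    return freqs


def family_mean(regions: List[str]) -> Dict[str, float]:
    """Mean category frequency over all region sequences of one family."""
    per_sequence = [category_frequencies(r) for r in regions]
    return {name: mean(f[name] for f in per_sequence) for name in per_sequence[0]}
```

For the toy regions `["SSAA", "TAAA"]`, the S/T frequencies are 50% and 25%, so `family_mean` returns 37.5 for S/T and 62.5 for Others.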

Short Linear Motifs Identification
Short linear motifs (SLiMs) were identified using the Eukaryotic Linear Motif resource for Functional Sites in Proteins (http://elm.eu.org/) [12], considering all cell compartments. Only motifs conserved in toxins and not in mammalian phospholipases A2, or vice versa, were considered. The syntax used to express amino acid motifs is that adopted by the ELM database and described at http://elm.eu.org/infos/help.html.
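Because ELM motifs are written as regular expressions, a pattern can be matched against a sequence with any regex engine. The Python sketch below shows one way to do so while also reporting overlapping occurrences, which matters when several instances of a motif are superimposed within a Ser/Thr-rich stretch. The helper name is hypothetical, the pattern in the usage note is given only as an example of a Ser/Thr spacing expression and should be checked against the curated ELM entry before use, and the peptide is a made-up toy sequence rather than a real PLA2 fragment.

```python
import re
from typing import List, Tuple


def find_motif(pattern: str, sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched text) for every occurrence of a motif.

    Positions are 1-based and inclusive. Wrapping the pattern in a
    lookahead makes the scan restart at every residue, so occurrences
    that overlap each other are all reported.
    """
    scanner = re.compile(f"(?=({pattern}))")
    hits = []
    for match in scanner.finditer(sequence):
        text = match.group(1)           # outermost group = whole motif
        start = match.start() + 1       # convert to 1-based numbering
        hits.append((start, start + len(text) - 1, text))
    return hits
```

With the illustrative pattern `"...([ST])...[ST]"` and the toy peptide `"QNGSTSTSTSGK"`, `find_motif` reports three overlapping occurrences starting at positions 1, 2 and 3, whereas a plain `re.finditer` scan with the same pattern reports only the first, because it resumes searching after the end of each match.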

Figure 1. Consensus sequence analysis of snake venom PLA2s and their mammalian homologs. Panel A: consensus sequences of Group I snake venom neuro-myotoxic and neurotoxic PLA2s compared with that of mammalian PLA2G1B. Panel B: consensus sequences of Group II snake venom myotoxic, neuro-myotoxic and neurotoxic PLA2s compared with that of mammalian PLA2G2A. Myotoxins and neuro-myotoxins are divided according to the amino acid present at the position known as '49' of the active site. The sequences were collected from the Swiss-Prot database and aligned with the align tool of UniProt (Clustal O, https://www.uniprot.org/align/). The alignments were then visualized with SnapGene.

Figure 2. Mean amino acid composition of the two low-complexity regions of sPLA2s. Analyses of the mean amino acid composition of the central regions (55-85) (panels A and C) and of the C-terminal stretches (from 103) (panels B and D) belonging to toxins of group I and mammalian PLA2G1B (panels A and B) and to toxins of group II and mammalian PLA2G2A (panels C and D), respectively. The amino acid composition was evaluated for each sequence using the ProtParam tool (https://web.expasy.org/protparam/), and the mean value for each amino acid was then calculated and reported in the bar diagram. Amino acids reported on the x axis were separated according to their chemical properties (S,T: phosphorylatable sites of Ser/Thr kinases; F,W: aromatic; D,E: negatively charged at neutral pH; K,R: positively charged at neutral pH; N,Q: amide-containing amino acids; Y: aromatic phosphorylatable site of Tyr kinases; Others: A,G,V,L,I,H,P,C,M).