In nature, protein chain interactions (Pro-ChInt) of single- / multi-protein, a common but complex system, refer to physical contacts established between two or more protein chains depending on the amino acid sequences, which contains tremendous information. Encoding amino acid sequence information of protein using complex networks or graphs of the peptides is a grateful solution to discover the communication information between different Pro-ChInt. We first constructed some python codes to directly download the specify protein sequences from the RCSB protein data bank (PDB). Then, we changed the FASTA format to S2SNet format to calculate the embedded / non-embedded parameters of protein chains according to the star graph topological indices of peptide sequences. Meanwhile, we numbered all protein chains, then used the chain numbers to get a random number for a given set of chain number or case number used for each protein. Then, we replaced all the random numbers with the corresponding parameters of each protein chain calculated with S2SNet application. After that, a machine learning classification model was constructed based on the combinatorial / combining interaction of different chains. This new method can be used to identify two or more protein chain interactions combined with machine learning technique.
I would like to take this message to ask you about the machine learning methods that you used to classify (explain) the protein chain – chain interaction. What kind the machine learning methods that do you use in this kind of problem and if have you got some significant results.
Thanks
Best regards,
Marcus Tullius Scotti
Thank very much for your interesting in our work, the idea of this work is that we are focusing on how to identify and discover the interactions between the different chains of proteins. We are programming some useful codes in python to change the sequences of proteins to Star Graph (SG), and then to describe the performance of the SG of each protein chain in a numerical format (Different types of parameters). Based on these molecular information, we also calculate the entire performance of different protein-chain combinations by calculating the sum or average of each combination. If the combinations of different protein chains accompany with the specific function set as positive group, otherwise.
After that, we use different machine learning methods to select the high performance classification models. The machine learning methods used in present work including Support Vector Machines, Multilayer Perceptrons, KStar, JRip, NaiveBayes, Random Forest, Random Tree, etc.
Best wishes,
Yong Liu