Pro-ChInt: Machine Learning Methods for Identifying Dual- / Multi- Protein Chain Interactions with Python

Yong Liu

doi:10.3390/MOL2NET-1-F015

¹ Key Laboratory of Subtropical Agro-ecological Engineering, Institute of Subtropical Agriculture, the Chinese Academy of Sciences, Changsha, Hunan, 410125, P. R. China
² Computer Science Faculty, University of A Coruña, Campus de Elviña s/n, A Coruña, 15071, Spain

Published: 07 December 2015 by MDPI in MOL2NET'15, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 1st ed. congress CHEMBIO.INFO-01: Cheminfo., Chemom., Comput. Quantum Chem. & Bioinfo. Congress, Cambridge, UK-Chapel Hill and Richmond, USA, 2015

https://doi.org/10.3390/MOL2NET-1-F015

Abstract:

In nature, protein chain interactions (Pro-ChInt) within single- or multi-protein systems are common but complex. They are physical contacts established between two or more protein chains, and they depend on the amino acid sequences, which carry a great deal of information. Encoding the amino acid sequence information of a protein as complex networks or graphs of its peptides is a promising way to uncover the information exchanged among different Pro-ChInt. We first wrote Python scripts to download the specified protein sequences directly from the RCSB Protein Data Bank (PDB). Then, we converted the FASTA format to the S2SNet format to calculate the embedded / non-embedded parameters of the protein chains according to the star graph topological indices of the peptide sequences. Meanwhile, we numbered all protein chains and used the chain numbers to draw random numbers for a given set of chain numbers or case numbers used for each protein. We then replaced all the random numbers with the corresponding parameters of each protein chain calculated with the S2SNet application. After that, a machine learning classification model was constructed based on the combinations of different chains. Combined with machine learning techniques, this new method can be used to identify interactions between two or more protein chains.
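The sequence-to-graph step can be illustrated with a minimal sketch. It assumes the FASTA text has already been downloaded, and it uses a simplified reading of the star graph: a dummy centre node, one branch per amino acid type, and (in the embedded variant) extra edges between residues that are adjacent in the sequence. The function names `parse_fasta`, `star_graph_edges`, and `degree_entropy` are hypothetical, and the degree entropy is only an illustrative descriptor, not necessarily one of the indices that S2SNet computes.

```python
import math
from collections import defaultdict


def parse_fasta(text):
    """Split FASTA text into a {header: sequence} dictionary."""
    chains = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            chains[header] = ""
        elif header is not None:
            chains[header] += line
    return chains


def star_graph_edges(sequence, embedded=False):
    """Return the edge set of a simplified star graph for one chain.

    Node 0 is the dummy centre; node i (1-based) is the i-th residue.
    Residues of the same amino acid type are chained along one branch.
    """
    branches = defaultdict(list)
    for position, residue in enumerate(sequence, start=1):
        branches[residue].append(position)

    edges = set()
    for nodes in branches.values():
        previous = 0  # every branch starts at the centre
        for node in nodes:
            edges.add((previous, node))
            previous = node

    if embedded:
        # Assumption: "embedded" adds the original sequence connectivity.
        for position in range(1, len(sequence)):
            edges.add((position, position + 1))
    return edges


def degree_entropy(edges):
    """Shannon entropy of the node degree distribution (illustrative only)."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    total = sum(degree.values())
    return -sum((d / total) * math.log(d / total) for d in degree.values())
```

For a chain `seq`, `degree_entropy(star_graph_edges(seq))` and `degree_entropy(star_graph_edges(seq, embedded=True))` would then give one non-embedded and one embedded value per chain, in the spirit of the per-chain parameters described above.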

Comments on this paper

Marcus Scotti

9 December 2015

Interesting Python code to aid protein chain interaction analysis.

Dear Yong Liu, I read your abstract and I think that the scripts (code) used in the reported methodology will help perform faster analysis of protein chain interactions, using descriptors that are generated from simple information about the peptide sequence. I congratulate you on this nice work.

I would like to take this opportunity to ask you about the machine learning methods that you used to classify (explain) the protein chain-chain interaction. What kind of machine learning methods do you use for this kind of problem, and have you obtained any significant results?

Thanks

Best regards,

Marcus Tullius Scotti

Yong Liu

15 December 2015

Dear Marcus Tullius Scotti,
Thank you very much for your interest in our work. The idea of this work is to identify and discover the interactions between the different chains of proteins. We are writing Python code to convert protein sequences to Star Graphs (SG), and then to describe the characteristics of the SG of each protein chain in numerical form (different types of parameters). Based on this molecular information, we also calculate an overall value for each protein-chain combination by taking the sum or average of the parameters of its chains. If a combination of protein chains is associated with the specific function, it is assigned to the positive group; otherwise, it is assigned to the negative group.
After that, we use different machine learning methods to select high-performance classification models. The machine learning methods used in the present work include Support Vector Machines, Multilayer Perceptrons, KStar, JRip, NaiveBayes, Random Forest, Random Tree, etc.

Best wishes,
Yong Liu
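
The combination step described in this reply can be sketched as follows. This is a minimal illustration, assuming each chain already has a fixed-length list of numerical parameters. The function name `combine_chains`, the chain identifiers, the descriptor values, and the set of positive pairs are all hypothetical and are not taken from the original work.

```python
from itertools import combinations
from statistics import mean


def combine_chains(descriptors, size=2, how="average"):
    """Build one feature vector per combination of `size` chains.

    `descriptors` maps a chain identifier to its list of parameters;
    each feature is the sum or average of that parameter over the chains.
    """
    if how not in ("sum", "average"):
        raise ValueError("how must be 'sum' or 'average'")
    reduce = sum if how == "sum" else mean
    features = {}
    for chain_ids in combinations(sorted(descriptors), size):
        columns = zip(*(descriptors[c] for c in chain_ids))
        features[chain_ids] = [reduce(column) for column in columns]
    return features


# Hypothetical descriptor values, for illustration only.
descriptors = {
    "A": [1.0, 4.0],
    "B": [3.0, 2.0],
    "C": [5.0, 0.0],
}
pair_features = combine_chains(descriptors, size=2, how="average")

# Hypothetical labels: combinations known to share the function are positive.
positive_pairs = {("A", "B")}
labels = {pair: int(pair in positive_pairs) for pair in pair_features}
```

The resulting feature vectors and 0/1 labels form the kind of table that could be passed to any of the classifiers listed above; which toolkit is used for that step is not stated here.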