Prediction of mRNA expression in cow’s milk using mRNA secondary structures and Machine Learning classifiers
1  Computer Science Faculty, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain
2  Key Laboratory for Agro-Ecological Processes in Subtropical Region, Hunan Research Center of Livestock and Poultry Sciences, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical A
3  Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan 410128, P.R. China
4  College of Life Science and Environmental Resource, Yichun University, Jiangxi Yichun, 336000, China


The mRNA molecules expressed in cow’s milk are important molecular biomarkers for different physiological and pathological conditions in cattle. Predicting the quantity at which a specific mRNA type is expressed in cow’s milk is a challenging theoretical task. The current study presents, for the first time, several Machine Learning models that predict mRNA expression from mRNA secondary structure fragments. The methodology is based on a dataset of experimental mRNA expression data, with expression levels measured using High Throughput Screening techniques. Each mRNA molecule has a specific secondary structure represented as a string, from which all possible secondary structure fragments can be read. This information is used as input for the Machine Learning methods in the Weka software in order to obtain classification models that can predict low, medium and high expression of new mRNA types in cow’s milk. The initial features were the counts of the mRNA secondary structure fragments for each expressed mRNA. These counts were transformed into frequencies, and the expression levels were converted into low and high classes. A feature selection method was applied to reduce the large number of possible features. The best classification model was obtained with the BayesNet method and is based on 24 features and 4067 cases. It has a true positive rate of 0.78 for the low mRNA expression class (average true positive rate of 0.66). Further studies are needed to improve the current results, using datasets with different feature sets and more advanced Machine Learning methods.

Keywords: mRNA secondary structures, Machine Learning classifiers, mRNA expression
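
The feature construction described in the abstract (counting secondary structure fragments and converting the counts into frequencies) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how fragments are defined, so the sketch assumes dot-bracket structure strings and treats a fragment as any contiguous substring of a fixed length `k`. The function names and the default `k = 3` are hypothetical.

```python
from collections import Counter


def fragment_counts(structure: str, k: int = 3) -> Counter:
    """Count every contiguous length-k fragment of a structure string.

    Assumption: the structure is in dot-bracket notation and a "fragment"
    is a fixed-length substring; the source does not define fragments.
    """
    return Counter(structure[i:i + k] for i in range(len(structure) - k + 1))


def fragment_frequencies(structure: str, k: int = 3) -> dict:
    """Convert fragment counts into relative frequencies that sum to 1."""
    counts = fragment_counts(structure, k)
    total = sum(counts.values())
    if total == 0:
        # Structure shorter than k: no fragments can be read.
        return {}
    return {fragment: n / total for fragment, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical hairpin-like structure, used only for demonstration.
    example = "(((...)))"
    for fragment, freq in sorted(fragment_frequencies(example).items()):
        print(fragment, round(freq, 3))
```

Each mRNA would yield one such frequency vector, and the vectors, together with the expression class labels, could then be exported to a format that Weka reads.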