CORAL: The Dispersion of SWNTs in Different Organic Solvents

Single-walled carbon nanotubes (SWNTs) are a group of new substances whose molecules have a specific cylindrical architecture. The dispersion of SWNTs in different organic solvents is a parameter that can provide valuable information for the development of nanomaterials. The CORAL software is a tool for building models of different endpoints using the Monte Carlo technique. In this work, the ability of the CORAL software to predict the dispersion of SWNTs in different organic solvents is demonstrated.


Introduction
The development of nanotechnology indicates that the use of carbon nanotubes (CNTs) in general, and single-walled nanotubes (SWNTs) in particular, offers attractive possibilities for chemical technology [1], biochemistry [2], and medicine [3]. The dispersibility of SWNTs in various solvents is an important physicochemical characteristic [4] from the point of view of technology [5,6].
In particular, this work is dedicated to the search for new alternative approaches to predict the dispersibility of SWNTs in organic solvents using the Monte Carlo method [11,12].

Data
The dispersibility of SWNTs in a series of 29 different organic solvents was taken from the literature [5,6]. The endpoint is the decimal logarithm of the dispersibility Cmax, expressed in mg/mL. Three random splits into the visible training set (in fact, this is structured into two sets: the training and calibration sets) and the invisible validation set are examined in order to check the actual ability of the approach.
SciForum http://sciforum.net/conference/mol2net-1
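The random splitting described above can be sketched as follows. This is only an illustration: the subset sizes (15, 8, and 6), the seeds, and the function name random_split are assumptions, since the text does not state how many solvents go into each set.

```python
import random

def random_split(items, n_train, n_calib, seed):
    """Shuffle the items and cut them into training, calibration, and validation subsets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    calib = shuffled[n_train:n_train + n_calib]
    valid = shuffled[n_train + n_calib:]
    return train, calib, valid

# 29 placeholder solvent indices; the real solvent list is in refs [5,6].
solvent_ids = list(range(1, 30))

# Three independent random splits (hypothetical sizes and seeds).
splits = [random_split(solvent_ids, n_train=15, n_calib=8, seed=s) for s in (1, 2, 3)]
for number, (train, calib, valid) in enumerate(splits, start=1):
    print(f"split {number}: train={len(train)} calib={len(calib)} valid={len(valid)}")
```

Fixing the seed for each split keeps the three distributions reproducible, so their statistical quality can be compared later.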

Optimal descriptors
The optimal descriptor used in this work, DCW(T*,N*), is calculated with Eq. 1. In Eq. 1, Vk is a vertex in the hydrogen-suppressed molecular graph [13][14][15]. Table 1 contains an example of a hydrogen-suppressed graph together with its (0, 1) adjacency matrix and the Vk values, which are calculated from the elements of the matrix. CW(Vk) is the correlation weight of Vk. T* is the threshold used to classify vertices into two classes: (i) rare (the number of Vk in the training set is less than T*) and (ii) active (the number of Vk in the training set is larger than T*). This parameter influences the results of the Monte Carlo optimization. The rare vertices are not involved in building the model: their correlation weights are fixed at zero. N* is the number of epochs of the Monte Carlo optimization. In fact, one can use arbitrary T and N, but T* and N* are the values of these parameters that give the preferable statistical quality of the model for the calibration set, in the hope that the model avoids overtraining (i.e. the situation where excellent quality for the training set is accompanied by poor quality for the calibration set).
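A minimal sketch of how the vertex features and the descriptor could be computed is given below. Several points are assumptions rather than statements from the text: the descriptor is taken to be the plain sum of the correlation weights of the vertices, a feature is counted once per vertex, and a count equal to the threshold is treated as active. The molecules, the threshold, and the weights are illustrative placeholders.

```python
from collections import Counter

def vertex_features(elements, adjacency):
    """Label each vertex with its element symbol and vertex degree, e.g. 'C3'.

    The vertex degree is the row sum of the (0, 1) adjacency matrix.
    """
    return [f"{element}{sum(row)}" for element, row in zip(elements, adjacency)]

def active_features(training_graphs, threshold):
    """Return the features that are not rare in the training set.

    Assumption: occurrences are counted per vertex, and a count equal to
    the threshold is treated as active.
    """
    counts = Counter(f for graph in training_graphs for f in graph)
    return {f for f, n in counts.items() if n >= threshold}

def dcw(features, weights, active):
    """Assumed descriptor form: sum of correlation weights over the vertices.

    Rare features contribute zero, as described in the text.
    """
    return sum(weights.get(f, 0.0) for f in features if f in active)

# Hydrogen-suppressed graph of acetone, CC(=O)C (illustrative molecule).
elements = ["C", "C", "O", "C"]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
acetone = vertex_features(elements, adjacency)

# Hypothetical training set (acetone, ethanol, trimethylamine) and weights.
training_graphs = [acetone, ["C1", "C2", "O1"], ["C1", "N3", "C1", "C1"]]
active = active_features(training_graphs, threshold=2)
weights = {"C1": -0.5, "C3": 1.0, "O1": 0.75, "N3": 0.25}
print(acetone, sorted(active), dcw(acetone, weights, active))
```

With a threshold of 2, only C1 and O1 are active in this toy training set, so the C3 vertex of acetone contributes nothing to the descriptor even though a weight is listed for it.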
The predictive potential of the model calculated with Eq. 2 should be checked with data on the calibration and validation sets.
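The selection of the number of epochs by monitoring the calibration set can be sketched as below. This is not the CORAL algorithm itself: it is a generic random-perturbation hill climb used as a stand-in, and the step size, starting weights, function names, and all data values are invented for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def dcw(features, weights):
    """Assumed descriptor: sum of correlation weights (features without a weight count as zero)."""
    return sum(weights.get(f, 0.0) for f in features)

def score(data, weights):
    """Correlation between the descriptor and the endpoint for one data set."""
    return pearson_r([dcw(f, weights) for f, _ in data], [y for _, y in data])

def optimize(train, calib, active, epochs, seed=0, step=0.1):
    """Perturb weights to improve the training correlation; track the calibration set.

    Returns (best calibration r, epoch at which it was reached, weights at that epoch).
    """
    rng = random.Random(seed)
    weights = {f: 1.0 for f in active}
    best_train = score(train, weights)
    best = (score(calib, weights), 0, dict(weights))
    for epoch in range(1, epochs + 1):
        for f in active:
            trial = dict(weights)
            trial[f] += rng.uniform(-step, step)
            r_train = score(train, trial)
            if r_train > best_train:  # keep only changes that help the training set
                weights, best_train = trial, r_train
        r_calib = score(calib, weights)
        if r_calib > best[0]:  # remember the epoch that is best for calibration
            best = (r_calib, epoch, dict(weights))
    return best

# Invented (features, endpoint) pairs, for illustration only.
train = [
    (["C1", "C3", "O1", "C1"], 0.5),
    (["C1", "C2", "O1"], -0.4),
    (["C1", "N3", "C1", "C1"], -1.2),
    (["C1", "C2", "C2", "C1"], -2.0),
    (["C3", "N3", "O1", "C1", "C1"], 1.1),
]
calib = [
    (["C1", "C2", "C3", "O1"], 0.2),
    (["C1", "C1", "C2"], -1.8),
    (["N3", "C3", "O1", "C1", "C1", "C1"], 0.6),
]
active = ["C1", "C2", "C3", "N3", "O1"]

r_calib, n_star, best_weights = optimize(train, calib, active, epochs=50)
print(f"best calibration r = {r_calib:.3f} at epoch {n_star}")
```

Stopping at the epoch that is best for the calibration set, rather than at the last epoch, is what guards against the overtraining described above. The validation set is not touched during this procedure.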

Mechanistic interpretation
The CORAL models make it possible to interpret the role of different molecular features as promoters of an increase or decrease of an endpoint. For instance, if the correlation weight of a Vk is larger than zero in several runs of the Monte Carlo optimization, then this feature is a promoter of the endpoint increase, whereas if the correlation weights of the Vk are less than zero in several runs of the optimization, then the Vk should be interpreted as a promoter of the endpoint decrease.
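The sign-based rule above can be written as a small function. The feature names and weight values below are hypothetical, and the label used for features with mixed signs is an assumption.

```python
def classify_feature(weights_per_run):
    """Classify a feature from its correlation weights in several optimization runs."""
    if all(w > 0 for w in weights_per_run):
        return "promoter of increase"
    if all(w < 0 for w in weights_per_run):
        return "promoter of decrease"
    return "N/A"  # mixed signs or zero weights: no classification

# Hypothetical correlation weights from three runs.
runs = {
    "C3": [1.2, 0.8, 1.5],
    "C1": [-0.6, -0.9, -0.3],
    "O2": [0.4, -0.1, 0.2],
}
for feature, values in runs.items():
    print(feature, classify_feature(values))
```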

Domain of applicability
The domain of applicability for the CORAL model is defined according to the prevalence of different molecular features in the training and calibration sets: each molecular feature has a statistical defect. The defect is equal to the difference between the probabilities of the molecular feature in the training set and in the calibration set.
The ideal situation is a difference of zero; in practice, however, this value is not zero. Apparently, the preferable distribution should be characterized by the minimal sum of these defects over all active molecular features. Thus, the approach makes it possible not only to define the domain of applicability but also to compare different distributions into the training and calibration sets.
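A minimal sketch of the defect calculation follows. It rests on two assumptions not stated in the text: the probability of a feature is taken as the fraction of compounds in a set that contain it, and the defect is taken as the absolute value of the difference so that the sum over features is meaningful. The feature sets are invented.

```python
def feature_probability(feature, compounds):
    """Fraction of compounds in a set that contain the feature (assumed definition)."""
    return sum(feature in compound for compound in compounds) / len(compounds)

def defects(active, train, calib):
    """Absolute difference of feature probabilities between training and calibration sets."""
    return {
        f: abs(feature_probability(f, train) - feature_probability(f, calib))
        for f in active
    }

# Invented feature sets for four training and two calibration compounds.
train = [{"C1", "C3", "O1"}, {"C1", "C2"}, {"C2", "O1"}, {"C1", "N3"}]
calib = [{"C1", "C3"}, {"C2", "O1"}]
active = ["C1", "C2", "C3", "N3", "O1"]

defect = defects(active, train, calib)
total_defect = sum(defect.values())
print(defect, total_defect)
```

Computing total_defect for each candidate split gives a single number by which the splits can be ranked, with the smallest value being preferable.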

Models
The models for the dispersibility of SWNTs in different organic solvents, for three different random splits into the training, calibration, and validation sets, are given by Eqs. 3-5. Table 2 contains the numerical data on the correlation weights used to calculate the DCW(T*,N*) for Eqs. 3-5. Table 3 contains the statistical characteristics of the models calculated with Eqs. 3-5.

Domain of applicability
The estimation of the domain has been done by the scheme described in the literature [16]: a solvent falls within the domain if the sum of defects for its SMILES is less than the average value of this parameter (for the training set) multiplied by 2.
[Table 4, around here]
One can see (Table 4) that the distribution into the training, calibration, and validation sets influences the domain of applicability, but this makes it possible to select the distribution that is preferable from the statistical point of view (the one with the minimum of the above-mentioned defect).
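The criterion above (sum of defects below twice the training-set average) can be sketched as follows. The defect values and compounds are invented, and two details are assumptions: defects are summed over every vertex of a compound, and features without a tabulated defect contribute zero.

```python
def defect_sum(features, defect):
    """Sum of feature defects for one compound (features without a defect count as zero)."""
    return sum(defect.get(f, 0.0) for f in features)

def domain_threshold(train, defect):
    """Twice the average defect sum over the training set."""
    sums = [defect_sum(features, defect) for features in train]
    return 2 * sum(sums) / len(sums)

def in_domain(features, defect, threshold):
    """A compound is inside the domain if its defect sum is below the threshold."""
    return defect_sum(features, defect) < threshold

# Invented defects and training compounds, for illustration only.
defect = {"C1": 0.25, "C2": 0.0, "C3": 0.25, "N3": 0.25, "O1": 0.0}
train = [
    ["C1", "C3", "O1", "C1"],
    ["C1", "C2"],
    ["C2", "O1"],
    ["C1", "N3"],
]
threshold = domain_threshold(train, defect)

print(threshold)
print(in_domain(["C2", "O1", "C1"], defect, threshold))
print(in_domain(["C1", "C3", "N3", "C1"], defect, threshold))
```

Because the comparison is strict, a compound whose defect sum equals the threshold is treated as outside the domain in this sketch.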

Mechanistic interpretation
Three runs of the Monte Carlo optimization with the selected T* and N* give the correlation weights collected in Table 5. One can hypothesize about the role of the molecular features represented by the Vk in the behavior of a solvent: if all runs give a positive value of the correlation weight for a Vk, then the molecular feature can be classified as a promoter of an endpoint increase; if all runs give a negative value of the correlation weight, then the molecular feature represented by the Vk can be classified as a promoter of an endpoint decrease [16].

Selection of molecular features for increase (decrease) of dispersibility of SWNT
The analysis of the data collected in Table 5 leads to the hypothesis that the presence (in the hydrogen-suppressed molecular graph that represents a solvent) of carbon and nitrogen atoms with vertex degree 3, oxygen with vertex degree 1, and carbon atoms with vertex degree 2 promotes an increase in dispersibility. The presence of a carbon vertex with vertex degree 1 in the molecular graph representing a solvent is a promoter of the endpoint decrease.

Comparison with QSAR models from the literature
The statistical characteristics of the model of log10Cmax (for the validation set, the same 29 solvents) suggested in work [5] are n=6, r2=0.932; =0.844, =0.066; the statistical characteristics of the model (for the same 29 solvents) suggested in work [6] are n=7, r2=0.807; =0.744, =0.125. The above-mentioned models relate to fixed splits into the training and validation sets, whereas the models suggested in this work are checked with three different splits. It should be noted that different splits into the training and validation sets were used in work [5] and in work [6].

Conclusions
The described version of the Monte Carlo method gives a satisfactory prediction of the dispersibility of SWNTs in different solvents. The distribution into the visible training set (together with the calibration set) and the invisible validation set influences the predictive potential of the models. The approach gives a quite convenient measure of the quality of the distribution into the training and validation sets, together with a convenient criterion of the domain of applicability.

Table 1
Example of the hydrogen-suppressed graph together with the adjacency matrix and vertex degree values (Vk).

Table 2
Correlation weights of different vertices (chemical element together with the vertex degree) calculated by the Monte Carlo method for splits 1, 2, and 3.

Table 3
The statistical characteristics of models for the dispersibility of SWNTs in the organic solvents.

Table 5
Correlation weights of different kinds of the vertex degrees obtained in three runs of the Monte Carlo calculations.
*) N/A = classification is not available