– Entropy LOCAL EFFECT OF ASYMMETRY DEVIATIONS FROM GAUSSIANITY USING INFORMATION-BASED MEASURES

In this paper local sensitivity measures are proposed to evaluate deviations from multivariate Gaussianity caused by asymmetry; the model we use to regulate asymmetry is the multivariate skew-normal distribution because it reflects the deviation in a very tractable way. The paper also examines the connection between local sensitivity and Mardia’s and Malkovich-Afifi’s skewness indices. Once the local sensitivity measures have been introduced, we study the effect of local perturbations in asymmetry on the conditional distributions; this issue has important implications because there are many procedures in statistics and other fields where the output depends on the distribution of some variables for known values of the others. The proposed measures use the Kullback-Leibler divergence to evaluate dissimilarities between probability distributions in order to assess deviation from Gaussianity on the joint distribution and on the marginal and conditional distributions as well. The results are illustrated with some examples.


Introduction
Because non-Gaussian data are common in many applications, a natural question is whether the usual methods can still be applied when the probability model does not satisfy the Gaussianity assumption. We study this issue, present mathematical results, and discuss when and how those methods remain applicable in spite of asymmetry. The specific issue we consider is the effect that a deviation in the joint distribution has on the corresponding marginal and conditional distributions for a given partition of the multivariate vector of variables. Many procedures in statistics deal with a Gaussian random vector Z divided into two blocks with different roles. One such technique for modeling continuous multivariate data is the Gaussian Bayesian network (GBN). A GBN represents the dependence structure of a set of random variables with a directed acyclic graph (DAG), where the presence or absence of edges between nodes represents conditional dependence or independence of the variables in the model [? ?]. For inferential purposes, the network variables are partitioned into two categories, interest and evidence variables, also known as non-observable and observable variables; the output is the conditional distribution of the interest variables given known values of the evidence variables. This inference procedure, termed evidence propagation, is computationally tractable under Gaussianity and linearity assumptions, because the exact conditional distributions are available.
However, deviations from Gaussianity introduce both computational and interpretation problems. Therefore, a key point is to know whether the output of the model is robust to slight perturbations, because such robustness would allow the use of the well-behaved methods of the Gaussian distribution when the actual model is not exactly as assumed. This problem has been studied for kurtosis perturbations, using the multivariate power exponential distribution as the underlying model for handling kurtosis [1]. In this work we focus on asymmetry deviations from Gaussianity, so the multivariate skew-normal (SN) distribution [2,3] is used as the alternative model. With this aim, we calculate the relative conditional sensitivity measure to evaluate the effect of asymmetry on the marginal and conditional distributions. The tool to evaluate deviations from Gaussianity is the Kullback-Leibler (KL) divergence. Although it is a non-symmetric measure, it gives a good representation of deviations between Gaussian distributions and involves tractable calculations; moreover, some recent studies also present fine results for SN distributions [4]. As an example we apply our measures to permutation-symmetric Gaussian variables, illustrated by a complete DAG with a common correlation coefficient; some available results for this case provide revealing findings when the proposed measures are used.
The paper is organized as follows: in the next section we review the fundamentals of the multivariate SN distribution. In Section 3 we propose the new local sensitivity measure and study its connection with Mardia's and Malkovich-Afifi's skewness indices; we also adapt the local measure to derive a local relative sensitivity index. Section 4 is concerned with the evaluation of the effect of asymmetry perturbations on the conditional distributions, and a relative conditional sensitivity measure is defined as an essential tool to deal with deviations from Gaussianity characterized by skewness. We finish the paper with a summary of the main findings and some practical implications regarding the proposed measures.

The multivariate skew-normal distribution
The multivariate skew-normal distribution was introduced by [2] in order to model departures from multivariate Gaussianity that are related to asymmetry. From the very beginning the multivariate SN distribution has been widely accepted as a modeling tool in scenarios where the Gaussianity assumption on the stochastic mechanism that generates the data fails. Its tractability and appealing properties have enhanced the usefulness of the SN distribution in both theoretical and applied multivariate analysis. Some works discussing its main properties with an emphasis on applications are Azzalini and Capitanio [35], Capitanio et al. [6], Azzalini [7] and Loperfido [8], just to name a few.
In this paper we adopt the notation of the aforementioned works to define the density function of a $p$-dimensional SN random vector with location vector $\xi = (\xi_1, \ldots, \xi_p)^\top$ and scale matrix $\Omega$ as follows:

$$f(z; \xi, \Omega, \alpha) = 2\,\phi_p(z - \xi; \Omega)\,\Phi\big(\alpha^\top \omega^{-1}(z - \xi)\big), \qquad z \in \mathbb{R}^p, \tag{1}$$

where $\phi_p(\cdot\,; \Omega)$ denotes the $p$-dimensional Gaussian density with zero mean and full rank covariance matrix $\Omega$, $\Phi$ is the distribution function of a standard Gaussian random variable, $\omega = \mathrm{diag}(\omega_1, \ldots, \omega_p)$ is a diagonal scale matrix with positive entries such that $\bar\Omega = \omega^{-1}\Omega\,\omega^{-1}$ is a correlation matrix, and $\alpha$ is a $p$-dimensional shape vector regulating the skewness of the distribution. We will write $\mathbf{Z} \sim SN_p(\xi, \Omega, \alpha)$ to denote that $\mathbf{Z}$ follows a $p$-dimensional SN distribution with density function (1). When $\alpha = 0$ the multivariate SN becomes a $p$-dimensional Gaussian variate with mean vector $\xi$ and covariance matrix $\Omega$; so we can put $\mathbf{Z} \sim SN_p(\xi, \Omega, 0)$ or, equivalently, $\mathbf{Z} \sim N_p(\xi, \Omega)$.
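As an illustration of the density just described, the following minimal sketch evaluates it in the bivariate case only, where the inverse of the scale matrix can be written explicitly. The function names are hypothetical, and the sketch assumes that the diagonal of the scale matrix ω holds the square roots of the diagonal entries of Ω, which is what makes ω⁻¹Ωω⁻¹ a correlation matrix.

```python
import math

def std_normal_cdf(t):
    """Standard Gaussian distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def gauss_density_2d(z, omega):
    """Zero-mean bivariate Gaussian density with full-rank covariance matrix omega."""
    (a, b), (c, d) = omega
    det = a * d - b * c
    # Quadratic form z^T omega^{-1} z, using the explicit inverse of a 2x2 matrix.
    q = (d * z[0] ** 2 - (b + c) * z[0] * z[1] + a * z[1] ** 2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def sn_density_2d(z, xi, omega, alpha):
    """Bivariate SN density: 2 * phi_2(z - xi; Omega) * Phi(alpha^T w^{-1} (z - xi))."""
    y = [z[0] - xi[0], z[1] - xi[1]]
    # Assumption: w_i = sqrt(Omega_ii), the marginal scale of each coordinate.
    w = [math.sqrt(omega[0][0]), math.sqrt(omega[1][1])]
    t = alpha[0] * y[0] / w[0] + alpha[1] * y[1] / w[1]
    return 2.0 * gauss_density_2d(y, omega) * std_normal_cdf(t)
```

With a null shape vector the function returns the Gaussian density, and for a zero location vector the values at z and −z add up to twice the Gaussian density, which reflects how the skewing factor only redistributes mass between opposite points.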
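On the other hand, we can also observe that $\mathbf{Z} = \xi + \omega \mathbf{Z}_0$, where $\mathbf{Z}_0$ is a multivariate skew-normal variable with density function given by

$$f(z; \bar\Omega, \alpha) = 2\,\phi_p(z; \bar\Omega)\,\Phi(\alpha^\top z), \qquad z \in \mathbb{R}^p. \tag{2}$$

We refer to $\mathbf{Z}_0$ as the "normalized" multivariate SN variate [3]. An appealing feature of the SN distribution with density (1) is that classical non-Gaussianity indices like Mardia's measures of skewness and kurtosis [9] and Malkovich-Afifi's skewness and kurtosis [10] admit a simple closed form. From [5] we know that Mardia's measures of skewness and kurtosis are given by

$$\gamma^M_{1,p} = 2(4 - \pi)^2 \left( \frac{\alpha^*}{\pi + (\pi - 2)\alpha^*} \right)^3, \tag{3}$$

$$\gamma^M_{2,p} = p(p + 2) + 8(\pi - 3) \left( \frac{\alpha^*}{\pi + (\pi - 2)\alpha^*} \right)^2, \tag{4}$$

with $\alpha^* = \alpha^\top \bar\Omega \alpha$. On the other hand, Malkovich-Afifi's indices of skewness and kurtosis, which will be denoted by $\gamma^D_{1,p}$ and $\gamma^D_{2,p}$, are closely related to Mardia's measures; they are given by $\gamma^D_{1,p} = \gamma^M_{1,p}$ and $\gamma^D_{2,p} = \gamma^M_{2,p} - p(p + 2) + 3$ [8].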
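The dependence of these indices on a single scalar can be made concrete with a short sketch. It assumes the standard closed forms for the SN model, namely Mardia's skewness 2(4 − π)² r³ and kurtosis p(p + 2) + 8(π − 3) r², with r = α*/(π + (π − 2)α*) and α* the quadratic form of the shape vector in the correlation matrix; the function names are hypothetical.

```python
import math

def alpha_star(alpha, corr):
    """Quadratic form alpha^T corr alpha (corr is a correlation matrix as nested lists)."""
    p = len(alpha)
    return sum(alpha[i] * corr[i][j] * alpha[j] for i in range(p) for j in range(p))

def _ratio(a_star):
    # Common building block r = a* / (pi + (pi - 2) a*), increasing in a*.
    return a_star / (math.pi + (math.pi - 2.0) * a_star)

def mardia_skewness(a_star):
    """Mardia's skewness of the SN model as a function of alpha* (assumed closed form)."""
    return 2.0 * (4.0 - math.pi) ** 2 * _ratio(a_star) ** 3

def mardia_kurtosis(a_star, p):
    """Mardia's kurtosis of the SN model in dimension p (assumed closed form)."""
    return p * (p + 2) + 8.0 * (math.pi - 3.0) * _ratio(a_star) ** 2

def malkovich_afifi(a_star, p):
    """Malkovich-Afifi skewness and kurtosis, obtained from Mardia's measures."""
    return mardia_skewness(a_star), mardia_kurtosis(a_star, p) - p * (p + 2) + 3
```

Both indices vanish in excess of their Gaussian values at α* = 0 and grow with α*, with the skewness bounded above by 2(4 − π)²/(π − 2)³, roughly 0.99.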
All these measures, along with their omnibus versions, share the common characteristic of being non-decreasing functions of the quantity $\alpha^* = \alpha^\top \bar\Omega \alpha$; hence, $\alpha^*$ can be understood as a scalar that summarizes the departure from Gaussianity in the SN model. This finding has been previously reported by [3], who presented $\alpha^*$ as an indicator that encapsulates the non-Gaussianity of the multivariate SN distribution. In the next section we provide a deeper insight into how to interpret this quantity, as well as what it measures intrinsically; we will also examine its meaning in connection with the local deviation index we are going to introduce now.

Local deviation and relative local deviation indices
The KL divergence was introduced as a generalization of Shannon's entropy and has been widely used in both statistical inference and information theory. This divergence is a non-symmetric measure that provides global information on the difference between two probability distributions [11].
The KL divergence between two probability densities $f$ and $g$, over the same domain $\mathcal{X}$, can be defined as follows:

$$D_{KL}(f, g) = \int_{\mathcal{X}} f(x) \log \frac{f(x)}{g(x)}\, dx. \tag{5}$$

We can say that (5) is the KL divergence between densities $f$ and $g$ or, alternatively, the KL divergence between their corresponding univariate random variables $X$ and $Y$, so we can also denote it by $D_{KL}(X, Y)$. The definition can be extended to multivariate density functions in a natural way to deal with the KL divergence of multivariate random vectors.
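The main results of this paper rely on the KL divergence between multivariate Gaussian and SN variables. The next proposition provides a result for calculating it. For the sake of simplicity we take $\xi = 0$.
Proposition 1. Let $\mathbf{Z}$ and $\mathbf{Z}_\alpha$ be $p$-dimensional random vectors such that $\mathbf{Z} \sim SN_p(0, \Omega, 0)$ and $\mathbf{Z}_\alpha \sim SN_p(0, \Omega, \alpha)$ respectively. Then we obtain that $D_{KL}(\mathbf{Z}, \mathbf{Z}_\alpha) = -E\big[\log\big(2\Phi(\sqrt{\alpha^*}\, Z)\big)\big]$, where $\alpha^* = \alpha^\top \bar\Omega \alpha$ and $Z$ is a standard Gaussian random variable.
Proof. We prove the assertion of the statement by adapting the argument in [4] to the previous parametrization of the SN model. From expression (1) we obtain that the KL divergence between $\mathbf{Z}$ and $\mathbf{Z}_\alpha$ is given by

$$D_{KL}(\mathbf{Z}, \mathbf{Z}_\alpha) = \int_{\mathbb{R}^p} \phi_p(z; \Omega) \log \frac{\phi_p(z; \Omega)}{2\,\phi_p(z; \Omega)\,\Phi(\alpha^\top \omega^{-1} z)}\, dz = -E\big[\log\big(2\Phi(\alpha^\top \omega^{-1} \mathbf{Z})\big)\big].$$

Taking into account that $\mathbf{Z}$ follows a multivariate Gaussian distribution, we obtain that $\alpha^\top \omega^{-1} \mathbf{Z}$ is a univariate Gaussian random variable with zero mean and variance $\alpha^\top \omega^{-1} \Omega\, \omega^{-1} \alpha = \alpha^*$, so it has the same distribution as $\sqrt{\alpha^*}\, Z$, which implies the result of the statement.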
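Because Proposition 1 reduces the multivariate divergence to a one-dimensional expectation, it can be evaluated with elementary quadrature. The sketch below is one possible way to do so, using a composite Simpson rule on a truncated interval; the function name, the truncation half-width and the number of subintervals are illustrative choices, not part of the original derivation.

```python
import math

def std_normal_pdf(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    """Standard Gaussian distribution function; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def kl_gauss_vs_sn(a_star, half_width=8.0, n=4000):
    """Approximate -E[log(2 Phi(sqrt(a_star) Z))] for a standard Gaussian Z.

    Composite Simpson rule on [-half_width, half_width]; n must be even.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for Simpson's rule")
    c = math.sqrt(a_star)
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.log(2.0 * std_normal_cdf(c * z)) * std_normal_pdf(z)
    return -total * h / 3.0
```

For small values of α* the ratio of the computed divergence to α* settles near 1/π, which is the quadratic behavior exploited by the local index discussed next.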

Local deviation index
The result in Proposition 1 provides quantitative insights on how to assess the effect of small perturbations in asymmetry when we deal with multivariate Gaussian variables; it will be used to define our local deviation measure. First of all, we show how we represent perturbations in asymmetry.
In the multivariate SN model, departures from Gaussianity are regulated by the shape vector parameter $\alpha$. Hence, the effect of infinitesimal perturbations in asymmetry can be quantified by comparing the distribution of a baseline $p$-dimensional vector $\mathbf{Z}$ such that $\mathbf{Z} \sim N_p(0, \Omega)$ with the distribution of a perturbed SN vector $\mathbf{Z}_\varepsilon$ such that $\mathbf{Z}_\varepsilon \sim SN_p(0, \Omega, \varepsilon e)$, where $e$ is a given direction driving the asymmetry and $\varepsilon$ is a perturbation factor such that $\varepsilon \downarrow 0$. In this framework, it is well known that the KL divergence, as a function of $\varepsilon$, can be approximated around 0 by a quadratic function of the form $a\varepsilon^2$ [12]; it therefore stands to reason to define our index as follows.
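Definition 2 (Local deviation index). Let $\mathbf{Z}$ be a Gaussian random vector with $\mathbf{Z} \sim N_p(0, \Omega)$. For every non-null direction $e$ such that $e^\top e = 1$, let us consider a vector $\mathbf{Z}_\varepsilon$ such that $\mathbf{Z}_\varepsilon \sim SN_p(0, \Omega, \varepsilon e)$ with $\varepsilon > 0$. Then departures from Gaussianity in asymmetry can be locally quantified by

$$LD(e, \Omega) = \lim_{\varepsilon \downarrow 0} \frac{D_{KL}(\mathbf{Z}, \mathbf{Z}_\varepsilon)}{\varepsilon^2}. \tag{6}$$

Note that measure (6) depends on the direction of asymmetry $e$ and on the covariance matrix of the underlying multivariate Gaussian model. Using the result of Proposition 1 with $\alpha = \varepsilon e$, so that $\alpha^* = \varepsilon^2 e^\top \bar\Omega e$, the limit in (6) can be calculated as follows:

$$LD(e, \Omega) = -\lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon^2} \int_{-\infty}^{\infty} \log\Big(2\Phi\big(\varepsilon \sqrt{e^\top \bar\Omega e}\; z\big)\Big)\, \phi(z)\, dz,$$

where $\phi$ denotes the standard Gaussian density. After some simple calculus with the integral above, pairing the values of the integrand at $z$ and $-z$, we get

$$LD(e, \Omega) = -\lim_{\varepsilon \downarrow 0} \int_{0}^{\infty} \frac{\log\Big(4\,\Phi\big(\varepsilon \sqrt{e^\top \bar\Omega e}\; z\big)\,\Phi\big(-\varepsilon \sqrt{e^\top \bar\Omega e}\; z\big)\Big)}{\varepsilon^2}\, \phi(z)\, dz.$$

Furthermore, after successive applications of L'Hôpital's rule we obtain that the limit of the integrand is

$$\lim_{\varepsilon \downarrow 0} \frac{\log\Big(4\,\Phi\big(\varepsilon \sqrt{e^\top \bar\Omega e}\; z\big)\,\Phi\big(-\varepsilon \sqrt{e^\top \bar\Omega e}\; z\big)\Big)}{\varepsilon^2}\, \phi(z) = -\frac{2}{\pi}\, e^\top \bar\Omega e\; z^2 \phi(z).$$

Hence,

$$LD(e, \Omega) = \frac{2}{\pi}\, e^\top \bar\Omega e \int_{0}^{\infty} z^2 \phi(z)\, dz = \frac{e^\top \bar\Omega e}{\pi}. \tag{7}$$

As we have already mentioned, [3] noted that the quantity $e^\top \bar\Omega e$ is functionally related to Mardia's skewness and kurtosis indices in SN models, so it can be considered as a quantifier for regulating departures from Gaussianity. The relation derived in (7) gives another way to interpret this quantity: it is proportional to the limit rate of divergence from the underlying Gaussian model when slight SN perturbations along the direction $e$ are injected. Local perturbations in asymmetry are described by a dissimilarity measure between distributions through the KL divergence, so we argue that the new way of interpreting $e^\top \bar\Omega e$ provides an overall picture of what it really measures.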
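As a cross-check of the limit, the same value can be reached with a second-order Taylor expansion instead of L'Hôpital's rule. The sketch below assumes that the local deviation index is the limit of $D_{KL}(\mathbf{Z}, \mathbf{Z}_\varepsilon)/\varepsilon^2$ and uses the expression of Proposition 1; the shorthand $c$ is introduced here only for this computation.

```latex
% Shorthand (introduced for this sketch): c = \sqrt{e^\top \bar\Omega e}, t = \varepsilon c z.
\begin{align*}
2\Phi(t) &= 1 + \sqrt{\tfrac{2}{\pi}}\, t + O(t^3), \\
\log\big(2\Phi(t)\big) &= \sqrt{\tfrac{2}{\pi}}\, t - \tfrac{1}{\pi}\, t^2 + O(t^3), \\
D_{KL}(\mathbf{Z}, \mathbf{Z}_\varepsilon)
  &= -E\big[\log\big(2\Phi(\varepsilon c Z)\big)\big]
   = -\sqrt{\tfrac{2}{\pi}}\, \varepsilon c\, E[Z] + \tfrac{1}{\pi}\, \varepsilon^2 c^2\, E[Z^2] + O(\varepsilon^4)
   = \frac{\varepsilon^2\, e^\top \bar\Omega e}{\pi} + O(\varepsilon^4), \\
\lim_{\varepsilon \downarrow 0} \frac{D_{KL}(\mathbf{Z}, \mathbf{Z}_\varepsilon)}{\varepsilon^2}
  &= \frac{e^\top \bar\Omega e}{\pi}.
\end{align*}
```

The odd-order terms drop out because the odd moments of a standard Gaussian variable are zero, which is why the remainder is of order $\varepsilon^4$ and the divergence is locally quadratic in $\varepsilon$.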

Example 1. Maximum LD for permutation-symmetric normal variables
In this example we consider a vector $\mathbf{Z}$ belonging to the family of permutation-symmetric normal variables [13], i.e. random variables having a multivariate normal distribution with equal means, equal variances and equal correlation coefficients. Therefore, the covariance matrix is given by

$$\Omega = \sigma^2 \big[ (1 - \rho)\, I_p + \rho\, 1_{p \times 1} 1_{p \times 1}^\top \big],$$

with $\rho > -\frac{1}{p-1}$ in order to have a positive definite matrix. For the sake of simplicity we can assume that $\mathbf{Z}$ has a zero mean vector, $\sigma^2 = 1$ and $\rho > 0$, since the case $\rho < 0$ can be studied in a similar way using an argument based on the symmetry of the problem. With these assumptions we obtain a correlation matrix $\bar\Omega = \Omega$ with appealing properties: its largest eigenvalue is $\lambda_1 = 1 + (p - 1)\rho$, and the remaining eigenvalues are $\lambda_2 = \ldots = \lambda_p = 1 - \rho$; therefore, the local deviation index can be bounded as follows:

$$\frac{1 - \rho}{\pi} \le LD(e, \Omega) \le \frac{1 + (p - 1)\rho}{\pi} = LD(e_1, \Omega),$$

where $e_1$ is the normalized eigenvector $e_1 = \frac{1}{\sqrt{p}}\, 1_{p \times 1}$ corresponding to the largest eigenvalue $\lambda_1$, with $1_{p \times 1}$ denoting a $p$-dimensional vector whose components are all equal to one.
Hence, vector $e_1$ gives the direction yielding the maximum local deviation from Gaussianity. The maximum is a non-decreasing function of the dimension $p$ and the correlation coefficient $\rho$.
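The bounds of this example are easy to reproduce numerically. The following sketch assumes that the local deviation index has the closed form e'Ω̄e/π for a normalized direction e, where Ω̄ is the correlation matrix derived from Ω; the helper names are hypothetical.

```python
import math

def equicorrelation(p, rho, sigma2=1.0):
    """Covariance matrix of a permutation-symmetric normal vector (nested lists)."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(p)] for i in range(p)]

def correlation_from_cov(cov):
    """Correlation matrix w^{-1} cov w^{-1}, with w the diagonal matrix of standard deviations."""
    p = len(cov)
    w = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (w[i] * w[j]) for j in range(p)] for i in range(p)]

def local_deviation(e, cov):
    """LD(e, Omega) = e^T corr e / pi (assumed closed form); e is normalized internally."""
    norm = math.sqrt(sum(x * x for x in e))
    u = [x / norm for x in e]
    corr = correlation_from_cov(cov)
    p = len(u)
    quad = sum(u[i] * corr[i][j] * u[j] for i in range(p) for j in range(p))
    return quad / math.pi
```

For instance, with p = 4 and ρ = 0.3 the direction with all components equal gives (1 + 3ρ)/π, while a contrast such as (1, −1, 0, 0) gives (1 − ρ)/π; the variance σ² plays no role because the index depends on Ω only through its correlation matrix.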

Relative local deviation index
The result derived in (7) gives an assessment of how the multivariate Gaussian distribution is deformed by slight perturbations in the direction $e$. The maximum is obtained when moving along the first principal component $e_1$ of the correlation matrix $\bar\Omega$. This fact leads to the following natural measure for quantifying relative local deviation (RLD) from Gaussianity.
Definition 3 (Relative local deviation index). Let $\mathbf{Z}$ be a Gaussian random vector such that $\mathbf{Z} \sim N_p(0, \Omega)$ with correlation matrix $\bar\Omega = \omega^{-1} \Omega\, \omega^{-1}$. For every non-null normalized direction $e$, the relative local deviation from Gaussianity is measured by

$$RLD(e, \Omega) = \frac{LD(e, \Omega)}{LD(e_1, \Omega)} = \frac{e^\top \bar\Omega e}{e_1^\top \bar\Omega e_1} = \frac{e^\top \bar\Omega e}{\lambda_1}, \tag{8}$$

where $e_1$ is the normalized eigenvector corresponding to the largest eigenvalue, $\lambda_1$, of the correlation matrix $\bar\Omega$. When departures from Gaussianity are modeled in accordance with a normalized SN vector $\mathbf{Z}_0$, with shape vector $\alpha$ such that $e_1$ is proportional to $\alpha$, we know that the principal components of $\mathbf{Z}_0$ are independent and proportional to the components of its canonical transformation [8]. In addition, the direction yielding the maximal skewness is proportional to $e_1$, with all the multivariate skewness being absorbed into this direction, and with the remaining canonical variates having null skewness [8]. Thus, the denominator in (8) can be understood as the non-Gaussianity index of the only skewed component in the canonical transformation of $\mathbf{Z}_0$.

Example 2. RLD for bidimensional permutation-symmetric normal vectors
In this example we consider the class of permutation-symmetric normal vectors with $p = 2$. In this case, the denominator in (8) corresponds to the maximum eigenvalue of $\bar\Omega$, which is given by $\lambda_1 = 1 + \rho$, provided that $\rho > 0$. Hence, the RLD measure can be written as a function, $RLD(\theta)$, of the angle between the direction $e$ and the positive direction of the x-axis, where $0 \le \theta \le \pi$. The black curves in Figure 1 display this function for different non-negative correlation coefficients. As we have already mentioned, the maximum RLD is attained at $\theta = \frac{\pi}{4}$ with $RLD(\frac{\pi}{4}) = 1$, while the minimum is attained at $\theta = \frac{3\pi}{4}$ with $RLD(\frac{3\pi}{4}) = \frac{1-\rho}{1+\rho}$. Gray plots for the RLD measure at the corresponding negative values of $\rho$ are also included. In this case, by the symmetry of our problem, we obtain that the minimum and maximum are given by $RLD(\frac{\pi}{4}) = \frac{1+\rho}{1-\rho}$ and $RLD(\frac{3\pi}{4}) = 1$ respectively. In Figure 1 we show several plots of the RLD index for different correlation coefficients. Observe that when $\rho > 0$ the steepest increases of RLD from its minimum are obtained for the larger values of the correlation. On the contrary, when $\rho < 0$ they occur for the smaller values. Thus, we conclude that the largest $\rho^2$ will yield the steepest increases in local sensitivity.
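The curves described in this example can be tabulated with a few lines of code. The sketch below assumes the direction is parametrized as e = (cos θ, sin θ), so that the quadratic form in the bivariate correlation matrix equals 1 + ρ sin 2θ and the largest eigenvalue is 1 + |ρ|; the function names are hypothetical.

```python
import math

def rld_bivariate(theta, rho):
    """RLD(theta) for p = 2, direction e = (cos theta, sin theta), correlation rho."""
    lam_max = 1.0 + abs(rho)  # largest eigenvalue of [[1, rho], [rho, 1]]
    return (1.0 + rho * math.sin(2.0 * theta)) / lam_max

def rld_curve(rho, n=180):
    """Pairs (theta, RLD(theta)) on an evenly spaced grid of [0, pi]."""
    return [(math.pi * i / n, rld_bivariate(math.pi * i / n, rho)) for i in range(n + 1)]
```

Evaluating `rld_bivariate` at π/4 and 3π/4 for a positive and a negative correlation reproduces the extreme values quoted above, and `rld_curve` provides the points needed to draw curves like those described for Figure 1.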

Application to assess relative conditional sensitivity
In this section we apply the ideas we used to introduce the LD index in order to assess the relative sensitivity of conditional distributions. This issue has important implications in modeling continuous multivariate data with GBN. These are graphical displays of the dependence structure of a set of Gaussian variables with a DAG, where the presence or absence of edges between the nodes represents conditional dependence or independence of the variables in the model. As the specialized literature establishes, the $p$-dimensional random vector can be partitioned as $(\mathbf{Y}, \mathbf{X})$ with $\mathbf{X}$ the set of observable or evidential variables and $\mathbf{Y}$ the set of non-observable or interest variables; when some particular values of the evidential variables are known, the conditional distribution $\mathbf{Y}|\mathbf{X}$ has to be determined in order to update our knowledge about $\mathbf{Y}$. This inference procedure, termed evidence propagation, is computationally tractable under Gaussianity, as the exact conditional distributions are available. However, deviations from Gaussianity may cause computational and interpretation problems. Therefore, an open problem is to know whether the output of the network is robust to slight perturbations in our model, since this robustness would allow using the well-behaved methods under Gaussianity when the actual model is not exactly Gaussian. We propose to utilize multivariate SN distributions for modeling asymmetry deviations. With this aim, we apply measure (7) from Definition 2 to introduce a Relative Conditional Sensitivity Measure (RCSM) that will be used to evaluate the effect of slight perturbations in skewness on the conditional distributions.
Our study of the relative sensitivity of conditional distributions to infinitesimal changes in the symmetry of the joint density relies on the KL divergence measure and the well-known chain rule:

$$D_{KL}\big(f_{(Y,X)}, g_{(Y,X)}\big) = D_{KL}\big(f_X, g_X\big) + D_{KL}\big(f_{Y|X}, g_{Y|X}\big), \tag{9}$$

where $f_{(Y,X)}$, $f_{Y|X}$ and $f_X$ represent the joint, conditional and marginal densities of the reference distribution and $g_{(Y,X)}$, $g_{Y|X}$ and $g_X$ are the corresponding densities of the distribution to be compared; the last term in (9) is the conditional divergence, that is, the divergence between the conditional densities at $\mathbf{X} = x$ averaged with respect to $f_X$. As we are interested in assessing deviations from Gaussianity we take as reference, $f$, the density of a Gaussian vector and as $g$ the density of the perturbed vector. Some studies have assessed the local divergence of conditional distributions for different purposes. In particular, [14] measured the local association of the random variable $Y$ with a set of covariates at $X = x$ using an appropriate limit of the divergence between $f_{Y|X=x}$ and $f_{Y|X=x+\varepsilon}$; a proposal to use local divergence as a measure of association between random variables was also provided. Conditional divergence measures were also utilized in the context of GBN to evaluate the sensitivity of model output to parameter perturbations [15] as well as to assess the effect of deviations regulated by a kurtosis parameter [1,16].
In view of relationship (9), an appropriate measure of relative conditional sensitivity can be defined as follows.
Definition 4 (Relative Conditional Sensitivity Measure). Let $(\mathbf{Y}, \mathbf{X})$ be a $p$-dimensional random vector. The relative conditional sensitivity measure of $(\mathbf{Y}, \mathbf{X})$ with reference density function $f$ is given by

$$RCSM = \lim_{\varepsilon \downarrow 0} \frac{D_{KL}\big(f_{Y|X}, f^{\varepsilon}_{Y|X}\big)}{D_{KL}\big(f_{(Y,X)}, f^{\varepsilon}_{(Y,X)}\big)}, \tag{10}$$

with $f^{\varepsilon}$ denoting the corresponding joint, conditional and marginal densities associated with the perturbation.
Under our assumptions it follows that expression (10) has a simple closed form, as we will show in Proposition 5. Before presenting this result, we introduce some notation.
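Let $(\mathbf{Y}, \mathbf{X})$ be a $p$-dimensional random vector following a $N_p(0, \Omega)$ distribution and let $(\mathbf{Y}_\varepsilon, \mathbf{X}_\varepsilon)$ be the perturbed random vector following a multivariate $SN_p(0, \Omega, \varepsilon e)$ distribution, with $e$ a fixed asymmetry direction and with $\mathbf{X}$ and $\mathbf{X}_\varepsilon$ being $(p - k)$-dimensional vectors, $k < p$. Without loss of generality we can assume that $e$ is a normalized vector. The scale matrix $\Omega$, the correlation matrix $\bar\Omega$ and the diagonal matrix $\omega$ can be partitioned as follows:

$$\Omega = \begin{pmatrix} \Omega_{YY} & \Omega_{YX} \\ \Omega_{XY} & \Omega_{XX} \end{pmatrix}, \qquad \bar\Omega = \begin{pmatrix} \bar\Omega_{YY} & \bar\Omega_{YX} \\ \bar\Omega_{XY} & \bar\Omega_{XX} \end{pmatrix}, \qquad \omega = \begin{pmatrix} \omega_{Y} & 0 \\ 0 & \omega_{X} \end{pmatrix}.$$

Similarly, we have the partition $e = (e_Y^\top, e_X^\top)^\top$ for the asymmetry direction, with $e^{\varepsilon} = \varepsilon e$ being the perturbed direction and $\varepsilon > 0$ an infinitesimal perturbation factor.
Proposition 5. Let $(\mathbf{Y}, \mathbf{X})$ be a $p$-dimensional random vector following a $N_p(0, \Omega)$ distribution, with $\mathbf{Y}$ a $k$-dimensional random vector, $k < p$. Let us denote by $(\mathbf{Y}_\varepsilon, \mathbf{X}_\varepsilon)$ the perturbed random vector which follows a multivariate $SN_p(0, \Omega, \varepsilon e)$ distribution. Then the RCSM measure is given by

$$RCSM(e) = \frac{e_Y^\top \bar\Omega_{YY.X}\, e_Y}{e^\top \bar\Omega\, e}, \tag{11}$$

where $\bar\Omega_{YY.X} = \bar\Omega_{YY} - \bar\Omega_{YX} \bar\Omega_{XX}^{-1} \bar\Omega_{XY}$ is the Schur complement of the submatrix $\bar\Omega_{XX}$ in $\bar\Omega$.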
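Proof. Since $(\mathbf{Y}_\varepsilon, \mathbf{X}_\varepsilon)$ follows a multivariate $SN_p(0, \Omega, \varepsilon e)$ distribution, it can be shown that $\mathbf{X}_\varepsilon \sim SN_{p-k}(0, \Omega_{XX}, e^{\varepsilon}_{X(Y)})$, where the asymmetry direction $e^{\varepsilon}_{X(Y)}$ is given by

$$e^{\varepsilon}_{X(Y)} = \frac{\varepsilon \big( e_X + \bar\Omega_{XX}^{-1} \bar\Omega_{XY}\, e_Y \big)}{\sqrt{1 + \varepsilon^2\, e_Y^\top \bar\Omega_{YY.X}\, e_Y}}.$$

Therefore, using a similar argument as in Proposition 1, we have

$$D_{KL}(\mathbf{X}, \mathbf{X}_\varepsilon) = -E\Big[\log\Big(2\Phi\big( (e^{\varepsilon}_{X(Y)})^\top \omega_X^{-1} \mathbf{X} \big)\Big)\Big].$$

Now, if we consider the scalar variable $T_\varepsilon = (e^{\varepsilon}_{X(Y)})^\top \omega_X^{-1} \mathbf{X}$, then we know that $T_\varepsilon$ is a scalar Gaussian variable whose variance is given by

$$\mathrm{Var}(T_\varepsilon) = (e^{\varepsilon}_{X(Y)})^\top \bar\Omega_{XX}\, e^{\varepsilon}_{X(Y)} = \frac{\varepsilon^2 \big( e^\top \bar\Omega\, e - e_Y^\top \bar\Omega_{YY.X}\, e_Y \big)}{1 + \varepsilon^2\, e_Y^\top \bar\Omega_{YY.X}\, e_Y}.$$

Therefore, $D_{KL}(\mathbf{X}, \mathbf{X}_\varepsilon)$ can be written as

$$D_{KL}(\mathbf{X}, \mathbf{X}_\varepsilon) = -E\Big[\log\Big(2\Phi\big( \sqrt{\mathrm{Var}(T_\varepsilon)}\; Z \big)\Big)\Big],$$

where $Z$ is a standard Gaussian univariate variable.
Then, with similar calculations as in the limit following Proposition 1, we obtain

$$\lim_{\varepsilon \downarrow 0} \frac{D_{KL}(\mathbf{X}, \mathbf{X}_\varepsilon)}{\varepsilon^2} = \frac{e^\top \bar\Omega\, e - e_Y^\top \bar\Omega_{YY.X}\, e_Y}{\pi},$$

from which, by the chain rule (9), we get

$$\lim_{\varepsilon \downarrow 0} \frac{D_{KL}\big(\phi_{Y|X}, \phi^{\varepsilon}_{Y|X}\big)}{D_{KL}\big(\phi_{(Y,X)}, \phi^{\varepsilon}_{(Y,X)}\big)} = \lim_{\varepsilon \downarrow 0} \frac{D_{KL}\big(\phi_{(Y,X)}, \phi^{\varepsilon}_{(Y,X)}\big) - D_{KL}\big(\phi_{X}, \phi^{\varepsilon}_{X}\big)}{D_{KL}\big(\phi_{(Y,X)}, \phi^{\varepsilon}_{(Y,X)}\big)} = \frac{e_Y^\top \bar\Omega_{YY.X}\, e_Y}{e^\top \bar\Omega\, e},$$

where $\phi$ and $\phi^{\varepsilon}$ denote the density functions of the Gaussian and perturbed vectors, with the subscripts indicating whether the joint, marginal or conditional density is meant. Thus, the proposition is proved.
In this setting it may be worthwhile to find the directions leading to the optimum of the RCSM index, since they are the directions through which the impact of non-Gaussianity attains a maximum or a minimum. We can solve this issue by rewriting (11) as a generalized Rayleigh quotient as follows:

$$RCSM(e) = \frac{e^\top \bar\Omega^{*}_{YY.X}\, e}{e^\top \bar\Omega\, e}, \tag{12}$$

where

$$\bar\Omega^{*}_{YY.X} = \begin{pmatrix} \bar\Omega_{YY.X} & 0 \\ 0 & 0 \end{pmatrix}$$

is an extended version of $\bar\Omega_{YY.X}$ with dimension $p \times p$.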
It is well known that the maximum and minimum of the quotient in expression (12) are given by the maximum and minimum eigenvalues of the matrix $\bar\Omega^{-1} \bar\Omega^{*}_{YY.X}$. Taking into account the form of the inverse of a matrix partitioned by blocks, we obtain that

$$\bar\Omega^{-1} \bar\Omega^{*}_{YY.X} = \begin{pmatrix} I_k & 0 \\ -\bar\Omega_{XX}^{-1} \bar\Omega_{XY} & 0 \end{pmatrix},$$

whose eigenvalues are given by $\lambda_1 = 1$ and $\lambda_2 = 0$ with multiplicities $k$ and $p - k$ respectively. The eigenspace for the maximum eigenvalue $\lambda_1 = 1$ is spanned by the columns of the matrix

$$\begin{pmatrix} I_k \\ -\bar\Omega_{XX}^{-1} \bar\Omega_{XY} \end{pmatrix},$$

which in turn provide directions for which the effect of slight asymmetry perturbations on the conditional distribution attains its maximum. On the other hand, the eigenspace associated with $\lambda_2 = 0$ is the $(p - k)$-dimensional kernel of the transform $\bar\Omega^{-1} \bar\Omega^{*}_{YY.X}$, spanned by the columns of the matrix

$$\begin{pmatrix} 0_{k \times (p-k)} \\ I_{p-k} \end{pmatrix},$$

which corresponds to the directions of the conditioning variables. We apply our results to the distributions of Example 1.
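As a numerical companion to the eigen-analysis above and to the examples that follow, the sketch below computes the RCSM index for an arbitrary correlation matrix. It assumes that the index equals the ratio of $e_Y^\top \bar\Omega_{YY.X} e_Y$ to $e^\top \bar\Omega e$ and uses the identity $e_Y^\top \bar\Omega_{YY.X} e_Y = e^\top \bar\Omega e - v^\top \bar\Omega_{XX}^{-1} v$, with $v$ the block of $\bar\Omega e$ corresponding to $\mathbf{X}$, so that only one linear system has to be solved. All function names are hypothetical, and the small solver is included only to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is n x n)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def rcsm(e, corr, k):
    """RCSM for direction e; the first k coordinates form Y, the remaining ones X."""
    p = len(e)
    quad = sum(e[i] * corr[i][j] * e[j] for i in range(p) for j in range(p))
    v = [sum(corr[i][j] * e[j] for j in range(p)) for i in range(k, p)]  # (corr e)_X
    corr_xx = [row[k:] for row in corr[k:]]
    marginal = sum(vi * si for vi, si in zip(v, solve(corr_xx, v)))
    return (quad - marginal) / quad

def max_rcsm_direction(corr, k, j=0):
    """j-th column of the block matrix (I_k ; -corr_XX^{-1} corr_XY), not normalized."""
    p = len(corr)
    corr_xx = [row[k:] for row in corr[k:]]
    lower = solve(corr_xx, [corr[i][j] for i in range(k, p)])
    return [1.0 if i == j else 0.0 for i in range(k)] + [-x for x in lower]
```

Since the index is a ratio of quadratic forms, it is invariant to the length of e, so the directions do not need to be normalized before calling `rcsm`.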
Example 3. Maximum RCSM for permutation-symmetric normal vectors
In this example we study the RCSM index for the class of permutation-symmetric normal variables, so we use the same notation and restrictions as in Example 1. The multivariate vector is partitioned into subvectors with dimensions $k$ and $p - k$, so we obtain that

$$\bar\Omega_{XX}^{-1} \bar\Omega_{XY} = \frac{\rho}{1 + (p - k - 1)\rho}\; 1_{(p-k) \times k},$$

where $1_{(p-k) \times k}$ denotes a $(p - k) \times k$ matrix with all the entries equal to one. Consequently,

$$\begin{pmatrix} I_k \\ -\bar\Omega_{XX}^{-1} \bar\Omega_{XY} \end{pmatrix} = \begin{pmatrix} I_k \\ -\frac{\rho}{1 + (p - k - 1)\rho}\; 1_{(p-k) \times k} \end{pmatrix}.$$

Therefore, a direction where a perturbation in asymmetry produces the maximum relative local effect on the conditional distribution is any column of this matrix; taking the first one, for instance,

$$e_{\max} \propto \Big( 1,\; 0_{1 \times (k-1)},\; -\tfrac{\rho}{1 + (p - k - 1)\rho}\; 1_{1 \times (p-k)} \Big)^\top.$$

Example 4. Maximum RCSM for bidimensional permutation-symmetric normal vectors
A particularly simple but enlightening case is obtained when $p = 2$ with vector $(Y, X)$ and conditioning variable $X$. Now, the normalized eigenvector associated with the highest eigenvalue $\lambda_1 = 1$ is $e_{\max} = \big( 1/\sqrt{1 + \rho^2},\; -\rho/\sqrt{1 + \rho^2} \big)^\top$, which gives the direction where perturbations have the highest impact on the conditional distribution. On the other hand, the minimum impact, or equivalently the maximum impact on the marginal univariate conditioning distribution, is obtained through the direction $e_{\min} = (0, 1)^\top$.
In Figure 2 we have plots of RCSM for different values of the correlation coefficient. Since the minimum RCSM is attained in the direction $e_{\min}$, all the curves intersect at angle $\theta = \frac{\pi}{2}$. Furthermore, the angle that provides the direction with maximum RCSM is functionally related to the correlation coefficient by $\theta = \pi - \arctan(\rho)$ when $0 < \rho \le 1$ and by $\theta = -\arctan(\rho)$ when $-1 \le \rho < 0$. Note that when both components of the vector are linearly dependent, with $\rho = -1$ or $\rho = 1$, the directions yielding the maximum RCSM are given by the angles $\theta = \frac{\pi}{4}$ and $\theta = \frac{3\pi}{4}$ respectively (see their positions at the vertical lines of Figure 2). They are also the directions with the minimum RLD, as was shown in Example 2. This fact is explained by the perfect linear correlation between both variables.
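The relation between the maximizing angle and the correlation coefficient can be checked with a simple grid search. The sketch below assumes the direction e = (cos θ, sin θ) with Y as the first coordinate, in which case the ratio of $e_Y^\top \bar\Omega_{YY.X} e_Y$ to $e^\top \bar\Omega e$ reduces to (1 − ρ²) cos²θ / (1 + ρ sin 2θ) for |ρ| < 1; the function names and the grid size are illustrative.

```python
import math

def rcsm_bivariate(theta, rho):
    """RCSM(theta) for p = 2, Y first, e = (cos theta, sin theta), |rho| < 1."""
    return (1.0 - rho * rho) * math.cos(theta) ** 2 / (1.0 + rho * math.sin(2.0 * theta))

def argmax_angle(rho, n=20000):
    """Grid point in [0, pi] where RCSM(theta) is largest."""
    grid = [math.pi * i / n for i in range(n + 1)]
    return max(grid, key=lambda t: rcsm_bivariate(t, rho))
```

For ρ = 0.5 the grid maximizer agrees with π − arctan(ρ) up to the grid resolution, for ρ = −0.5 it agrees with −arctan(ρ), and the value at θ = π/2 is numerically zero for every ρ, which is the common intersection point of the curves.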

Conclusions
In this paper we use the Kullback-Leibler (KL) divergence to introduce local deviation (LD) and relative local deviation (RLD) indices designed to assess the effect that slight departures from Gaussianity, due to asymmetry, have on the joint distribution of a multivariate vector; the asymmetry is regulated by the family of SN distributions. We have studied the relationship between Mardia's and Malkovich-Afifi's measures of skewness and the LD index, and have also proved the existence of a monotone relation between these measures and the latter; their connection is described by the slant parameter $\alpha^* = \alpha^\top \bar\Omega \alpha$ [3]. This fact has also provided new insights about how $\alpha^*$ can be interpreted as an indicator that encapsulates the non-Gaussianity in the SN model.
We have also studied the local effect of asymmetry on the conditional distribution of a block of variables conditioned on the others, as well as the local effect on the marginal distribution of the conditioning variables. With this aim, we have proposed a relative conditional sensitivity measure, RCSM, that evaluates such effects. Its simple closed form as a Rayleigh quotient allows us to determine the directions of asymmetry for which slight perturbations lead to the highest impact on the conditional distribution; this finding has important practical implications in statistical learning and data analysis procedures that assume multivariate Gaussianity as a starting hypothesis. In particular, for GBN we can select the partition of the network variables leading to outputs with minimal impact for perturbations of the joint distribution. Then, we can use direct results to propagate evidence in GBN even when the symmetry assumption is not realistic.