
List of accepted submissions

Confined Polymers as Self-Avoiding Random Walks on Restricted Lattices

Through extensive Monte Carlo simulations [1] we study the crystallization of freely-jointed chains of tangent hard spheres under conditions of extreme confinement. The latter is realized through the presence of flat, parallel and impenetrable walls in one or more dimensions [2]. Extreme confinement corresponds to the state where the inter-wall distance, in at least one dimension, approaches the monomer size. Results are presented for quasi-1D (tube-like) and quasi-2D (plate-like) polymer templates. In both cases we observe the entropy-driven formation of highly ordered regions of close-packed, slightly defective crystals of different orientations. In a second stage we map the confined polymer chains onto the self-avoiding random walk (SAW) model on restricted lattices [3]. We enumerate all possible chain configurations (or SAWs) on a specific regular lattice subject to spatial restrictions arising from confinement. From this enumeration we can determine the conformational component of entropy and eventually predict the thermodynamic stability of each distinct polymer crystal. In parallel, we obtain approximate expressions for the SAW behavior as a function of chain length, type of lattice, and level of confinement. We present a simple geometric argument to explain, to first order, the dependence of the number of restricted SAWs on the type of SAW origin. The restricted lattices correspond to the cubic crystal system (simple, body-centered and face-centered) and results are compared against those of the bulk (unrestricted) case.

  1. Ramos, P.M.; Karayiannis, N.C.; Laso, M. J. Comput. Phys. 2018, 375, 918-934.
  2. Foteinopoulou, K.; Karayiannis, N.C.; Laso, M. Chem. Eng. Sci. 2015, 121, 118-132.
  3. Benito, J.; Karayiannis, N.C.; Laso, M. Polymers 2018, 10.
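As an illustration of the enumeration step, the following is a minimal brute-force sketch (not the authors' code) that counts SAWs on the simple cubic lattice, with optional confinement between two impenetrable walls along z; the function name and interface are assumptions for illustration:

```python
def count_saws(n_steps, height=None):
    """Count self-avoiding walks of n_steps on the simple cubic lattice,
    optionally confined to lattice planes 0 <= z < height."""
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def extend(pos, visited):
        if len(visited) == n_steps + 1:
            return 1                    # a complete walk of n_steps steps
        total = 0
        for dx, dy, dz in moves:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt in visited:          # self-avoidance constraint
                continue
            if height is not None and not (0 <= nxt[2] < height):
                continue                # wall (confinement) constraint
            total += extend(nxt, visited | {nxt})
        return total

    origin = (0, 0, 0)                  # origin sits on a wall when confined
    return extend(origin, {origin})

print(count_saws(2))             # bulk cubic lattice: 6 * 5 = 30 two-step SAWs
print(count_saws(2, height=1))   # single-layer (quasi-2D) slit: 4 * 3 = 12
```

Note that the count depends on where the walk originates relative to the walls, which is the dependence on the type of SAW origin discussed above.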
A Model-Based Reinforcement Learning Approach for a Rare Disease Diagnostic Task

In this work, we study the problem of inferring a discrete probability distribution using both expert knowledge and empirical data.

This is an important issue for many applications where the scarcity of data prevents a purely empirical approach. In this context, it is common to rely first on a prior built from initial domain knowledge before proceeding to online data acquisition. We are particularly interested in the intermediate regime, where we do not have enough data to do without the experts' initial prior, but enough to correct it if necessary.

We formalize expert knowledge as a set of constraints, e.g. on the marginals or on the support of the distribution. The expert distribution is defined as the maximum-entropy distribution that satisfies the constraints set by the experts. In turn, empirical data is used to construct the empirical distribution.

We present a new method for objectively choosing the weight to be given to the experts relative to the data. We define our estimator as the projection of the expert distribution onto a confidence region centered on the empirical distribution. This is the closest distribution to the experts' that is consistent with the observed data. The confidence level is the unique parameter of this method.
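A minimal sketch of the projection step, assuming (purely for illustration, this is not necessarily the paper's choice) that the confidence region is an L1 ball of radius eps around the empirical distribution and distance is measured in the same norm; the function name is hypothetical:

```python
import numpy as np

def project_expert(p_expert, counts, eps):
    """Project the expert prior onto the L1 ball of radius eps
    centered on the empirical distribution."""
    p_emp = counts / counts.sum()            # empirical distribution
    d = np.abs(p_expert - p_emp).sum()       # L1 distance expert <-> data
    if d <= eps:                             # expert consistent with data:
        return p_expert.copy()               # keep the prior unchanged
    # Otherwise, the closest point of the ball to the expert lies on the
    # segment between the two distributions, on the ball's boundary.
    return p_emp + (eps / d) * (p_expert - p_emp)

p_expert = np.array([0.6, 0.3, 0.1])
counts   = np.array([10, 60, 30])            # data contradicts the prior
p_hat = project_expert(p_expert, counts, eps=0.2)
print(p_hat, p_hat.sum())
```

As a convex combination of two points of the simplex, the result is always a valid probability distribution; with many observations eps shrinks and the estimator abandons a bad prior, exactly the behavior described below.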

We show, both empirically and theoretically, that our proposed estimator is always at least as efficient as the better of the two models (expert alone or data alone), up to a constant.

Our estimator allows a bad prior to be abandoned relatively quickly once the collected data is observed to be inconsistent with it. At the same time, the same mechanism makes it possible to keep the initial prior if it is good. We show empirically that our method outperforms a parametric Bayesian approach on such a task.

On Conditional Tsallis Entropy

Tsallis entropy, a generalisation of Shannon entropy that depends on a parameter alpha, provides an alternative way of dealing with several characteristics of nonextensive physical systems, given that the information about the intrinsic fluctuations in the physical system can be characterized by the nonextensivity parameter alpha. It is known that as the parameter alpha approaches 1, Tsallis entropy converges to Shannon entropy. Unlike for Shannon entropy, but similarly to Rényi entropy (yet another generalisation of Shannon entropy that also depends on a parameter alpha and converges to Shannon entropy as alpha approaches 1), there is no commonly accepted definition of the conditional Tsallis entropy. In this work, we revisit the notion of conditional Tsallis entropy by studying some natural and desirable properties of the existing proposals: as alpha tends to 1, the usual conditional Shannon entropy should be recovered; the conditional Tsallis entropy should not exceed the unconditional Tsallis entropy; and the conditional Tsallis entropy should take values between 0 and the maximum value of the unconditional version. We also introduce a new proposal for conditional Tsallis entropy and compare it with the existing ones.
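The convergence to Shannon entropy as alpha approaches 1 can be checked numerically; a small sketch using the standard definitions (in nats):

```python
import numpy as np

def tsallis(p, alpha):
    # S_alpha(p) = (1 - sum_i p_i^alpha) / (alpha - 1)
    p = np.asarray(p, dtype=float)
    return (1.0 - np.sum(p ** alpha)) / (alpha - 1.0)

def shannon(p):
    # H(p) = -sum_i p_i log p_i (natural logarithm)
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p = [0.5, 0.25, 0.25]
for a in (1.1, 1.01, 1.001):
    print(a, tsallis(p, a))   # approaches shannon(p) as a -> 1
print(shannon(p))
```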

Predicting Human Responses to Syllogism Tasks Following the Principle of Maximum Entropy

Syllogistic reasoning is one of the major research domains in cognitive science. Syllogisms are quantified semi-logical statements that consist of two premises, each relating two terms by one of the quantifiers "All", "No", "Some", and "Some not". Since one of the terms is mentioned in both premises, one is interested in what conclusion can be drawn about the relationship between the other two terms. For example, a well-formed syllogism task is "If all A are B and no B is a C, what, if anything, follows about the relationship between A and C?" While some syllogism tasks have a logically valid conclusion (in the example above, "No A is a C." is logically valid), some do not, such as "If all A are B and some B are C, what follows about A and C?" In cognitive science, human responses to syllogism tasks are studied in order to better understand how humans handle quantification and uncertainty in reasoning.

In order to predict human responses to syllogism tasks, we develop a probabilistic model of syllogisms based on the principle of maximum entropy. For this, we translate the premises of syllogisms into probabilistic conditional statements and derive the probability distribution that satisfies the conditional probabilities while having maximal entropy. Then, we calculate the probabilities of all possible conclusions and compare them with the respective quantifiers. As the prediction, we choose the conclusion whose quantifier matches best. Based on empirical data, we show that our maximum entropy model predicts human responses better than established cognitive models.
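A toy sketch of the translation step, assuming (for illustration only, this is not the paper's exact model) that the premises are encoded as hard probability-one/zero conditionals; in that special case the maximum-entropy distribution is simply uniform over the possible worlds that remain:

```python
from itertools import product

# Possible worlds over truth values of the terms A, B, C.
worlds = list(product([0, 1], repeat=3))

# Hypothetical encoding of the example premises as hard conditionals:
#   "All A are B"  ->  P(B|A) = 1  ->  exclude worlds with A and not B
#   "No B is a C"  ->  P(C|B) = 0  ->  exclude worlds with B and C
allowed = [(a, b, c) for (a, b, c) in worlds
           if not (a and not b) and not (b and c)]

# With only zero/one constraints, the maximum-entropy distribution is
# uniform over the remaining worlds.
p = {w: 1 / len(allowed) for w in allowed}

# Probability of the candidate conclusion "No A is a C": check P(C|A).
pA  = sum(q for (a, b, c), q in p.items() if a)
pAC = sum(q for (a, b, c), q in p.items() if a and c)
print("P(C|A) =", pAC / pA)   # 0.0 -> matches the quantifier "No"
```

With soft (non-extreme) conditional probabilities the maximum-entropy distribution is no longer uniform and must be computed by convex optimization, but the comparison of conclusion probabilities against quantifiers proceeds the same way.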

The Entropy Universe

About 160 years ago, the concept of entropy was introduced in thermodynamics by Rudolf Clausius. Since then, it has been continually extended, interpreted, and applied by researchers in many scientific fields, such as general physics, information theory, chaos theory, data mining, and mathematical linguistics. Based on the original concept of entropy, many variants have been proposed. This paper presents a universe of entropies, which aims to review the entropies that have been applied to time series. The purpose is to answer important open research questions such as: How did each entropy emerge? What is the mathematical definition of each variant of entropy? How are entropies related to each other? What are the scientific fields in which each entropy is most applied? Answering these questions, we describe in depth the relationships between the entropies most applied to time series in different scientific fields, establishing a basis for researchers to properly choose the variant of entropy most suitable for their data.

The number of citations over the past fifteen years of each paper proposing a new entropy was assessed. The Shannon/differential, the Tsallis, the sample, the permutation, and the approximate entropy were the most cited entropies. Based on the ten categories with the most significant number of records obtained in the Scopus categories, the areas in which the entropies are most applied are computer science, physics, mathematics, and engineering. From the top ten, the application area with the fewest citations of papers proposing new entropies is the medical category.

Information Geometry of Estimating Functions in Parametric Statistical Models

In information geometry, a parametric statistical model (a family of probability density functions) is treated as a differentiable manifold, where the Riemannian metric called the Fisher metric and the pair of torsion-free dual affine connections called the exponential and mixture connections play essential roles in statistical inference. For example, maximum likelihood estimation in an exponential family can be interpreted as the orthogonal projection along the geodesic defined by the mixture connection. This comes from the fact that an exponential family is a dually flat space, where the curvature and torsion tensors of the two dual affine connections are all equal to zero. Recently, it has been found by the authors that a general estimating function naturally induces a similar geometric structure on a statistical model, that is, a Riemannian metric and a pair of dual affine connections, through the concept called a pre-contrast function. In this case, however, one of the affine connections is not necessarily torsion-free, especially when the estimating function is not integrable with respect to the parameter of the statistical model. In this presentation, we explain the construction and some properties of this geometric structure together with related concepts in information geometry. In addition, some of its statistical implications are discussed using an example of non-integrable estimating functions which induce a partially flat space, where only one of the induced affine connections is flat (curvature-free and torsion-free).
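For reference, the standard setting can be summarized as follows (a textbook summary, not the authors' specific construction): an estimating function $u(x,\theta)$ is required to be unbiased, and the estimator $\hat{\theta}$ solves the corresponding estimating equation,

```latex
E_\theta\!\left[u(X,\theta)\right] = 0 \quad \text{for all } \theta,
\qquad
\sum_{i=1}^{n} u(x_i,\hat{\theta}) = 0 .
```

The score function $u(x,\theta) = \partial_\theta \log p(x;\theta)$ recovers maximum likelihood as a special case; when $u$ is not the $\theta$-gradient of any objective function (the non-integrable case), the induced connection can carry the torsion mentioned above.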

Gauge Freedom of Entropies on q-Gaussian Distributions

This is a joint work with Asuka Takatsu at Tokyo Metropolitan University.

A q-Gaussian distribution is a generalization of the ordinary Gaussian distribution. The set of all q-Gaussian distributions admits information-geometric structures such as an entropy, a divergence and a Fisher metric via escort expectations. The ordinary expectation of a random variable is the integral of the random variable with respect to its probability distribution; escort expectations allow us to replace this law with other distributions. A choice of escort expectation on the set of all q-Gaussian distributions determines an entropy and a divergence. The q-escort expectation is one of the most important expectations, since it determines the Tsallis entropy and the alpha-divergence.
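For concreteness, the standard q-deformed logarithm and exponential behind these constructions (the textbook definitions, not the refinement introduced in the talk) are

```latex
\ln_q(x) = \frac{x^{1-q} - 1}{1 - q} \quad (q \neq 1),
\qquad
\exp_q(x) = \left[\, 1 + (1-q)\,x \,\right]_{+}^{\frac{1}{1-q}},
```

and a q-Gaussian density is proportional to $\exp_q(-\beta x^2)$; both reduce to the ordinary logarithm, exponential and Gaussian in the limit $q \to 1$.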

The phenomenon of gauge freedom of entropies is that different escort expectations can determine the same entropy but different divergences.

In this talk, we first introduce a refinement of the q-logarithmic function. Then we demonstrate the phenomenon on an open set of all q-Gaussian distributions by using the refined q-logarithmic functions, and we write down the corresponding Riemannian metric.

A Kolmogorov Complexity for multidisciplinary domains

Kolmogorov complexity, or algorithmic information theory, measures the information in an individual object as the length of its smallest possible representation. This measure has been applied in computer science and in several other scientific disciplines. It is known that the measure is not computable; however, it can be approximated from above using standard compressors.

In the scope of statistical or clustering methods, it is important to measure the absolute information distance between individual objects. The Normalized Information Distance (NID) measures the minimal amount of information needed to translate between two objects; however, it is uncomputable. There is a set of metrics used to approximate the NID, such as the Normalized Compression Distance (NCD), the Compression-Based Dissimilarity Measure (CDM) and the Lempel–Ziv Jaccard Distance (LZJD). These methods, unlike other approaches, do not require any specific background knowledge of the dataset; the user only needs some familiarity with data mining techniques or data visualization.
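A minimal NCD sketch using a standard compressor (zlib here; the helper names are illustrative):

```python
import zlib

def C(data: bytes) -> int:
    # Approximate Kolmogorov complexity from above with a real compressor.
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, a computable approximation of NID."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a  = b"the quick brown fox jumps over the lazy dog " * 20
b_ = b"the quick brown fox jumps over the lazy dog " * 19 + b"cat "
c  = bytes(range(256)) * 4
print(ncd(a, b_))   # small: near-identical texts compress well together
print(ncd(a, c))    # larger: unrelated data shares little structure
```

Because real compressors are imperfect, NCD(x, x) is slightly above zero rather than exactly zero, which is one of the implementation issues (alongside file-size normalization) discussed below.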

It is of utmost importance to improve current implementations in order to create a central, easy-to-use repository that supports multiple metrics, ensuring that researchers can apply the most important techniques from past work to extract the most accurate information with the NID approach. Current implementations are poorly documented and lack support for some enhancements already presented in the literature. An example is the need to replace the compressed size with the percentage of compression of each file, to enable comparisons between files of different sizes.

General entropy-based framework for a robust and fail-safe multi-sensor data fusion

The 21st century will undoubtedly mark the advent of data as the new digital gold. Intelligence is found everywhere, in ever-smaller sensors that we no longer perceive. In this study, we are interested in the autonomous vehicle application: vehicles are becoming intelligent, communicating with each other and with infrastructure. Data is there, everywhere, and constitutes an immaterial resource that allows us to augment or even delegate our decision-making power.

But for certain applications that are critical from a safety point of view, such as autonomous driving, a decision based on intentionally or unintentionally false, partial or incoherent knowledge could induce dangerous actions with negative effects on goods or people. The delegation of decision-making power in such safety-relevant applications makes regulatory authorities reluctant. It is for these types of applications that the concept of Fault-Tolerant Fusion (FTF) is developed. Being able to detect inconsistencies while implementing a mitigation strategy (discarding or compensation) makes it possible to ensure state estimation almost continuously, and therefore to make appropriate decisions/actions at each instant with a high level of integrity.

In this study, we present a general entropy-based framework for the development of robust and fail-safe multi-sensor data fusion. From the informational form of the robust stochastic filter used, the MCCUIF (Maximum Correntropy Criterion Unscented Information Filter), to the adaptive diagnostic (FDI: Fault Detection and Isolation) layer based on the α-Rényi divergence, passing through optimized thresholding (which makes it possible to maximize the availability of the system while ensuring the required high level of safety), this framework is an efficient and powerful example of FTF. It has been tested in real time with real data for the high-integrity ego-localization of a vehicle using GNSS and odometer measurements.
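A schematic sketch of a divergence-based diagnostic layer of this kind, assuming discrete residual distributions and an illustrative threshold value (the actual framework operates on the filter's informational form, so this is only the detection principle):

```python
import numpy as np

def renyi_divergence(p, q, alpha=0.5):
    """alpha-Renyi divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(p ** alpha * q ** (1 - alpha))) / (alpha - 1)

def is_faulty(p_pred, p_meas, threshold, alpha=0.5):
    # Flag an inconsistency when the divergence between predicted and
    # measured distributions exceeds the (optimized) threshold.
    return renyi_divergence(p_pred, p_meas, alpha) > threshold

p = [0.7, 0.2, 0.1]
print(is_faulty(p, [0.68, 0.22, 0.10], threshold=0.05))  # consistent sensor
print(is_faulty(p, [0.05, 0.15, 0.80], threshold=0.05))  # inconsistent sensor
```

Tuning `threshold` trades availability (few false alarms) against integrity (few missed faults), which is exactly what the optimized thresholding step above formalizes.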

Entropy analysis of n-grams and estimation of the number of meaningful language texts

When solving a number of information security problems, one recurring task is to estimate the number of possible meaningful texts of fixed length. Various approaches can be used to estimate this value, and in each of them the key parameter is the information entropy. To estimate the number of short plaintexts, the entropy of n-grams is used; for long ones, the entropy of the language (specific entropy) is used instead. Here, n-grams are n consecutive characters of meaningful text. The well-known information-theoretic approach yields an asymptotic estimate of the number of meaningful texts based on Shannon's second theorem. In practice, to implement this approach, the text under study is modeled as a Markov source.

We consider a different approach to estimating the number of meaningful language texts, using the combinatorial method, whose origins go back to the ideas of Kolmogorov. Representing a text as a set of independent n-grams, we experimentally estimate the number of meaningful n-grams in a language by compiling dictionaries based on a large text corpus. To evaluate the type I errors of mistaking a meaningful n-gram for a random one, which inevitably occur during experimental evaluation, we developed a methodology for evaluating the coverage of the dictionary. We use this coverage estimate to refine and recalculate the original volume of the dictionary. Based on the number of meaningful n-grams of the language, we determine the entropy of short texts of various lengths. This sequence of estimates allows us to mathematically model the further behavior of the entropy function, extrapolate it to long segments, and find the specific entropy of the language.
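The basic n-gram entropy estimate underlying both approaches can be sketched as follows; by the asymptotic equipartition argument behind Shannon's second theorem, the number of meaningful texts of length L then grows roughly as 2^(H·L), with H the per-character entropy (the toy corpus and function name here are illustrative):

```python
from collections import Counter
from math import log2

def ngram_entropy(text: str, n: int) -> float:
    """Shannon entropy (in bits) of the empirical n-gram distribution."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return -sum(c / total * log2(c / total) for c in counts.values())

corpus = "the entropy of the language limits the number of meaningful texts " * 50
h1 = ngram_entropy(corpus, 1)
h2 = ngram_entropy(corpus, 2)
# Per-character entropy estimated from bigrams falls below the unigram
# estimate, because bigrams capture dependencies between adjacent characters.
print(h1, h2 / 2)
```

Extending n further drives the per-character estimate down toward the specific entropy of the language, which is what the extrapolation step above models.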