List of accepted submissions

 
 
  • Open access
  • 3 Reads
Principal component methods for exploring latent semantic structures in academic text data

The accelerating growth and increasing thematic complexity of academic publications necessitate the use of quantitative methods capable of uncovering latent semantic structures within large-scale collections of scholarly text. Among classical multivariate techniques, principal component methods offer a well-established, theoretically grounded, and highly interpretable framework for dimensionality reduction and exploratory analysis in high-dimensional settings. Within this context, the present study investigates the application of principal component methods for exploring latent semantic structures in academic text data indexed in Scopus. The empirical analysis is based on a corpus of peer-reviewed publications retrieved from the Scopus database and represented through structured textual components, including titles, abstracts, and author-provided keywords. Following standard procedures for text preprocessing and normalization, the textual data are transformed into a high-dimensional multivariate feature space using term-based representations combined with appropriate weighting schemes. This transformation allows individual documents to be treated as multivariate observations, thereby enabling the systematic application of principal component methods. Principal component analysis is employed to achieve dimensionality reduction and to identify orthogonal components that capture the dominant sources of variance within the semantic feature space. The extracted components are examined through their respective loading structures in order to interpret the underlying latent semantic dimensions and to assess their contribution to the overall thematic organization of the corpus. This analytical strategy supports the structured exploration of semantic variation while mitigating methodological challenges associated with high dimensionality and multicollinearity. 
The results demonstrate that a relatively small number of principal components can effectively represent the latent semantic structure of large academic text collections, yielding an interpretable and analytically coherent summary of prevailing research themes. Overall, the findings underscore the utility of principal component methods as transparent and reproducible tools for the semantic exploration of scholarly literature and substantiate their relevance for large-scale literature analysis and thematic mapping.
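
The pipeline described above (term-based representation, weighting, then principal component analysis) can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the whitespace tokenizer, the TF-IDF weighting, the power-iteration solver, and the function names `tfidf_rows` and `first_component` are all assumptions introduced here, and documents are assumed to be non-empty.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([
            (counts[term] / len(toks)) * math.log(n_docs / doc_freq[term])
            for term in vocab
        ])
    return vocab, rows


def first_component(rows, n_iter=200):
    """Leading principal component by power iteration on the covariance matrix.

    Returns (loadings, scores, explained_variance).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]

    # Deterministic, non-uniform start vector (an arbitrary choice).
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    variance = 0.0
    for _ in range(n_iter):
        scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
        # w = (X^T X / (n - 1)) v, computed without forming the p x p matrix
        w = [sum(centered[i][j] * scores[i] for i in range(n)) / (n - 1)
             for j in range(p)]
        variance = math.sqrt(sum(x * x for x in w))
        if variance == 0.0:
            break
        v = [x / variance for x in w]

    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores, variance
```

The loadings returned by `first_component` play the role of the loading structures mentioned in the abstract: terms with large absolute loadings characterise the component, and the sign of each document score shows which side of that contrast the document falls on. The overall sign of a component is arbitrary.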

  • Open access
  • 6 Reads
A Multivariate Evaluation of Water Quality and Regulatory Compliance from Diverse Sources

In this study, a multivariate statistical framework is applied to evaluate water quality across multiple source types in Kano State, Nigeria. A total of 33 water samples were collected from 11 locations, encompassing drinking water, groundwater, industrial wastewater, and surface water. Six physicochemical parameters were analysed: pH, electrical conductivity (EC), dissolved oxygen (DO), total dissolved solids (TDS), turbidity, and total suspended solids (TSS). Descriptive statistics, one-way analysis of variance (ANOVA), correlation analysis, principal component analysis (PCA), k-means clustering, and water quality index (WQI) assessment were employed to characterize spatial variability and pollution patterns. The results indicate pronounced differences in water quality among source types, with surface water showing the highest overall quality (mean WQI = 88.4) and industrial wastewater the lowest (mean WQI = 56.6). Compliance with drinking water standards was highest for TDS (90.9%) and lowest for turbidity (39.4%). Cluster analysis identified four water quality groups, with industrial wastewater samples predominantly forming isolated clusters, reflecting elevated pollution levels. Significant moderate correlations were observed between EC and DO (r = −0.66, p < 0.001) and between TDS and TSS (r = 0.52, p = 0.002), suggesting inter-parameter dependencies. These findings highlight the urgent need for improved industrial wastewater treatment and strengthened monitoring of groundwater resources to safeguard public health and water sustainability in the region.

  • Open access
  • 6 Reads
An Adaptive Vector Barrier Interior-Point Method Using Majorant Functions

This paper proposes an adaptive interior-point method for solving convex optimization problems subject to inequality constraints. The approach is based on a logarithmic barrier formulation in which the barrier parameter is taken as a vector rather than a single scalar. In contrast to classical interior-point methods that rely on uniform updates of the barrier parameter, the proposed algorithm introduces a componentwise adaptive strategy, allowing each constraint to be treated with an individual level of penalization. This flexibility improves the algorithm’s ability to handle heterogeneous constraints and enhances numerical performance, particularly for large-scale problems.

A key feature of the method is the computation of the step size using a carefully constructed majorant function. This strategy eliminates the need for traditional line search procedures, thereby reducing computational overhead while maintaining robustness. The algorithm is designed to ensure that all iterates remain strictly within the feasible region, guaranteeing feasibility preservation throughout the optimization process. Furthermore, it is shown that the objective function value decreases monotonically along the iterations.

Rigorous theoretical analysis is provided to establish the descent property, feasibility preservation, and global convergence of the proposed method under standard assumptions for convex optimization. These results demonstrate that the algorithm converges to an optimal solution of the original constrained problem.

To evaluate the practical performance of the method, numerical experiments are conducted on a set of large-scale convex optimization problems. The results indicate that the proposed adaptive interior-point method outperforms classical interior-point approaches in terms of efficiency and robustness, highlighting its potential for solving high-dimensional constrained optimization problems.
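
For orientation, the vector-barrier idea can be written out explicitly. This is only a sketch under stated assumptions: the abstract does not give its notation, so the problem form (minimise f(x) subject to g_i(x) ≤ 0 for i = 1, ..., m) and the symbols f, g_i, and μ_i are introduced here for illustration.

```latex
\min_{x}\; \varphi_{\mu}(x) \;=\; f(x) \;-\; \sum_{i=1}^{m} \mu_i \,\log\bigl(-g_i(x)\bigr),
\qquad \mu = (\mu_1, \dots, \mu_m), \quad \mu_i > 0 .
```

The classical scalar-barrier method is the special case in which all μ_i are equal; a componentwise strategy lets each μ_i be updated separately, which is what allows each constraint its own level of penalization.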

  • Open access
  • 8 Reads
Estimation of Semi-Bilinear Time Series Models by the Method of Empirical Moments: Specification of Optimal Noise by Deep Learning

This research proposes an innovative methodology for estimating coefficients in nonlinear semi-bilinear time series models. We develop a robust estimation framework based on an alternative approach using empirical moments, specifically designed to overcome the limitations of classical methods in the face of the complexity of these hybrid structures. The objective is to provide an efficient, accurate, and computationally viable inference process for these models, where a linear autoregressive component and a nonlinear bilinear operator dynamically interact. Extensive numerical simulations rigorously validate the performance and robustness of our approach. The results demonstrate a marked superiority in terms of bias, variance, and stability of the proposed estimators, compared to conventional estimation techniques, particularly in the context of small samples or pronounced nonlinearities. This methodological advance opens promising application perspectives for the analysis of complex sequential data in demanding fields such as financial markets, economic forecasting, and big data analytics. A complementary and original investigation extends this contribution by examining the critical influence of the specification of the innovation term. A systematic study is conducted by substituting the standard white noise assumption with more realistic and volatile noise structures, such as ARCH, GARCH, COGARCH, and PAR (Periodic Autoregressive Process). This analysis reveals and quantifies the significant impact of the noise distribution, heteroscedasticity, and correlation dynamics on the performance and final accuracy of the estimators, particularly when calibrated using deep learning algorithms. These results underscore the crucial importance of appropriate innovation modeling for optimizing inference and defining the practical conditions for the optimal application of our proposed method.

  • Open access
  • 114 Reads
Mathematical Analysis of the Dynamic Modeling of Heart Rate

Most existing heart rate monitoring systems rely on fixed threshold values to trigger alerts, resulting in reactive responses that often fail to capture early physiological deterioration. Such approaches neglect the temporal structure of heart rate signals and their dynamic behavior. In this work, we propose a predictive framework for heart rate monitoring based on robust temporal analysis. The heart rate signal is modeled as a time-dependent function and analyzed through discrete approximations of its first and second derivatives computed via finite-difference methods. These derivatives quantify the rate of change and acceleration of heart rate dynamics, enabling the identification of emerging risk patterns. Anomalies are detected using Z-scores calibrated to an individualized baseline. To ensure robustness under noisy real-world sensing conditions, the median and the Median Absolute Deviation (MAD) are employed as statistical estimators. The MAD is scaled by the factor 1.4826, which yields a dispersion measure consistent with the standard deviation for normally distributed data. The time-to-critical state is estimated by extrapolating the temporal evolution of the Z-score, using linear prediction in steady regimes and a quadratic formulation, solved analytically, when positive acceleration is observed. The proposed framework identifies risk trajectories that remain undetected by static threshold methods, particularly in scenarios involving rapidly accelerating heart rate. The time-to-critical estimation provides a quantitative prediction of imminent risk, enabling anticipatory monitoring rather than delayed alerting. The results demonstrate that integrating calculus-based temporal analysis with robust statistical modeling yields a transparent and mathematically consistent predictive framework for physiological monitoring. The approach also shows strong potential for extension to other biosignals and contributes to applied mathematics and statistical analysis of health-related time series.
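
The steps described above (robust Z-score, finite differences, linear or quadratic extrapolation) can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' code: the function names, the use of backward differences on the last three Z-score samples, the treatment of non-positive acceleration as the steady regime, and the critical threshold `z_crit` are all assumptions.

```python
import math
from statistics import median

MAD_SCALE = 1.4826  # scales the MAD to match the standard deviation for normal data


def robust_z(value, baseline):
    """Z-score of `value` against an individual baseline, using median and MAD."""
    centre = median(baseline)
    mad = median([abs(x - centre) for x in baseline])
    if mad == 0:
        raise ValueError("baseline has zero MAD; robust Z-score is undefined")
    return (value - centre) / (MAD_SCALE * mad)


def time_to_critical(z_series, dt, z_crit):
    """Estimate the time until the Z-score reaches z_crit, from the last three samples.

    Returns math.inf when the extrapolated trajectory never reaches z_crit.
    """
    z_prev2, z_prev1, z_now = z_series[-3:]
    if z_now >= z_crit:
        return 0.0
    velocity = (z_now - z_prev1) / dt                         # backward first difference
    acceleration = (z_now - 2 * z_prev1 + z_prev2) / dt ** 2  # second difference
    gap = z_crit - z_now
    if acceleration > 0:
        # Positive root of 0.5 * a * t^2 + v * t - gap = 0
        disc = velocity ** 2 + 2 * acceleration * gap
        return (-velocity + math.sqrt(disc)) / acceleration
    if velocity > 0:
        # Steady regime (assumed here to mean non-positive acceleration): linear
        return gap / velocity
    return math.inf
```

Because the Z-score is an affine function of the heart rate, differencing the Z-score series is equivalent to differencing the raw signal and dividing by the scaled MAD. With `dt` in seconds, the returned value is in seconds as well.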

  • Open access
  • 10 Reads
A Deep Learning-based Comparative Analysis for EUR/USD Exchange Rate Prediction

Foreign exchange rates play a significant role in global finance, impacting international trade, investment decisions, and economic stability. Due to their volatile and nonlinear behavior, accurately predicting currency exchange rates has become a crucial area of financial research. This study analyzes historical exchange rate data to identify complex temporal and spatial patterns using advanced deep learning techniques. Its main objective is to evaluate and compare the predictive performance of five deep learning models for EUR/USD exchange rate prediction. The model architectures are Convolutional Neural Network (CNN), Multi-Kernel Convolutional Neural Network (Multi-CNN), Attention Mechanism (AM), and two hybrid models, namely, Deep Attentive Convolutional Fusion (DACF) and Deep Attentive Multi-Kernel Convolutional Fusion (DAMCF). A big data analysis is conducted using 5,352 daily Open, High, Low, and Close (OHLC) values for the EUR/USD exchange rate covering the period from 2003 to 2024. Weighted averaging and normalization are used as preprocessing techniques. The methodology then involves rolling window analysis, candlestick chart visualization, and the design of model architectures that include CNN layers, multi-scale convolution kernels, attention mechanisms, and hybrid models combining spatial and temporal features. The models are evaluated using RMSE, MAPE, and R². The experimental results show that all the models achieved considerable accuracy, with the three best-performing models being DAMCF (R² = 0.9556), DACF (R² = 0.9210), and AM (R² = 0.9051). Further, the DAMCF model demonstrates the greatest ability to learn complex market patterns. These results demonstrate the potential of hybrid architectures that combine CNN and AM in financial forecasting. The suggested method provides insightful information for monetary policy planning, risk management, and algorithmic trading.
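
For reference, the three evaluation metrics named above can be computed as follows. This is a small sketch using the standard textbook definitions; the function names are illustrative, MAPE is reported in percent, and the abstract does not state which variants the authors used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; requires non-zero true values."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE is in the units of the exchange rate, MAPE is scale-free, and R² compares the model against a constant prediction at the mean, so the three are best read together.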

  • Open access
  • 3 Reads
Free-energy structure of the entropy–extropy ratio on the probability simplex

Background:
Information measures such as Shannon entropy and its complementary quantity, extropy, play a central role in statistical physics, information theory, and eco-evolutionary dynamics. However, the geometric and thermodynamic interpretation of entropy–extropy relationships on the probability simplex remains largely unexplored.

Objective:
This work aims to establish a variational and thermodynamic structure for the normalised entropy–extropy ratio, and to determine whether it induces a dissipative gradient flow compatible with nonequilibrium relaxation.

Methods:
Using the Fisher–Shahshahani metric on the probability simplex, the Riemannian gradient flow is derived and expressed in replicator form. The relationship with Kullback–Leibler dissipation is analysed, and a generalised free energy functional is constructed to compare gradient dynamics.

Results:
The gradient flow is shown to be equivalent, up to time reparametrization, to the Fisher–Shahshahani gradient flow of the generalised free energy. The functional decreases monotonically along trajectories, the uniform distribution emerges as the unique equilibrium, and the evolution corresponds to nonequilibrium thermodynamic relaxation governed by the KL-divergence dissipation.

Conclusions:
The entropy–extropy ratio admits a free energy interpretation linking information geometry, replicator dynamics, and thermodynamic dissipation. This framework provides a structural measure of the balance between dispersion and organisation in probabilistic systems and suggests its relevance in quasi-stationary nonequilibrium regimes and to high dimensional limits.
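
The two information measures involved can be computed directly. The sketch below uses the standard definitions of Shannon entropy and extropy (natural logarithms); the particular normalisation in `normalised_ratio`, which divides each measure by its maximum over the simplex, is an assumption made here for illustration, since the abstract does not state the exact form of the ratio.

```python
import math


def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i, with 0 log 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def extropy(p):
    """J(p) = -sum_i (1 - p_i) log(1 - p_i), with 0 log 0 taken as 0."""
    return -sum((1 - x) * math.log(1 - x) for x in p if x < 1)


def normalised_ratio(p):
    """Entropy over extropy, each divided by its maximum (assumed normalisation).

    Both maxima are attained at the uniform distribution. The ratio is
    undefined at the vertices of the simplex, where both measures vanish.
    """
    n = len(p)
    h_max = math.log(n)
    j_max = (n - 1) * math.log(n / (n - 1))
    return (shannon_entropy(p) / h_max) / (extropy(p) / j_max)
```

Under this assumed normalisation the ratio equals 1 at the uniform distribution, which the abstract identifies as the unique equilibrium of the flow. For two outcomes, entropy and extropy coincide, so the ratio is informative only for three or more outcomes.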

  • Open access
  • 4 Reads
Asymptotic Properties of M-estimators

Functional statistics plays a central role in statistical research. In this study, we focus on conditional models, which can be formulated as follows: let X be a functional random variable and Y a multidimensional random variable. The prediction of Y given X is modeled through a mapping r(.) applied to X.

To approximate Y conditional on X, we construct estimators for functional parameters using the kernel method. Examples include regression, quantile, expectile, and conditional mode estimation.

After constructing an estimator of the functional parameter, the asymptotic analysis proceeds along two main lines:

First, we establish the almost-sure uniform convergence of nonparametric estimators for certain conditional models, specifying the corresponding rates of convergence.

Second, we derive the asymptotic normality of the estimator under standard regularity conditions. We also discuss its application in constructing confidence intervals. Furthermore, we provide an explicit expression for the asymptotic covariance matrix.
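
The regression case of the kernel method can be illustrated with a Nadaraya-Watson type estimator for a functional covariate and a vector-valued response. This is a sketch under assumptions not taken from the abstract: curves are observed on a common equally spaced grid, proximity is measured by a discretised L2 distance, the kernel is quadratic on [0, 1], and the names `functional_nw`, `h`, and `step` are hypothetical.

```python
import math


def l2_distance(x1, x2, step):
    """Riemann-sum approximation of the L2 distance between two discretised curves."""
    return math.sqrt(step * sum((a - b) ** 2 for a, b in zip(x1, x2)))


def kernel(u):
    """Quadratic kernel supported on [0, 1]."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def functional_nw(x_new, curves, responses, h, step):
    """Kernel regression estimate of r(x_new) = E[Y | X = x_new], one value per coordinate of Y."""
    weights = [kernel(l2_distance(x_new, c, step) / h) for c in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observed curve lies within bandwidth h of x_new")
    dim = len(responses[0])
    return [
        sum(w * y[k] for w, y in zip(weights, responses)) / total
        for k in range(dim)
    ]
```

The bandwidth `h` controls how many observed curves contribute to the estimate. Quantile, expectile, and conditional mode estimators reuse the same kernel weights but replace the weighted mean with a different weighted criterion.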

  • Open access
  • 3 Reads
A Bayesian Network Approach to Understanding Uncertainty and Interactions in the Human Development Index

The Human Development Index (HDI) is a widely used measure of development based on health, education, and income indicators published by UNDP. Although HDI is easy to interpret, it is calculated using a fixed formula that does not show uncertainty or explain how these indicators influence each other. With the availability of long-term HDI data for many countries, more flexible statistical methods can be applied.

In this study, we use a Bayesian Network to model the relationships between HDI and its core components: life expectancy, expected years of schooling, mean years of schooling, and gross national income per capita. A Bayesian Network represents these indicators as connected variables, where the connections are learned from UNDP data rather than assumed in advance. The model is estimated using standard Bayesian statistical methods and validated using cross-country data.

The results are expected to show that HDI components are not independent and that education plays a crucial role in influencing both income and health outcomes. The Bayesian Network allows us to estimate the probability distribution of HDI rather than a single value, revealing uncertainty in HDI scores and rankings. Policy simulation experiments demonstrate that improvements in different indicators lead to different HDI gains depending on a country’s development structure.

This probabilistic framework extends the traditional HDI by incorporating uncertainty and interdependence among indicators. The proposed approach provides a useful statistical tool for development analysis and policy evaluation using official UNDP data.

  • Open access
  • 8 Reads
Statistical Methods for Inference from Non-Probability Samples and their Application to a Health Barometer

We are in the information age, and every day it becomes easier to obtain information on any topic and for any population of interest. Online surveys, large-scale datasets (Big Data), and social media are modern data sources that provide large samples quickly and without substantial resources. However, estimates based on this type of dataset must be made with great care, since most such datasets are non-probability samples. Non-probability samples lack an exhaustive probabilistic sampling design, so the representativeness of the sample cannot be ensured, which leads to biased estimates even when the sample is large (Meng, 2018).

In order to obtain more reliable and accurate estimates from this type of data, different statistical techniques for bias reduction have been developed. These methods require auxiliary information, and depending on the available auxiliary information, different types of techniques can be applied. If we know the population totals of our covariates, we can apply the Calibration technique. If we have a reference probability sample (in which the variable of interest is missing), we can apply Propensity Score Adjustment (PSA) to estimate the inclusion probabilities in our original sample, or Statistical Matching (SM) to predict the unknown variable of interest in the auxiliary sample. In this same situation, the Doubly Robust (DR) estimator can be applied, which combines both approaches. These are only some of the best-known techniques in this field. Unfortunately, there is no gold standard in practice, and the results will depend on the specific study in question.

In this work, we will study the problem of estimation based on non-probability samples, explore the different methods for inference from such samples, and examine their effectiveness in a real survey conducted by the Spanish Ministry of Health.
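
As a concrete illustration of the calibration idea mentioned above, the sketch below implements post-stratification, the simplest special case, in which the known population totals are the sizes of the categories of a single covariate. The function names and the toy data are hypothetical; general calibration, PSA, SM, and DR estimators require more machinery than shown here.

```python
from collections import Counter


def poststratification_weights(groups, population_totals):
    """Weight each sampled unit by N_g / n_g for its group g.

    `groups` lists the group label of every sampled unit and
    `population_totals` maps each label to its known population size.
    """
    sample_counts = Counter(groups)
    return [population_totals[g] / sample_counts[g] for g in groups]


def weighted_mean(values, weights):
    """Weighted estimate of the population mean."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Toy example: young respondents are over-represented in the sample.
groups = ["young", "young", "young", "old"]
answers = [1, 1, 1, 0]
weights = poststratification_weights(groups, {"young": 50, "old": 50})
estimate = weighted_mean(answers, weights)
```

After weighting, the sample reproduces the known group sizes exactly, so the estimate is no longer dominated by the over-represented group. The adjustment removes bias only to the extent that the covariate explains who ends up in the sample.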
