List of accepted submissions

 
 
  • Open access
  • 0 Reads
Ethical Data Engineering for AI in Crime Analysis: Strategies to Minimize Bias and Enhance Fairness

This research addresses the development of strategies for ethical data engineering in crime analysis, emphasizing the minimization of ethical and racial biases. The focus is on structured datasets, specifically those on hate crimes and police shootings in the United States, sourced from Kaggle. These datasets include both categorical and numerical features, making them appropriate for evaluating diverse techniques before expanding to more complex data types, such as images.

The data engineering process employs various methods to ensure fair representation and reduce bias. For data quality, techniques such as outlier detection, correlation analysis, and feature scaling are utilized to balance the distribution of sensitive attributes and minimize distortions. In the preprocessing stage, issues like missing values, incorrect labels, and potentially biased correlations are addressed. Dataset balancing is achieved through methods including SMOTE, Adaptive Synthetic Sampling, and NearMiss to manage class imbalances and ensure proportional representation. These steps are supported by fairness metrics such as disparate impact and equalized odds to continuously evaluate and refine the model outputs.
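The two fairness metrics named above can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so the function names, group labels, and toy data below are hypothetical. Disparate impact is taken here as the ratio of positive-prediction rates between an unprivileged and a privileged group; for equalized odds, only the true positive rate per group is shown, and a full check would compare false positive rates as well.

```python
from collections import defaultdict

def selection_rates(groups, predictions):
    """Share of positive predictions (1) within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for g, p in zip(groups, predictions):
        counts[g][0] += p
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def disparate_impact(groups, predictions, unprivileged, privileged):
    """Ratio of selection rates: unprivileged group over privileged group."""
    rates = selection_rates(groups, predictions)
    return rates[unprivileged] / rates[privileged]

def true_positive_rates(groups, labels, predictions):
    """True positive rate per group (one half of an equalized-odds check)."""
    hits = defaultdict(lambda: [0, 0])  # group -> [true positives, actual positives]
    for g, y, p in zip(groups, labels, predictions):
        if y == 1:
            hits[g][0] += p
            hits[g][1] += 1
    return {g: tp / n for g, (tp, n) in hits.items()}

# Hypothetical toy data, not taken from the datasets in the abstract.
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels      = [1, 1, 0, 0, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

di = disparate_impact(groups, predictions, unprivileged="b", privileged="a")
tpr = true_positive_rates(groups, labels, predictions)
print(f"disparate impact: {di:.3f}")
print(f"true positive rate by group: {tpr}")
```

A disparate impact close to 1 and similar error rates across groups would indicate more even treatment; what counts as acceptable is a policy choice that the sketch does not settle.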

Preliminary tests were structured to evaluate the effects of different data engineering strategies on bias reduction. The application of various preprocessing and balancing methods demonstrated that systematic handling of class imbalances and feature distributions resulted in a significant decrease in model bias. Consequently, models showed improved fairness and reduced disparities across sensitive attributes, such as race and gender. These findings indicate that a robust data engineering process can positively impact the ethical performance of AI systems, mitigating issues such as discrimination and underrepresentation.

The results suggest that these strategies effectively enhance both technical and ethical standards in AI systems. Future research will expand this framework to non-categorical datasets, allowing broader applications in public security and beyond.

  • Open access
  • 0 Reads
Optimal Sizing and Energy Management/Control of RES-Based Hybrid Systems

For more than a decade, conventional energy production has exacerbated climate change through the continuous emission of greenhouse gases. Renewables are therefore the means of the energy transition, through the installation of photovoltaics (PVs) and wind turbines (WTs) in either grid-dependent or stand-alone operation.

The aim of this study is to develop a combined sizing and energy management/control approach that simultaneously provides optimal sizes for RES-based units able to cover short-term energy shortages with accumulators. Two options will be presented: a) sizing of grid-dependent PV/WT systems and b) sizing of off-grid PV/WT systems. In the first option, a net-metering strategy will be applied, and the optimization problem will seek PV/WT hybrid system sizes that satisfy a variable load demand with minimum power losses. In the second option, the same strategy will be applied, but the optimal sizing will take into account short-term energy storage (accumulators) under a flexible energy management strategy that protects the accumulators from excessive charging/discharging cycles. Two areas in Greece will be selected as reference regions: Kozani in Western Macedonia and Crete in the southern part of the country. All modeling and simulation results will use validated mathematical models of the PV/WT units for the highest accuracy.

Based on the above results, this study will close by presenting a novel optimal sizing and energy management/control strategy for the operation of PV/WT systems along with hydrogen production (through PEM electrolyzers and high-pressure storage) and accumulators. As will be presented, this strategy can lead to a hybrid system that minimizes power losses and capital and operating expenses (CAPEX + OPEX) while satisfying both hydrogen and load demands.

  • Open access
  • 0 Reads
Integration and Standardization of Heterogeneous Autism Data (Integração e Padronização de Dados Heterogêneos de Autismo)

Autism Spectrum Disorder (ASD) affects a significant portion of the population, with approximately 636,000 students diagnosed with autism enrolled in schools in Brazil, reflecting a 48% increase compared to previous years, according to the 2023 School Census. However, the joint analysis of ASD-related data presents a challenge due to the heterogeneity of sources, including medical records, monitoring devices, clinical evaluations, and questionnaires. This project aims to address these challenges by developing an automated pipeline to collect, clean, transform, and integrate heterogeneous ASD data. Using tools such as Pandas, Amazon S3, Google BigQuery, Tableau, Scikit-learn, and TensorFlow, the pipeline ensures data quality and standardizes its formats and terminologies. Automation was managed by Apache Airflow, ensuring the continuous and efficient execution of the process. The integrated data enabled advanced analyses, such as identifying behavioral patterns, correlating clinical and monitoring data, and performing sentiment analysis in questionnaires. The findings provided valuable insights into ASD, surpassing the state of the art by offering more accurate predictive models and clear visualizations that support decision-making by healthcare professionals. The project resulted in the creation of a robust infrastructure that improves the quality and usability of available ASD data, contributing to the development of more effective interventions and targeted public policies.

  • Open access
  • 0 Reads
Semi-supervised facial beauty prediction using contrastive pretraining with SimCLR

Facial beauty prediction is a complex task that relies on subjective human perceptions, making it a challenging area of study within computer vision. In this paper, we propose a semi-supervised approach that uses contrastive pre-training with SimCLR (a simple framework for contrastive learning of visual representations) to predict facial beauty scores. By utilizing contrastive learning, our model learns robust representations through the self-supervised task of distinguishing between different views of the same image and between different images. We leverage a diverse dataset, SCUT-FBP5500, which comprises 5,500 annotated facial images, to develop a model capable of accurately predicting beauty scores. Our proposed method involves two primary phases: first, we pre-train the model using contrastive learning to acquire robust visual representations from a larger set of unlabeled images, and then we fine-tune it on the labeled SCUT-FBP5500 dataset. The results demonstrate that our model achieves a Pearson correlation coefficient of 0.9267, surpassing state-of-the-art methods in beauty prediction. These findings indicate the effectiveness of contrastive pre-training for this application, as our model not only enhances prediction accuracy but also aligns more closely with human judgments of beauty. This study contributes to ongoing research in aesthetic assessment and highlights the potential of semi-supervised learning to improve performance in subjective evaluation tasks.
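The contrastive objective used by SimCLR (the normalized temperature-scaled cross-entropy loss, NT-Xent) can be sketched as follows. This is an illustrative pure-Python version on tiny hand-written embeddings, not the authors' training code; the function names, the temperature value, and the toy vectors are assumptions. Each embedding is pulled toward the other view of the same image and pushed away from every other embedding in the batch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(views_a, views_b, temperature=0.5):
    """NT-Xent loss; views_a[i] and views_b[i] embed two augmentations of image i."""
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Scaled similarities to every other embedding (self-similarity excluded).
        logits = {k: cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)

# Hypothetical 2-D embeddings for two images.
aligned    = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"views agree: {aligned:.4f}, views disagree: {misaligned:.4f}")
```

The loss is lower when the two views of each image map to similar embeddings, which is the property that pre-training exploits before fine-tuning on labeled scores.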

  • Open access
  • 0 Reads
Assessment of the anthropogenic load levels of heavy metals: a case study of the Styr River

Introduction: The increasing levels of heavy metals in natural waters pose significant environmental and health risks. This study focuses on the Styr River, particularly in the area affected by cooling water discharge from the Rivne Nuclear Power Plant in Ukraine. The primary aim is to analyze the distribution and sources of eight heavy metals: Zn, Cd, Pb, Cu, Ni, Mn, As, and Cr.

Methods: Monthly water samples were collected from 2018 to 2022 and analyzed using an inductively coupled plasma optical emission spectrometer (iCAP 7400 Duo). The analytical lines used were Zn (213.857 nm), Cd (226.502 nm), Pb (220.353 nm), Cu (324.754 nm), Ni (231.604 nm), Mn (257.610 nm), As (193.696 nm), and Cr (267.716 nm). Calibration was performed with standard solutions, and results were checked against internal quality standards. Statistical analyses included Pearson correlation and cluster analysis to identify relationships and potential sources of heavy metals.
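As an illustration of the Pearson correlation step, the sketch below computes a pairwise correlation matrix between metal concentration series. The concentration values are invented placeholders and are not measurements from this study; the function names are likewise hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(series):
    """Pairwise Pearson coefficients for a dict of name -> list of values."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}

# Placeholder monthly concentrations (arbitrary units), NOT data from the study.
samples = {
    "Zn": [4.1, 3.8, 5.0, 4.6, 3.9, 4.4],
    "Cu": [6.0, 5.7, 7.2, 6.8, 5.9, 6.3],
    "Cd": [0.12, 0.15, 0.11, 0.14, 0.16, 0.12],
}

matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Strongly correlated pairs are candidates for a shared source, which is the kind of relationship the cluster analysis then groups.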

Results: The average concentrations of heavy metals in the Styr River water followed the sequence: Cu > As > Zn > Mn > Ni > Cr > Pb > Cd. Seasonal and annual variations were observed, with notable decreases in Zn, Cu, and Mn in 2021, likely due to reduced anthropogenic activities. Pearson correlation and cluster analysis revealed distinct patterns, suggesting both natural and anthropogenic sources. Heavy metals like Pb, Cr, and Ni were associated with industrial emissions and urban pollution, while Cd and As showed more isolated sources. Despite the presence of these metals, their concentrations did not exceed the allowable limits set by the Council Directive 98/83/EC for drinking water.

Conclusions: This study provides a comprehensive assessment of heavy metal pollution in the Styr River. The findings indicate that the water quality remains within safe limits for human consumption, although continuous monitoring is essential. The results highlight the complex interplay of natural and anthropogenic factors influencing heavy metal levels, emphasizing the need for sustainable environmental management practices.

  • Open access
  • 0 Reads
Electrochemical techniques for monitoring analytes dissolved in acidic solutions for the food industry

Introduction:

Real-time monitoring of liquid solutions has several critical applications in the food industry, where detecting and quantifying analytes is essential for preventing health hazards, guaranteeing quality control, and reducing production costs. The wine industry is showing increasing interest in monitoring analytes such as sulfites, which has led to studies of different materials and monitoring techniques. Electrochemical solutions present significant advantages for industrial applications due to their straightforward instrumentation. Nevertheless, when dealing with complex chemical solutions such as wine, one must find methods that allow for simple calibration curves and electrode reuse and that can effectively account for many interfering molecules.

Methods:

In this work, we applied different electrochemical techniques for monitoring sulfites in acidic solutions. Electrodes made from different materials were used to perform the techniques of amperometry and cyclic voltammetry for sulfite detection. DC potential and pulsed amperometry techniques were compared. Cyclic voltammetry was used for detecting sulfites and identifying different solutions.

Results:

The results show that amperometry can be used to monitor different concentrations of sulfites, and that pulsed potential waveforms (pulsed amperometric detection, PAD) prevent signal decay and provide a more stable response. Cyclic voltammetry can also monitor sulfites, but it is particularly useful for identifying different solutions, analytes, and electrodes, adding layers of information through the features present in the respective voltammograms.

Conclusions:

The information gathered from these electrochemical techniques can be fed into different models that allow for the separation of interfering molecules from our desired target analyte. This suggests the potential of using amperometry and CV together as applicable methods for real-time analyte monitoring in the food industry.

Acknowledgments: I would like to express my gratitude to CQE (UIDB/00100/2020 and UIDP/00100/2020), IMS (LA/P/0056/2020) and Watgrid for providing the materials to conduct this study.

  • Open access
  • 0 Reads
Evaluating the quality of generative artificial intelligence in healthcare: a systematic review

The burgeoning use of Large Language Models (LLMs) in healthcare has spurred a need for robust evaluation methods to assess the quality, reliability, and efficacy of their outputs. This systematic review aims to map the landscape of existing methods employed to evaluate texts and other outputs generated by LLMs in the healthcare domain. The review protocol was registered on PROSPERO. A comprehensive search was conducted across multiple databases, including PubMed, IEEE Xplore, Google Scholar, and Scopus, focusing on studies published between 2010 and 2023. The inclusion criteria encompassed original articles that discuss methodologies for assessing the performance of LLMs in generating clinical and healthcare-related content. The review identifies a variety of evaluation techniques, broadly categorized into quantitative and qualitative methods. Quantitative assessments often involve metrics such as accuracy, precision, recall, and F1 score, particularly in tasks like clinical documentation, diagnostic support, and patient communication. Qualitative methods, on the other hand, emphasize human judgment, focusing on aspects such as coherence, adequacy, relevance, and readability, often through expert panel reviews and user satisfaction surveys. Additionally, the review highlights challenges unique to the healthcare context, such as the need for domain-specific knowledge, the handling of sensitive patient data, and the potential for bias in AI-generated content. The findings underscore the importance of interdisciplinary collaboration in developing and validating evaluation frameworks that not only measure technical performance but also consider ethical and practical implications. In conclusion, this review provides a comprehensive overview of current evaluation methods for generative AI in healthcare, identifies gaps in the existing literature, and proposes directions for future research to enhance the assessment of these advanced technologies in medical settings.
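The quantitative metrics named in the abstract (accuracy, precision, recall, and F1 score) can be computed for a binary task as in the sketch below. The labels are hypothetical, for example whether an expert judged each generated answer correct versus whether an automatic check flagged it as correct; nothing here comes from the reviewed studies.

```python
def binary_metrics(labels, predictions):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical reference labels and model outputs.
reference = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]

metrics = binary_metrics(reference, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

These scores only capture agreement with a reference; the qualitative aspects listed in the abstract, such as coherence and readability, still require human judgment.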

  • Open access
  • 0 Reads
Green upgrading of biodiesel derived from biomass wastes

With the increasing demand for edible oils for food and fuel purposes, non-edible oils have become more attractive for biodiesel production. Nevertheless, biodiesel has significant drawbacks that hinder its broader utilization and currently require it to be blended with conventional diesel. These drawbacks include low oxidative stability (OS) and inadequate cold flow properties. Both properties are influenced by the composition of fatty acid methyl esters (FAMEs), in particular their degree of unsaturation. Compression ignition (CI) engines can handle blends of up to 30% biodiesel in diesel fuel without any modifications. Exceeding this threshold, however, requires engine upgrades. Partial hydrogenation of polyunsaturated FAMEs in aqueous/organic biphasic catalytic systems aims for maximum selectivity towards cis-monounsaturated FAMEs. This approach optimizes oxidative stability while preserving satisfactory cold flow properties to the greatest extent possible.

The method used in this work includes the characterization of biodiesel samples using EN ISO standard methods and gas chromatography–mass spectrometry (GC-MS) for qualitative and quantitative analysis. Based on the composition of biodiesel samples in polyunsaturated FAMEs, partial hydrogenation in aqueous/organic biphasic catalytic systems using transition metal complexes aims at improving the properties of produced biodiesel to meet specific standards while acting as a purification step, effectively eliminating impurities.

The highlighted results are (i) the research and development of an aqueous/organic biphasic catalytic system for the partial hydrogenation of biodiesel, and (ii) the improvement of biodiesel properties that do not meet EN ISO standard specifications.

Given the ongoing research and development in this field, the catalytic upgrading of biodiesel through partial hydrogenation in aqueous/organic biphasic catalytic systems seems promising. Further exploration of innovative catalysts and techniques holds potential for advancing biodiesel production and its application.

  • Open access
  • 0 Reads
Feature Engineering for Lung Cancer Classification Using Next-Generation Sequencing Data

Next-generation sequencing (NGS) has profoundly transformed the field of genomics with its ability to detect molecular findings on a large scale, particularly for the somatic genome. Research on complex diseases such as lung cancer has shifted significantly as NGS technology provides an efficient method to unravel the genetic fingerprint of this extensively studied disorder. This advancement has opened new pathways for understanding the molecular underpinnings of lung cancer, facilitating more targeted approaches in diagnosis, treatment, and research. However, because NGS data are high-dimensional and complex, they pose a significant challenge for data analysis and classification tasks. In this paper, we investigate feature engineering to improve the classification accuracy of lung cancer using NGS data. Dimensionality reduction, feature selection, and transformation techniques all share the goal of improving the predictive power of machine learning models. In this work, Principal Component Analysis (PCA) is used as the dimensionality reduction method to optimize feature selection, and transformation techniques such as normalization and scaling are applied to prepare the data for better model performance. The efficacy of these techniques is evaluated through a comprehensive comparison of various machine learning classifiers, including Support Vector Machines (SVMs). The results demonstrate that efficient feature engineering enhances the classification accuracy and robustness of lung cancer prediction models, providing valuable insights for the development of precision medicine approaches in oncology.
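The scaling and PCA steps can be sketched in a few lines. This is a deliberately small pure-Python illustration on a made-up feature matrix, not the authors' pipeline: it standardizes each feature, finds the first principal component by power iteration on the covariance matrix (a swapped-in technique chosen to avoid external libraries), and projects the samples onto it. The classifier comparison, including SVMs, is not shown, and all names and values are hypothetical.

```python
import math

def standardize(rows):
    """Z-score each column (feature) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix of centered rows."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
           for i in range(d)]
    # Uneven starting vector, so it is unlikely to be orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

def project(rows, component):
    """Score of each sample along the given component."""
    return [sum(a * b for a, b in zip(row, component)) for row in rows]

# Hypothetical matrix: 4 samples x 3 features (features 0 and 1 move together).
features = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]

scaled = standardize(features)
pc1 = first_principal_component(scaled)
scores = project(scaled, pc1)
print("first component:", [round(v, 3) for v in pc1])
print("sample scores:", [round(s, 3) for s in scores])
```

In practice, several components would be kept and passed to the classifiers; a library implementation would normally replace the power iteration used here.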

  • Open access
  • 0 Reads
Abrasivity assessment of Triassic limestone and volcaniclastic sandstone in Mae Moh Basin, northern Thailand: comparison between RAI and CAI

The ability of rocks to wear tools used for ground excavation, tunneling, or drilling is referred to as rock abrasivity. There are a variety of testing methods to estimate rock abrasiveness, ranging from microscopic to pilot scales. This study tests the abrasivity of selected sedimentary rocks from the Mae Moh Basin, northern Thailand, using two abrasivity testing methods: the RAI (rock abrasivity index) and the CAI (CERCHAR abrasivity index). The RAI method couples mineralogical analysis with uniaxial compressive strength tests of the rock. The mineral assemblage of the rock and its content are observed under a conventional microscope, and each mineral is compared to quartz in terms of hardness. The CAI method, on the other hand, observes the wear of a steel stylus tip after direct scratching on a rock surface under a systematic setup. The diameter change of the worn tip is subsequently used to calculate the CAI. The results show that the limestone exhibits an RAI of 2.12, indicating it is not abrasive, and a CAI of 1.24, which indicates medium abrasivity. The volcaniclastic sandstone exhibits an RAI of 30.88, indicating medium abrasivity, whereas its CAI is 2.72, which indicates high abrasivity. For both rocks, the CAI indicates a higher abrasivity class than the RAI, and it increases significantly with increasing equivalent quartz content. The findings of this study support a strategy for rock abrasivity assessment and tool wear prediction, which is essential in the fields of mining and georesource engineering.
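The two indices can be sketched numerically under commonly stated definitions, which the abstract itself does not spell out and which are therefore assumptions here: the equivalent quartz content (EQC) is the sum of each mineral's fraction weighted by its hardness relative to quartz, the RAI is the uniaxial compressive strength (in MPa) multiplied by the EQC expressed as a fraction, and the CAI is ten times the stylus wear-flat diameter in millimetres. The input values below are hypothetical and do not reproduce the measurements of this study.

```python
def equivalent_quartz_content(minerals):
    """EQC as a fraction; minerals is a list of
    (volume_fraction, hardness_relative_to_quartz) pairs, with quartz = 1.0."""
    return sum(fraction * relative for fraction, relative in minerals)

def rock_abrasivity_index(ucs_mpa, eqc):
    """RAI, assumed here to be UCS (MPa) times EQC (fraction)."""
    return ucs_mpa * eqc

def cerchar_abrasivity_index(wear_flat_mm):
    """CAI, assumed here to be 10 times the wear-flat diameter in mm."""
    return 10.0 * wear_flat_mm

# Hypothetical rock: 30% quartz plus 70% of a soft mineral whose relative
# hardness (0.05) is an illustrative placeholder, not a reference value.
eqc = equivalent_quartz_content([(0.30, 1.0), (0.70, 0.05)])
rai = rock_abrasivity_index(ucs_mpa=80.0, eqc=eqc)
cai = cerchar_abrasivity_index(wear_flat_mm=0.25)
print(f"EQC = {eqc:.3f}, RAI = {rai:.1f}, CAI = {cai:.2f}")
```

Because the RAI depends on both strength and mineralogy while the CAI comes from a direct scratch test, the two indices can place the same rock in different abrasivity classes, as the abstract reports.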
