
List of accepted submissions

 
 
 
  • Open access
  • 8 Reads
Automatic Classification of Legal Cases: A Comparative Study of AutoGluon and the Gemini Family

The growing volume of litigation in the Brazilian judiciary poses significant challenges to procedural speed and efficiency. One critical bottleneck lies in the initial classification of petitions, where the correct assignment of the procedural "Class", as defined by the Unified Procedural Tables (TPU) of the National Council of Justice (CNJ), is essential for the subsequent procedural flow. Errors at this stage lead to rework and delays. This article investigates the potential of Artificial Intelligence to mitigate this issue by comparing the performance of two distinct technological approaches: the Automated Machine Learning (AutoML) framework AutoGluon and a suite of Large Language Models (LLMs) from Google’s Gemini and Gemma families. Using a private dataset of 27,000 initial petitions from the Court of Justice of the State of Amazonas (TJ-AM), distributed across nine procedural classes, the models were evaluated in a zero-shot scenario simulating implementation with minimal configuration effort. The results indicate that both technologies are feasible for this task. AutoGluon, leveraging the full dataset, achieved a performance ceiling of 95% accuracy. The LLMs, evaluated on a smaller sample without specific training, delivered highly competitive results, with Gemini-2.0-flash reaching 94% and Gemma-3-27b-it achieving 93% accuracy. The study concludes that "out-of-the-box" AI solutions are promising tools for assisting lawyers and judicial staff, with the potential to improve classification accuracy, streamline workflows, and contribute to greater efficiency in the delivery of justice.
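As a rough illustration of what a zero-shot classification setup can look like, the sketch below builds a prompt that lists the allowed classes, parses the model's reply, and computes accuracy. It is not the authors' pipeline: the class names are placeholders (the abstract does not list the nine TPU classes), and `ask_model` is a hypothetical callable standing in for whichever LLM client is used.

```python
from typing import Callable, List, Optional, Sequence

# Placeholder labels: the nine TPU classes used in the study are not listed in the abstract.
CLASSES = ["Class A", "Class B", "Class C"]


def build_prompt(petition_text: str, classes: Sequence[str]) -> str:
    """Build a zero-shot prompt: instructions and allowed labels, no solved examples."""
    options = "\n".join(f"- {name}" for name in classes)
    return (
        "Classify the initial petition below into exactly one procedural class.\n"
        f"Allowed classes:\n{options}\n\n"
        f"Petition:\n{petition_text}\n\n"
        "Answer with the class name only."
    )


def parse_label(reply: str, classes: Sequence[str]) -> Optional[str]:
    """Map a free-text model reply onto one of the allowed classes, if possible."""
    cleaned = reply.strip().lower()
    for name in classes:
        if name.lower() in cleaned:
            return name
    return None  # the reply did not mention any allowed class


def classify(petition_text: str, classes: Sequence[str],
             ask_model: Callable[[str], str]) -> Optional[str]:
    """ask_model is a hypothetical function that sends a prompt to an LLM and returns text."""
    return parse_label(ask_model(build_prompt(petition_text, classes)), classes)


def accuracy(predicted: List[Optional[str]], gold: List[str]) -> float:
    """Fraction of predictions that match the gold labels; unparsed replies count as errors."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)
```

Counting unparsed replies as errors is a conservative choice made for this sketch; the abstract does not say how such cases were handled.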

  • Open access
  • 6 Reads
Performance Evaluation of Stop Word Influence in Hausa Extractive Summarization using Enhanced TextRank

Stop word removal is a fundamental preprocessing step in Natural Language Processing (NLP), aiming to eliminate non-informative words that may degrade the performance of downstream tasks. While its impact has been widely explored in high-resource languages like English, its effectiveness in low-resource languages such as Hausa remains under-investigated, particularly in the context of extractive text summarization. This study addresses that gap by examining the role of stop word removal in enhancing summarization performance for Hausa academic texts. We introduce a new benchmark dataset specifically curated for Hausa extractive summarization, composed of academic abstracts. Furthermore, we propose an enhanced variant of the TextRank algorithm that leverages a combination of sentence-level features including positional weight, lexical similarity, and semantic similarity to compute edge weights in the sentence similarity graph. This feature-rich graph structure allows for a more context-aware sentence ranking process. The proposed model is evaluated against standard baselines, namely, TextRank and LexRank, using the ROUGE evaluation metric. Experimental results demonstrate that our method significantly outperforms the baselines across ROUGE-1, ROUGE-2, and ROUGE-L scores. Additionally, ablation studies with and without a tailored Hausa stop word list reveal a notable performance gain when stop words are removed. These findings highlight the importance of language-specific preprocessing strategies in improving NLP outcomes for low-resource languages.
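The following is a minimal sketch of a TextRank-style ranker with optional stop word removal and an edge weight that mixes lexical overlap with sentence position. Several parts are assumptions made for illustration: the tiny stop word set is not the paper's tailored Hausa list, the semantic similarity term is omitted, and the way lexical and positional scores are combined (`beta`) is a guess rather than the published formula.

```python
import re
from typing import List, Set

# Hypothetical mini stop word list, for illustration only.
STOP_WORDS = {"da", "a", "na", "ta", "ne", "ce"}


def tokenize(sentence: str, remove_stop_words: bool = True) -> Set[str]:
    words = re.findall(r"\w+", sentence.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return set(words)


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def edge_weight(i: int, j: int, tokens: List[Set[str]], beta: float = 0.5) -> float:
    """Lexical overlap, boosted for sentence pairs that appear early (assumed combination)."""
    n = len(tokens)
    lexical = jaccard(tokens[i], tokens[j])
    positional = 1.0 - (i + j) / (2.0 * n)  # close to 1 for early pairs
    return lexical * (1.0 + beta * positional)


def rank_sentences(sentences: List[str], damping: float = 0.85,
                   iterations: int = 50, remove_stop_words: bool = True) -> List[float]:
    """Weighted PageRank by power iteration over the sentence similarity graph."""
    n = len(sentences)
    tokens = [tokenize(s, remove_stop_words) for s in sentences]
    w = [[edge_weight(i, j, tokens) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n + damping * sum(
                w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

Calling `rank_sentences` twice, once with `remove_stop_words=True` and once with `False`, gives a simple way to compare rankings with and without the stop word filter.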

  • Open access
  • 12 Reads
Hybrid VGG19-TCN with Multi-Channel Temporal Attention for Phishing Attack Detection

Phishing attacks remain a persistent and evolving threat in cyberspace, exploiting users' trust to gain access to private data. To improve the detection of phishing attacks, this study proposes a novel hybrid architecture that combines the spatial feature extraction capabilities of VGG19 with the temporal sequence modelling of a Temporal Convolutional Network (TCN), enhanced with a Multi-Channel Temporal Attention Mechanism. The TCN temporally models the deep spatial features that the VGG19 network extracts from embedded URLs and email content in order to detect time-based attack patterns and sequential dependencies. The Multi-Channel Temporal Attention Module dynamically weights temporal features across different data streams, allowing the model to focus on the most discriminative temporal features, especially in imbalanced datasets. In an experimental evaluation on three benchmark phishing datasets, the proposed Hybrid VGG19–TCN model with Multi-Channel Temporal Attention outperforms conventional CNN-RNN, CNN-LSTM, CNN-GRU, CNN, LSTM, GRU, TCN, and BiLSTM models and other baseline machine learning classifiers in accuracy, recall, and AUC. The experimental results indicate that the proposed model is a more robust and precise solution for detecting advanced phishing attacks than state-of-the-art models and that it can be deployed in real-time phishing detection systems.

  • Open access
  • 7 Reads
Improved Taxonomy Re-structuring using Modified K-means Clustering for Efficient Large-scale Text Classification

Text classification over a hierarchical taxonomy of classes is a common and well-known problem in Large-Scale Text Classification (LSTC). Existing approaches re-structure the hierarchy of classes prior to classification and have achieved better results. However, when there are many classes with an increased number of features, traditional hierarchy re-structuring tends to produce many nodes with similar granularities. This results in misclassification, and it is computationally expensive or not scalable for many classification models, especially when the hierarchy is long. In this paper, we propose an improved hierarchy re-structuring algorithm that uses modified k-means clustering. The method uses a k-weight and, where necessary, backtracking to cluster nodes with similar granularities into a few generalized classes, reducing both the number of nodes and the hierarchy length. In addition, the proposed approach can handle overfitting, which usually occurs as a result of the unbalanced nature of large-scale hierarchical text (LSHT) datasets, where the features in each class vary extensively. Experimental results on the 20NG, IPC, and DMOZ-small datasets using TD-LR and TD-SVM show that our approach can effectively improve large-scale hierarchical text classification performance over traditional and existing re-structuring approaches. In terms of scalability, our approach increases the number of scalable instances by about 10% and records the fastest running time.

  • Open access
  • 10 Reads
PLA membranes as functional alternatives to PVC in potentiometric pH electrodes for wine and beverages

The replacement of conventional polymers with safer and more sustainable materials is an urgent challenge in sensor development, particularly for applications in food analysis. Beyond sustainability, new materials must also ensure food compatibility, reduce plastic waste, and provide reliable analytical performance. In this context, polylactic acid (PLA) was evaluated as a polymeric matrix for potentiometric pH electrodes and compared with traditional PVC membranes and a commercial glass electrode. PLA and PVC membranes were prepared by solvent casting, conditioned, and subsequently tested in representative agro-food matrices such as wines, juices, soft drinks, and vinegars. Sensitivity, linearity, accuracy, matrix effects, reproducibility, repeatability, electrical resistance, and lifetime were systematically assessed.

PLA membranes exhibited lower sensitivity (-19 ± 2 mV/pH) than PVC (-52 ± 3 mV/pH); however, they achieved comparable accuracy to the reference system, with a global bias of -0.094 ± 0.089 pH across all matrices. The reduced slope of PLA minimized matrix interferences, especially under acidic conditions and in the presence of ethanol. The membranes exhibited low electrical resistance, good repeatability and reproducibility, and an operational lifetime of approximately 15 days even under aggressive storage conditions.

Overall, PLA demonstrated its potential as a food-compatible and biodegradable alternative to PVC in potentiometric sensors. By combining accuracy comparable to glass electrodes with enhanced resistance to matrix effects and the additional benefit of reduced plastic waste, PLA emerges as a strong candidate for next-generation enological sensors, with promising prospects for integration into additive manufacturing technologies without the need for solvent processing.
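To make the role of the slope concrete, here is a small sketch of how a potentiometric electrode is typically calibrated and read: a straight line E = E0 + S·pH is fitted to buffer readings, then inverted to turn a measured potential into a pH value. The buffer potentials below are invented for illustration (chosen to give a slope of -19 mV/pH, matching the PLA value reported above), and the function names are hypothetical. For reference, the ideal Nernstian slope at 25 °C is about -59.2 mV/pH.

```python
from typing import Sequence, Tuple


def fit_calibration(ph_values: Sequence[float],
                    potentials_mv: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of E = intercept + slope * pH; returns (slope, intercept)."""
    n = len(ph_values)
    mean_ph = sum(ph_values) / n
    mean_e = sum(potentials_mv) / n
    sxx = sum((p - mean_ph) ** 2 for p in ph_values)
    sxy = sum((p - mean_ph) * (e - mean_e) for p, e in zip(ph_values, potentials_mv))
    slope = sxy / sxx                     # sensitivity in mV per pH unit
    intercept = mean_e - slope * mean_ph  # potential extrapolated to pH 0
    return slope, intercept


def potential_to_ph(potential_mv: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to convert a measured potential into pH."""
    return (potential_mv - intercept) / slope


# Hypothetical buffer readings (pH 4, 7, 10), not data from the study.
slope, intercept = fit_calibration([4.0, 7.0, 10.0], [124.0, 67.0, 10.0])
sample_ph = potential_to_ph(143.0, slope, intercept)
```

A smaller slope means each millivolt of noise or drift translates into a larger pH error, which is why accuracy has to be checked separately from sensitivity, as the study does.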

  • Open access
  • 6 Reads
An Improved Breast Cancer Classification Using Ensemble Learning and Data Resampling Techniques: A Machine Learning Approach

Breast cancer remains a major global health challenge and is one of the leading causes of mortality among women worldwide. Despite significant advancements in diagnostic imaging and clinical assessment methods, traditional detection approaches often suffer from high false-positive rates and considerable diagnostic subjectivity. These limitations can delay early treatment and increase patient anxiety. To address these challenges, this study presents a machine learning-based framework aimed at improving breast cancer classification using structured clinical data. The proposed system utilizes an ensemble learning model, specifically the Extreme Gradient Boosting (XGBoost) classifier, known for its accuracy and speed in predictive modeling tasks. To address the issue of class imbalance and improve sensitivity to malignant cases, the Synthetic Minority Over-sampling Technique combined with Tomek Links (SMOTE Tomek) was employed. Furthermore, feature importance analysis using the Random Forest algorithm was conducted to identify the most relevant clinical variables, thereby enhancing the model’s interpretability and computational efficiency. Evaluation of the model yielded promising results, with an accuracy of 94.03%, precision of 91.89%, and an AUC score of 97.0. These metrics indicate that the model is robust and highly effective in classifying breast cancer cases. The findings underscore the potential of integrating advanced machine learning techniques into healthcare workflows, offering a more accurate, consistent, and early diagnostic aid for breast cancer. The study supports the growing role of data-driven solutions in enhancing clinical decision-making and improving patient outcomes.
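The two halves of SMOTE Tomek can be sketched in a few lines. The code below is a simplified illustration, not the authors' implementation and not a library API: it interpolates synthetic minority samples toward the single nearest minority neighbour (standard SMOTE picks among k neighbours) and finds Tomek links, that is, pairs of mutual nearest neighbours with different labels. The toy points are invented.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_index(i: int, points: Sequence[Point]) -> Optional[int]:
    """Index of the point closest to points[i], excluding i itself."""
    best, best_dist = None, math.inf
    for j, q in enumerate(points):
        if j != i:
            d = math.dist(points[i], q)
            if d < best_dist:
                best, best_dist = j, d
    return best


def smote_samples(minority: Sequence[Point], n_new: int,
                  rng: random.Random) -> List[Point]:
    """Create synthetic points on the segment between a minority point and its neighbour."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = nearest_index(i, minority)  # simplification: k = 1 neighbour
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic


def tomek_links(points: Sequence[Point], labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j) that are each other's nearest neighbour but carry different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_index(i, points)
        if (j is not None and j > i and nearest_index(j, points) == i
                and labels[i] != labels[j]):
            links.append((i, j))
    return links
```

In a combined scheme, oversampling comes first and the points involved in Tomek links are removed afterwards to clean the class boundary; which member of each link is dropped is a configuration choice.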

  • Open access
  • 6 Reads
Tentative MALDI-TOF MS Lipid Profiling in Honey and Bee Pollen Samples

Bee products are increasingly demanded and valued for their nutritional properties and health benefits, which are directly related to the bioactive compounds they contain. However, this rising demand has been accompanied by widespread adulteration and mislabeling, which lead to illegal competition among producers and compromise product quality. Thus, the authentication of botanical and geographical origin is becoming increasingly important when it comes to marketing. Reliable analytical tools are therefore essential to determine the composition of bee products in order not only to enhance their nutritional value and health benefits but to ensure their traceability and authenticity. While several classes of bioactive compounds, such as amino acids, betaines and glucosinolates, have been previously studied and proposed as potential markers of their authenticity, lipids remain relatively unexplored. We present the development and optimization of a rapid and practical methodology for lipid profiling of honey and bee pollen. Solid–liquid extraction was combined with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), enabling broad lipid coverage with minimal sample handling. Optimization of instrumental parameters and sample treatment led to reproducible lipid profiles in both matrices. The proposed method was applied to 15 honey and 13 bee pollen samples of diverse botanical and geographical origins collected across Spain. More than 700 lipids were tentatively identified across all samples, including the five main families: fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, and sterol lipids. Principal component analysis (PCA) revealed clear clustering of the samples, allowing their classification according to both botanical and geographical origin.

  • Open access
  • 12 Reads
Enhancing Aerial Surveillance through an Intelligent Drone Patrolling System Leveraging Real-Time Object Detection and GPS-Based Path Adaptation

Background: The demand for autonomous and intelligent surveillance systems has grown due to rising security concerns and the need for efficient area monitoring. Traditional manual methods lack scalability and responsiveness. Unmanned aerial vehicles (UAVs), combined with AI-driven computer vision, offer a transformative solution for real-time surveillance and threat detection. Objective: This study proposes the design and development of an autonomous drone patrolling system capable of real-time object and human detection, dynamic route adaptation, and autonomous navigation to improve surveillance efficiency. Methods: The system integrates the DroneKit Python library for high-level autonomous control and an STM32 microcontroller for low-level flight management. A live video stream from the onboard camera is processed using the MobileNet-SSD (Single Shot MultiBox Detector) deep learning model for real-time object classification. Predefined GPS waypoints guide the patrol route, while dynamic path adjustments occur in response to detected activity. Core components include OpenCV for image processing, GPS and telemetry modules for localization and communication, and an embedded control system for flight operations. Results: The implementation demonstrates effective autonomous patrolling and object detection with responsive behavior to environmental inputs. MobileNet-SSD offers reliable, low-latency classification with efficient resource usage. GPS-based path adaptation enables accurate re-routing upon human or object detection. Conclusions: The proposed framework delivers a scalable, intelligent solution for autonomous surveillance. By integrating deep learning with GPS control and embedded systems, the model enhances situational awareness for applications in security, emergency response, and area monitoring.
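The waypoint logic described in the Methods can be illustrated without any flight hardware. The sketch below is an assumption-laden stand-in, not the authors' DroneKit code: it computes great-circle distance between GPS fixes, checks whether a waypoint has been reached, and inserts a detour to a detection location before the patrol resumes. The function names, the tolerance, and the coordinates in the usage lines are all hypothetical.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]   # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius in metres


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two GPS coordinates."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def reached(position: LatLon, waypoint: LatLon, tolerance_m: float = 2.0) -> bool:
    """True when the vehicle is within tolerance_m of the waypoint."""
    return haversine_m(position, waypoint) <= tolerance_m


def adapt_route(route: List[LatLon], next_index: int, detection: LatLon) -> List[LatLon]:
    """Return a new route that visits the detection point before resuming at next_index."""
    return route[:next_index] + [detection] + route[next_index:]


# Hypothetical patrol: three waypoints, a detection reported while heading to the second.
patrol = [(12.0000, 8.5000), (12.0010, 8.5000), (12.0010, 8.5010)]
updated = adapt_route(patrol, 1, (12.0005, 8.5004))
```

Returning a new list instead of mutating the route keeps the predefined patrol intact, so the vehicle can fall back to it once the detour is complete.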

  • Open access
  • 48 Reads
An Automated Medical Diagnosis System for Neoplasm Medical (MRI) Image Classification using Supervised and Unsupervised Techniques

In this research, an improved automated medical prediction system, the Neoplasm Medical (MRI) Image Classification System (NMICS), is proposed. NMICS aims to automatically assign a test medical (MRI) image to either the neoplasm (tumor) or the non-neoplasm (non-tumor) group using machine learning techniques. The system is divided into two stages: the Train Medical (MRI) Image Model (TMIM) stage and the Medical Image Prediction Stage (MIPS). In the TMIM stage, NMICS performs three operations: 1) improving the quality and consistency of the input medical (MRI) image data set through standard arithmetic operations, 2) extracting specific (edge) features from every individual medical image in the input MRI image data set using the CNN method, and 3) separating the feature vector set of the input MRI image data set into two distinct clusters, Tumor and Non-Tumor, using the unsupervised k-means clustering technique. In the MIPS stage, NMICS applies the same operations as in the TMIM stage, excluding training, to the test medical image samples. Next, NMICS maps the feature vector of each test medical image sample to the trained medical image data set clusters and classifies it using a KNN classifier. The results show that NMICS is well suited to diagnosing whether a given MRI image belongs to the neoplasm or the non-neoplasm group.
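The cluster-then-classify idea can be sketched on toy feature vectors. This is only an illustration under stated assumptions: the CNN feature extraction step is skipped (the 2-D points stand in for feature vectors), the k-means initialisation is a simple deterministic choice, and which cluster corresponds to "Tumor" would have to be decided from external knowledge, since k-means itself produces unlabeled groups.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans_two(points: Sequence[Vector], iterations: int = 20) -> Tuple[List[int], List[Vector]]:
    """Split points into two clusters; returns (labels, centroids)."""
    first = points[0]
    farthest = max(points, key=lambda p: math.dist(p, first))
    centroids = [first, farthest]  # assumed initialisation: first point and its farthest peer
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def knn_predict(query: Vector, points: Sequence[Vector],
                labels: Sequence[int], k: int = 3) -> int:
    """Majority cluster label among the k training vectors closest to the query."""
    nearest = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Toy feature vectors standing in for CNN features of training images.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cluster_labels, _ = kmeans_two(train)
predicted = knn_predict((9.0, 9.0), train, cluster_labels)
```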

  • Open access
  • 6 Reads
An Improved Graph-Based Method for Hausa Text Single-Document Summary Extraction Using a Hybrid Similarity Function

Extractive text summarization is a technique that automatically generates a concise version of a document by selecting its most important sentences and presenting them verbatim. This paper proposes an improved graph-based method for Hausa single-document extractive summarization. The improvement is achieved through the use of a hybrid similarity function, created by first evaluating the performance of four distinct similarity measures individually within a ranking algorithm. These measures are cosine similarity, Jaccard similarity, the overlap coefficient, and n-gram/Dice’s coefficient similarity. The other three similarity measures were then each combined with the n-gram/Dice’s coefficient similarity using the simple harmonic mean to form hybrid similarity functions. To evaluate the effectiveness of the proposed method, the Hausa extractive text summarization corpus was used. Performance was assessed using standard evaluation metrics, including precision, recall, and F-score. Among the tested combinations, cosine similarity combined with n-gram/Dice’s coefficient similarity yielded the best performance. It achieved F-score values of 0.8085 for ROUGE-1, 0.3705 for ROUGE-2, and 0.6946 for ROUGE-L, outperforming the other similarity pairings. These results demonstrate that integrating cosine similarity with n-gram/Dice’s coefficient similarity significantly enhances the performance of graph-based extractive summarization for Hausa text. This study contributes to the advancement of natural language processing tools for under-resourced languages like Hausa and provides a foundation for further development in multi-lingual text summarization systems.
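The hybrid similarity can be written down compactly. The sketch below combines cosine similarity over word counts with Dice's coefficient over n-grams using the harmonic mean, 2cd / (c + d). It is an illustration under assumptions: the abstract does not say whether the n-grams are word or character based or which n is used, so word bigrams are assumed here, and the English sentences are only toy input.

```python
import math
import re
from collections import Counter
from typing import List, Set, Tuple


def words(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def cosine(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two sentences."""
    ca, cb = Counter(words(a)), Counter(words(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ngrams(tokens: List[str], n: int = 2) -> Set[Tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice's coefficient over word n-grams (word bigrams assumed by default)."""
    ga, gb = ngrams(words(a), n), ngrams(words(b), n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def hybrid(a: str, b: str) -> float:
    """Harmonic mean of cosine and Dice similarity; 0 when both are 0."""
    c, d = cosine(a, b), dice(a, b)
    return 2 * c * d / (c + d) if c + d > 0 else 0.0
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a sentence pair scores highly only when both measures agree, which makes the hybrid stricter than either measure alone.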
