Acquiring News Texts about Public Security for the construction of Corpora in Portuguese

Matheus Nascimento; Vagner Silva; Gabriel Souza; Kauã Lima; Jean Turet; Victor Diogho Heuer de Carvalho; Thyago Nepomuceno

¹ Group of Engineering in Decision-Making and Artificial Intelligence, Federal University of Alagoas, Brazil

Academic Editor: Eugenio Vocaturo

Published: 04 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences session Computing and Artificial Intelligence

Abstract:

The acquisition of texts from sources on the social web to compose domain-specific corpora requires analyzing the structure of the websites where the texts are published. This involves locating the specific fields that guide the access of the agents responsible for extraction, known as scrapers. With these texts in hand, more refined analyses become possible, focused on tasks such as named entity recognition, text summarization, sentiment mining, and associated classifications (e.g., opinion polarities). This article demonstrates the process of acquiring news texts in the domain of public security in Brazil to build corpora in Portuguese. Since Portuguese still lacks dedicated corpora on this topic, scraping agents were developed for three initial news sources in the Northeast region, specifically in the states of Alagoas, Pernambuco, and Rio Grande do Norte. The corpora produced by these scraping agents were stored in a cloud-based schema for use in an ongoing research project that analyzes texts related to public security to support decision-making processes. The resulting corpora enabled multiple preliminary analyses, including the identification of crime patterns, sentiment analysis of public security reports, and the mapping of risk areas. These analyses provided information that can support the formulation of public policies and the development of more effective security strategies.
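
The field-location step described in the abstract can be illustrated with a minimal sketch. It is not the authors' scraper: the tag and class names (`h1`, `article-body`), the helper `extract_article`, and the sample page are all hypothetical, and a real agent would need one such field mapping per news source, plus downloading and storage steps that are omitted here.

```python
from html.parser import HTMLParser


class NewsFieldParser(HTMLParser):
    """Collects the headline (<h1>) and the paragraphs found inside a
    <div class="article-body">. Both field names are assumptions made
    for illustration; each real site needs its own mapping."""

    def __init__(self, body_class="article-body"):
        super().__init__()
        self.body_class = body_class
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._body_depth = 0      # > 0 while inside the body container
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1":
            self._in_title = True
            self._buffer = []
        elif tag == "div":
            # Track nesting so the matching </div> closes the container.
            if self._body_depth or self.body_class in classes:
                self._body_depth += 1
        elif tag == "p" and self._body_depth:
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_title:
            self.title = " ".join("".join(self._buffer).split())
            self._in_title = False
        elif tag == "div" and self._body_depth:
            self._body_depth -= 1
        elif tag == "p" and self._in_paragraph:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_title or self._in_paragraph:
            self._buffer.append(data)


def extract_article(html):
    """Return one corpus record (title and body text) for a news page."""
    parser = NewsFieldParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": "\n".join(parser.paragraphs)}


# Made-up page used only to exercise the parser; not taken from any source.
SAMPLE_HTML = """
<html><body>
  <h1>Polícia apreende drogas em Maceió</h1>
  <div class="menu"><p>Assine já</p></div>
  <div class="article-body">
    <p>A operação ocorreu na manhã de segunda-feira.</p>
    <p>Dois suspeitos foram detidos.</p>
  </div>
</body></html>
"""

record = extract_article(SAMPLE_HTML)
```

Keeping the field names as parameters rather than hard-coding them is one way to reuse the same agent across sources whose page layouts differ.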

Keywords: News texts; Corpora; Public Security; Text Analysis; Decision Support
