Acquiring News Texts about Public Security for the construction of Corpora in Portuguese
1  Group of Engineering in Decision-Making and Artificial Intelligence, Federal University of Alagoas, Brazil
Academic Editor: Eugenio Vocaturo

Abstract:

Acquiring texts from the social web to compose domain-specific corpora requires analyzing the structure of the websites where the texts are published. This involves locating the specific fields that guide the software agents, known as scrapers, responsible for accessing the content. With these texts in hand, more refined analyses become possible, focused on tasks such as named entity recognition, text summarization, sentiment mining, and associated classifications (e.g., opinion polarity). This article demonstrates the process of acquiring news texts in the domain of public security in Brazil to build corpora in Portuguese. Since Portuguese still lacks dedicated corpora on this topic, scraping agents were developed for three initial news sources in the Northeast region, specifically in the states of Alagoas, Pernambuco, and Rio Grande do Norte. The corpora produced by these agents were stored in a cloud-based schema for use in an ongoing research project that analyzes texts related to public security to support decision-making processes. The resulting corpus enabled multiple preliminary analyses, including the identification of crime patterns, sentiment analysis of public security reports, and the mapping of risk areas. These analyses provided information that can support the formulation of public policies and the development of more effective security strategies.
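
As an illustration of the field-oriented scraping step described in the abstract, the following minimal sketch extracts a headline and body text from a news page using only Python's standard library. It is not the authors' implementation: the class name `NewsFieldParser`, the function `extract_news_fields`, the choice of `<h1>` and `<p>` as the target fields, and the sample page are all assumptions made for the example. A real scraper would need a field mapping derived from inspecting each news site's actual page structure.

```python
from html.parser import HTMLParser


class NewsFieldParser(HTMLParser):
    """Collects the text of <h1> (headline) and <p> (body) elements.

    The tag names are hypothetical: each real news site needs its own
    field mapping, found by inspecting that site's page structure.
    """

    def __init__(self):
        super().__init__()
        self._current = None   # tag whose text is currently being collected
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._current = tag
        elif tag == "p":
            self._current = tag
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        # Stop collecting only when the field's own tag closes, so inline
        # tags such as <a> or <b> inside a paragraph do not interrupt it.
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "h1":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def extract_news_fields(html):
    """Return a dict with the 'title' and 'body' fields of one news page."""
    parser = NewsFieldParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "body": "\n".join(p.strip() for p in parser.paragraphs if p.strip()),
    }


# Invented sample page, used only to exercise the parser offline.
SAMPLE_PAGE = """
<html><body>
  <h1>Polícia apreende drogas em Maceió</h1>
  <div class="ad">Publicidade</div>
  <p>A operação ocorreu na manhã desta terça-feira.</p>
  <p>Dois suspeitos foram detidos.</p>
</body></html>
"""

record = extract_news_fields(SAMPLE_PAGE)
```

Here `record` holds the headline and the two body paragraphs, while text outside the selected fields (the advertisement block) is ignored. Records of this shape could then be written to whatever storage schema a project uses.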

Keywords: News texts; Corpora; Public Security; Text Analysis; Decision Support