Acquiring texts from the social web to compose domain-specific corpora requires analyzing the structure of the websites where those texts are published. This means locating the specific fields that guide the agents responsible for access, known as scrapers. With the texts in hand, more refined analyses become possible, such as named entity recognition, text summarization, sentiment mining, and associated classifications (e.g., opinion polarity). This article demonstrates the process of acquiring news texts in the domain of public safety in Brazil to build corpora in Portuguese. Because Portuguese still lacks dedicated corpora on this topic, scraping agents were developed for three initial news sources in the Northeast region, in the states of Alagoas, Pernambuco, and Rio Grande do Norte. The corpora collected by these agents were stored in a cloud-based schema for use in an ongoing research project that analyzes texts related to public safety to support decision-making. The resulting corpora enabled several preliminary analyses, including the identification of crime patterns, sentiment analysis of public security reports, and the mapping of risk areas. These analyses provided information that can support the formulation of public policies and the development of more effective security strategies.
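As an illustration of what locating specific fields in a page structure can look like, the following minimal Python sketch extracts a headline and body text from one news page using only the standard library. It is not the authors' implementation: the choice of `<h1>` for the headline and `<p>` for the body, the names `NewsFieldParser` and `extract_fields`, and the sample page are all assumptions made for this example, and each real news source would need its own field mapping.

```python
from html.parser import HTMLParser


class NewsFieldParser(HTMLParser):
    """Collects the first <h1> as the headline and every <p> as body text.

    The tag choices are hypothetical; a real site needs its own mapping.
    """

    def __init__(self):
        super().__init__()
        self._current = None   # tag whose text is being captured, if any
        self._buffer = []      # text fragments of the tag being captured
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only when we are not already inside a target tag.
        if tag in ("h1", "p") and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # closing tag of an inline element such as <b>
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


def extract_fields(html):
    """Return the headline and body of one page as a dictionary."""
    parser = NewsFieldParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "body": "\n".join(parser.paragraphs)}


# Made-up sample page, used only to exercise the parser.
SAMPLE_PAGE = """
<html><body>
  <h1>Polícia apreende veículo roubado</h1>
  <p>Primeiro parágrafo da   notícia.</p>
  <p>Segundo parágrafo, com <b>destaque</b>.</p>
</body></html>
"""

record = extract_fields(SAMPLE_PAGE)
print(record["title"])
print(record["body"])
```

A record of this shape (one dictionary per article) could then be written to whatever storage the project uses. The sketch deliberately leaves out downloading pages, rate limiting, and the storage step, since the abstract does not describe them.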
Acquiring News Texts about Public Security for the construction of Corpora in Portuguese
Published: 04 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences, session Computing and Artificial Intelligence
Keywords: News texts; Corpora; Public Security; Text Analysis; Decision Support