This research develops strategies for ethical data engineering in crime analysis, with an emphasis on minimizing ethical and racial biases. It focuses on structured datasets, specifically those on hate crimes and police shootings in the United States, sourced from Kaggle. These datasets contain both categorical and numerical features, which makes them suitable for evaluating a range of techniques before extending the work to more complex data types, such as images.
The data engineering process employs several methods to ensure fair representation and reduce bias. For data quality, outlier detection, correlation analysis, and feature scaling are used to balance the distribution of sensitive attributes and minimize distortions. The preprocessing stage addresses missing values, incorrect labels, and potentially biased correlations. Dataset balancing relies on the oversampling methods SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling) and on the undersampling method NearMiss to manage class imbalances and ensure proportional representation. These steps are supported by fairness metrics, such as disparate impact and equalized odds, which are used to continuously evaluate and refine the model outputs.
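To illustrate the balancing step, the following is a minimal, simplified sketch of SMOTE-style oversampling written in plain Python. It is not the implementation used in this research, nor that of any library: the function name `smote_like_oversample`, the toy data, and the parameter values are assumptions chosen for illustration. Each synthetic point is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two numeric features per row.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1)]
minority = [(2.0, 2.0), (2.2, 1.9), (1.9, 2.3)]

# Generate enough synthetic rows to match the majority class size.
new_points = smote_like_oversample(
    minority, n_new=len(majority) - len(minority), k=2
)
balanced_minority = minority + new_points
```

Because the interpolation is arithmetic, this sketch applies to numerical features only; categorical features would need to be encoded first or handled by a variant designed for mixed data.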
Preliminary tests were structured to evaluate the effects of different data engineering strategies on bias reduction. Applying the preprocessing and balancing methods showed that systematic handling of class imbalances and feature distributions led to a significant decrease in model bias. As a result, the models showed improved fairness and reduced disparities across sensitive attributes, such as race and gender. These findings indicate that a robust data engineering process can improve the ethical performance of AI systems by mitigating issues such as discrimination and underrepresentation.
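Disparities of this kind can be quantified with the two fairness metrics named earlier, disparate impact and equalized odds. The sketch below shows one common way to compute them for binary predictions. The function names, group labels, and toy data are hypothetical and do not come from the study's datasets or results.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def disparate_impact(y_pred, group, unprivileged, privileged):
    """Ratio of positive-prediction rates: unprivileged / privileged."""
    p_unpriv = rate([p for p, g in zip(y_pred, group) if g == unprivileged])
    p_priv = rate([p for p, g in zip(y_pred, group) if g == privileged])
    if p_priv == 0:
        raise ValueError("privileged group has no positive predictions")
    return p_unpriv / p_priv


def equalized_odds_gaps(y_true, y_pred, group, a, b):
    """Absolute differences in TPR and FPR between groups a and b."""
    def tpr_fpr(g):
        rows = [(t, p) for t, p, s in zip(y_true, y_pred, group) if s == g]
        tpr = rate([p for t, p in rows if t == 1])  # true positive rate
        fpr = rate([p for t, p in rows if t == 0])  # false positive rate
        return tpr, fpr

    tpr_a, fpr_a = tpr_fpr(a)
    tpr_b, fpr_b = tpr_fpr(b)
    return abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)


# Hypothetical toy data: four rows per group, binary labels and predictions.
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

di = disparate_impact(y_pred, group, unprivileged="B", privileged="A")
tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, group, "A", "B")
```

A disparate impact ratio close to 1 indicates similar positive-prediction rates across groups, and a ratio below 0.8 is often treated as a warning sign under the four-fifths rule. Equalized odds is satisfied when both gaps are close to 0.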
The results suggest that these strategies effectively enhance both technical and ethical standards in AI systems. Future research will extend this framework from structured datasets to more complex data types, such as images, enabling broader applications in public security and other domains.