Text classification over a hierarchical taxonomy of classes is a common and well-known problem in large-scale hierarchical text (LSHT) classification. Existing approaches re-structure the class hierarchy prior to classification and have achieved better results. However, when there are many classes and a growing number of features, traditional hierarchy re-structuring tends to produce many nodes with similar granularities. This leads to misclassification and is computationally expensive, or not scalable, for many classification models, especially when the hierarchy is long. In this paper, we propose an improved hierarchy re-structuring algorithm that uses modified k-means clustering. The method uses a k-weight and, where necessary, backtracking to cluster nodes with similar granularities into a few generalized classes, reducing both the number of nodes and the hierarchy length. In addition, the proposed approach can handle overfitting, which usually arises from the unbalanced nature of LSHT datasets, where the features in each class vary widely. Experimental results on the 20NG, IPC, and DMOZ-small datasets using TD-LR and TD-SVM show that our approach effectively improves large-scale hierarchical text classification performance over traditional and existing re-structuring approaches. In terms of scalability, our approach increases the number of scalable instances by about 10% and records the fastest running time.
Improved Taxonomy Re-structuring using Modified K-means Clustering for Efficient Large-scale Text Classification
Published: 03 December 2025 by MDPI
in The 6th International Electronic Conference on Applied Sciences, session Computing and Artificial Intelligence
Abstract:
Keywords: Hierarchical Classification; Hierarchy; Large-scale; Re-structuring; TD-SVM; TD-LR
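
The re-structuring idea described in the abstract, clustering classes of similar granularity into a few generalized nodes, can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses plain Lloyd's k-means with a deterministic farthest-point start, and it omits the k-weight and backtracking steps, whose details the abstract does not give. The function names `kmeans` and `restructure`, the toy class names, and the two-dimensional centroids are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (hypothetical helper, not the paper's modified variant)."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from its nearest chosen center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def restructure(class_centroids, k):
    """Group classes whose centroids share a cluster into one generalized node."""
    names = sorted(class_centroids)
    labels = kmeans([class_centroids[n] for n in names], k)
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return sorted(groups.values())


# Toy feature centroids for four leaf classes (made-up values).
toy = {
    "hockey": (0.9, 0.1),
    "baseball": (0.8, 0.2),
    "graphics": (0.1, 0.9),
    "windows": (0.2, 0.8),
}
groups = restructure(toy, 2)
print(groups)
```

In this toy setting each returned group would become a single generalized parent node, so a top-down classifier such as TD-LR or TD-SVM would have fewer sibling nodes to separate at that level.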
