Document Type: Original
Authors
1 Department of Engineering Sciences, Faculty of Engineering, University of Tehran, Tehran, Iran
2 Department of Engineering Sciences, University of Tehran
Abstract
Feature selection is crucial to improving the quality of classification and clustering. It aims to enhance machine learning performance and reduce computational cost by eliminating irrelevant or redundant features. However, existing methods often overlook intricate feature relationships and select redundant features, and dependencies between features frequently remain hidden or inadequately identified, largely because traditional algorithms rely on linear measures and fail to capture nonlinear relationships. To address these limitations, novel feature selection algorithms are needed that consider intricate feature relationships and capture high-order dependencies, improving the accuracy and efficiency of data analysis.
In this paper, we introduce an innovative feature selection algorithm based on an adjacency matrix, applicable to supervised data. The algorithm identifies pertinent features in three steps. In the first step, the correlation between each feature and the class label is measured to eliminate irrelevant features. In the second step, the algorithm calculates pairwise relationships among the remaining features and constructs an adjacency matrix. In the third step, clustering techniques partition the adjacency matrix into k clusters, where k is the number of desired features, and the most representative feature of each cluster is selected for subsequent analysis.
This feature selection algorithm provides a systematic approach to identifying relevant features in supervised data, thereby significantly enhancing the efficiency and accuracy of data analysis. By accounting for both linear and nonlinear dependencies between features and detecting them effectively across multiple feature sets, it overcomes the limitations of previous methods.
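As a rough illustration of the three-step pipeline, the following sketch uses Pearson correlation as the relevance and pairwise measure, a relevance threshold of 0.3, and a simple k-means routine for the clustering step; all of these are placeholder choices for exposition, not the paper's actual measures.

```python
import numpy as np

def kmeans(P, k, iters=50):
    """Minimal k-means with farthest-point initialization (a design choice
    of this sketch, chosen for determinism; any clustering method could
    be substituted in step 3)."""
    centers = [P[0]]
    for _ in range(1, k):
        d2 = np.min([((P - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(P[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((P[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = P[labels == c].mean(0)
    return labels

def select_features(X, y, k, relevance_threshold=0.3):
    """Sketch of adjacency-matrix feature selection on supervised data.
    The threshold and correlation measures are hypothetical stand-ins."""
    n, d = X.shape
    # Step 1: measure feature-class correlation; drop irrelevant features.
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
    keep = np.where(rel >= relevance_threshold)[0]
    # Step 2: pairwise (signed) correlations among kept features form the
    # adjacency matrix; each row describes one feature's relationships.
    A = np.corrcoef(X[:, keep], rowvar=False)
    # Step 3: cluster the adjacency-matrix rows into k groups, then keep
    # the most class-relevant feature from each cluster.
    labels = kmeans(A, k)
    selected = []
    for c in range(k):
        members = keep[labels == c]
        if members.size:
            selected.append(int(members[np.argmax(rel[members])]))
    return sorted(selected)
```

On a toy dataset with two redundant feature pairs and one noise feature, the sketch drops the noise feature in step 1 and returns one representative per redundant group, which is the intended effect of clustering the adjacency matrix.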