Machine Learning
Sahar Abbasi; Radmin Sadeghian; Maryam Hamedi
Abstract
Multi-label classification assigns multiple labels to each instance, which is crucial for tasks such as cancer detection in images and text categorization. However, machine learning methods often struggle with the complexity of real-life datasets. To improve efficiency, researchers have developed feature selection methods that identify the most relevant features. Traditional methods, which require all features upfront, fail in dynamic environments such as media platforms with continuous data streams. Novel online methods have been created to address this, yet they often neglect the optimization of conflicting objectives. This study introduces a multi-objective search approach that uses mutual information, feature interaction, and the NSGA-II algorithm to select relevant features from streaming data. The strategy aims to minimize feature overlap, maximize relevance to the labels, and optimize online feature interaction analysis. By applying a modified NSGA-II algorithm, a set of non-dominated solutions is identified. Experiments on eleven datasets show that the proposed approach outperforms state-of-the-art online feature selection techniques in predictive accuracy, statistical analysis, and stability.
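The core idea of the abstract — scoring candidate feature subsets on conflicting objectives (maximize relevance to the labels, minimize overlap between features) and keeping only the non-dominated ones — can be sketched in a few lines. This is a minimal illustration of mutual-information scoring and Pareto (non-dominated) filtering, not the paper's modified NSGA-II; the objective definitions and the averaging scheme are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def dominates(p, q):
    """True if objective vector p is at least as good as q in every
    objective and strictly better in at least one (all maximized)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def non_dominated(subsets, features, label):
    """Score each candidate feature subset on (relevance, -overlap)
    and keep the Pareto-optimal (non-dominated) subsets."""
    def score(s):
        # Relevance: average MI between each selected feature and the label.
        rel = sum(mutual_information(features[i], label) for i in s) / len(s)
        # Overlap: average pairwise MI among the selected features.
        pairs = len(s) * (len(s) - 1) // 2
        red = sum(mutual_information(features[i], features[j])
                  for i, j in combinations(s, 2)) / max(1, pairs)
        return (rel, -red)  # both coordinates are "larger is better"
    scored = [(s, score(s)) for s in subsets]
    return [s for s, p in scored
            if not any(dominates(q, p) for _, q in scored)]
```

In the full method these subsets would be evolved by NSGA-II over a data stream rather than enumerated; the sketch only shows how the conflicting objectives are compared without collapsing them into a single score.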
Machine Learning
Negin Bagherpour; Behrang Ebrahimi
Abstract
Feature selection is crucial for improving the quality of classification and clustering. It aims to enhance machine learning performance and reduce computational costs by eliminating irrelevant or redundant features. However, existing methods often overlook intricate feature relationships and select redundant features. In addition, dependencies often remain hidden or inadequately identified, largely because traditional algorithms fail to capture nonlinear relationships. To address these limitations, novel feature selection algorithms are needed that consider intricate feature relationships and capture high-order dependencies, improving the accuracy and efficiency of data analysis.

In this paper, we introduce an innovative feature selection algorithm based on an adjacency matrix, applicable to supervised data. The algorithm comprises three steps for identifying pertinent features. In the first step, the correlation between each feature and its corresponding class is measured to eliminate irrelevant features. In the second step, the algorithm focuses on the selected features, calculates their pairwise relationships, and constructs an adjacency matrix. Finally, the third step employs clustering techniques to partition the adjacency matrix into k clusters, where k is the number of desired features. From each cluster, the algorithm selects the most representative feature for subsequent analysis.

This feature selection algorithm provides a systematic approach to identifying relevant features in supervised data, thereby significantly enhancing the efficiency and accuracy of data analysis. By taking into account both linear and nonlinear dependencies between features and detecting them effectively across multiple feature sets, it overcomes the limitations of previous methods.
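The three-step pipeline described above can be sketched compactly. This is a minimal pure-Python illustration under stated assumptions: Pearson correlation as the dependency measure, a relevance threshold of 0.3, and a greedy farthest-point medoid grouping standing in for the paper's clustering step — none of these are necessarily the authors' exact choices.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, y, k, rel_threshold=0.3):
    """Three-step adjacency-matrix feature selection (sketch).
    columns: list of feature columns (numeric lists); y: class column."""
    # Step 1: drop features weakly correlated with the class.
    relevance = {i: abs(pearson(col, y)) for i, col in enumerate(columns)}
    kept = [i for i in relevance if relevance[i] >= rel_threshold]
    # Step 2: adjacency matrix of pairwise dependencies among kept features.
    adj = {(i, j): abs(pearson(columns[i], columns[j]))
           for i in kept for j in kept}
    # Step 3: group kept features into k clusters (greedy medoids: start from
    # the most relevant feature, then repeatedly add the feature least similar
    # to the current medoids), and keep the most class-relevant member of each.
    medoids = [max(kept, key=lambda i: relevance[i])]
    while len(medoids) < min(k, len(kept)):
        medoids.append(min((i for i in kept if i not in medoids),
                           key=lambda i: max(adj[(i, m)] for m in medoids)))
    clusters = {m: [] for m in medoids}
    for i in kept:
        clusters[max(medoids, key=lambda m: adj[(i, m)])].append(i)
    return sorted(max(c, key=lambda i: relevance[i]) for c in clusters.values())
```

For example, given a feature identical to the class, a redundant copy of it, a weakly related feature, and an irrelevant one, the sketch with k = 2 drops the irrelevant feature in step 1 and the redundant copy in step 3, keeping one representative per cluster.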