Machine Learning
Sahar Abbasi; Radmin Sadeghian; Maryam Hamedi
Abstract
Multi-label classification assigns multiple labels to each instance and is crucial for tasks such as cancer detection in images and text categorization. However, machine learning methods often struggle with the complexity of real-life datasets. To improve efficiency, researchers have developed feature selection methods that identify the most relevant features. Traditional methods require all features upfront and therefore fail in dynamic environments, such as media platforms with continuous data streams. Online methods have been proposed to address this, yet they often neglect to optimize conflicting objectives. This study introduces a multi-objective search approach that uses mutual information, feature interaction, and the NSGA-II algorithm to select relevant features from streaming data. The strategy aims to minimize feature overlap, maximize relevance to labels, and optimize online feature interaction analysis. A modified NSGA-II algorithm is applied to identify a set of non-dominated solutions. Experiments on eleven datasets show that the proposed approach outperforms advanced online feature selection techniques in predictive accuracy, statistical analysis, and stability assessment.
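The notion of a non-dominated solution set can be illustrated with a minimal sketch. It is not the authors' modified NSGA-II; it only shows the Pareto-dominance test that NSGA-II relies on. The candidate scores are hypothetical, and all three objectives are written as quantities to minimize (redundancy, negated relevance, negated interaction), which is an assumption made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    front = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(i)
    return front


# Hypothetical candidate feature subsets, scored as
# (redundancy, -relevance, -interaction); values are illustrative only.
candidates = [
    (0.20, -0.80, -0.30),
    (0.10, -0.60, -0.20),
    (0.30, -0.70, -0.25),  # worse than the first candidate on every objective
    (0.25, -0.90, -0.10),
]

front = non_dominated(candidates)
print(front)
```

Candidates that remain on the front represent different trade-offs among the three objectives; choosing one of them is a separate decision that this sketch does not address.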
Statistical Simulation
Vadood Keramati; Ramin Sadeghian; Maryam Hamedi; Ashkan Shabbak
Abstract
Record linkage is a technique for combining information and data from different sources. It is used in government-related activities such as e-government and the production of register-based data. The technique compares strings across databases, and it can follow different approaches, such as deterministic and probabilistic linkage. This paper presents an expert system for record linkage of data received from multiple databases. The system is designed to save time and reduce errors when aggregating data. Its inputs include the fields to be linked, thresholds, and metric methods, which are explained along with an evaluation of the method used. To validate the system, inputs from two databases with seven information fields, comprising 100,000 simulated records, were used. The results show higher accuracy for probabilistic record linkage than for deterministic linkage. Furthermore, the highest linkage rate was achieved using five fields with varying thresholds. In assessing the metric methods, the other metric methods achieved less than 80% accuracy, whereas the Winkler metric method achieved over 86%. These findings demonstrate that the proposed automated system significantly saves time and increases flexibility in the choice of methods.
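Field-wise comparison with a string metric and per-field thresholds can be sketched as follows. This is a minimal illustration, not the authors' expert system: the Jaro-Winkler implementation follows the standard textbook definition, and the field names, records, and threshold values are hypothetical.

```python
from typing import Dict


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    # Find characters that match within the sliding window.
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def is_link(rec_a: Dict[str, str], rec_b: Dict[str, str],
            thresholds: Dict[str, float]) -> bool:
    """Declare a link only if every compared field meets its threshold."""
    return all(jaro_winkler(rec_a[f], rec_b[f]) >= t
               for f, t in thresholds.items())


# Hypothetical records and thresholds, for illustration only.
a = {"first_name": "MARTHA", "last_name": "JONES"}
b = {"first_name": "MARHTA", "last_name": "JONES"}
thresholds = {"first_name": 0.90, "last_name": 0.95}

score = jaro_winkler("MARTHA", "MARHTA")  # approximately 0.961
linked = is_link(a, b, thresholds)
print(round(score, 4), linked)
```

Requiring every field to pass its own threshold is only one possible decision rule; a probabilistic approach would instead combine field-level agreement into a single weighted score.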