Machine Learning
Sahar Abbasi; Radmin Sadeghian; Maryam Hamedi
Abstract
Multi-label classification assigns multiple labels to each instance, which is crucial for tasks such as cancer detection in images and text categorization. However, machine learning methods often struggle with the complexity of real-life datasets. To improve efficiency, researchers have developed feature selection methods that identify the most relevant features. Traditional methods, which require all features upfront, fail in dynamic environments such as media platforms with continuous data streams. To address this, novel online methods have been created, yet they often neglect the optimization of conflicting objectives. This study introduces a multi-objective search approach that uses mutual information, feature interaction, and the NSGA-II algorithm to select relevant features from streaming data. The strategy aims to minimize overlap among features, maximize relevance to the labels, and optimize online feature interaction analysis. By applying a modified NSGA-II algorithm, a set of non-dominated solutions is identified. Experiments on eleven datasets show that the proposed approach outperforms state-of-the-art online feature selection techniques in predictive accuracy, statistical analysis, and stability assessment.
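NSGA-II ranks candidate solutions by Pareto dominance, and its first front is the set of non-dominated solutions mentioned above. The sketch below shows only that standard fast non-dominated sorting step, assuming each candidate feature subset has already been scored on two objectives: a relevance score to maximize and a redundancy score to minimize. The scores and the helper names (`to_minimization`, `fast_non_dominated_sort`) are hypothetical; crowding distance, selection, variation, and the authors' modifications to NSGA-II are not reproduced here.

```python
from typing import List, Sequence, Tuple


def to_minimization(score: Tuple[float, float]) -> Tuple[float, float]:
    """Negate relevance so both objectives are minimized: (relevance, redundancy) -> (-relevance, redundancy)."""
    relevance, redundancy = score
    return (-relevance, redundancy)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Return fronts as lists of indices; fronts[0] holds the non-dominated solutions."""
    n = len(points)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    domination_count = [0] * n                              # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (relevance, redundancy) scores for four candidate feature subsets.
scores = [(0.9, 0.4), (0.7, 0.1), (0.6, 0.5), (0.9, 0.2)]
print(fast_non_dominated_sort([to_minimization(s) for s in scores]))  # [[1, 3], [0], [2]]
```

Subsets 1 and 3 trade relevance against redundancy, so neither dominates the other and both land in the first front.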
Machine Learning
Mohammad Zahaby; Iman Makhdoom
Abstract
Breast cancer (BC) is one of the leading causes of death among women worldwide, and early diagnosis can save many women's lives. The Breast Imaging Reporting and Data System (BIRADS) is a standard method developed by the American College of Radiology (ACR). However, physicians often disagree when assigning BIRADS values, and the diagnostic methods used so far have not considered all aspects of the patient. This article presents a novel decision support system (DSS). In the proposed DSS, c-means clustering was first used to determine the molecular subtype for patients for whom this value was missing, combining the processing of mammography reports with hospital information system (HIS) data obtained from their electronic records. Several classifiers, namely a convolutional neural network (CNN), a decision tree (DT), a multi-level fuzzy min-max neural network (MLF), a multi-class support vector machine (SVM), and XGBoost, were then trained to determine the BIRADS value. The outputs of these classifiers were combined through weighted ensemble learning with majority voting to obtain the final BIRADS value, which helps physicians diagnose BC early. The results were evaluated in terms of accuracy, specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and F1-measure, computed from the confusion matrix. The obtained values were 97.94%, 98.79%, 92.08%, 92.34%, 98.80%, and 92.19%, respectively.
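The combination and evaluation steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how classifier weights were chosen, so the weights, the classifier keys, and the helper names (`weighted_majority_vote`, `binary_metrics`) are hypothetical. The metric formulas are the standard confusion-matrix definitions for one class treated as positive; for a multi-class target such as BIRADS they would typically be computed one-vs-rest per class, which is an assumption about how the reported figures were aggregated.

```python
from collections import defaultdict
from typing import Dict


def weighted_majority_vote(predictions: Dict[str, int], weights: Dict[str, float]) -> int:
    """Combine one BIRADS prediction per classifier; each vote counts with its classifier's weight."""
    totals: Dict[int, float] = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights.get(name, 0.0)
    # Highest total weight wins; iterating labels in sorted order breaks ties toward the lower label.
    return max(sorted(totals), key=lambda label: totals[label])


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Confusion-matrix metrics for one class treated as positive (assumes non-zero denominators)."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical weights and per-classifier predictions for one patient.
weights = {"cnn": 0.4, "dt": 0.15, "mlf": 0.15, "svm": 0.15, "xgboost": 0.15}
votes = {"cnn": 4, "dt": 3, "mlf": 3, "svm": 4, "xgboost": 2}
print(weighted_majority_vote(votes, weights))  # 4 (total weight 0.55 vs 0.30 vs 0.15)
print(binary_metrics(tp=8, fp=2, tn=85, fn=5))
```

Note that a plain (unweighted) majority would pick label 3 or 4 by count alone; the weights let a more trusted classifier tip the decision.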
Machine Learning
Mostafa Azghandi; Mahdi Yaghoobi; Elham Fariborzi
Abstract
Focusing on the fuzzy Delphi method (FDM), the current research introduces a novel approach to modeling Persian vernacular architecture. Fuzzy Delphi is an extension of the Delphi method that uses triangulation statistics to determine the distance between levels of consensus within the expert panel and handles the measurement uncertainty of qualitative data. The main objective of the Delphi method is to obtain the most reliable consensus from a group of experts; this advantage helps the current study answer its main research question, namely how effective the fuzzy Delphi technique is in the intelligent modeling of Persian vernacular architecture. To identify the main factors of the research model, a systematic literature review and semi-structured interviews with experts were conducted. Then, using qualitative content analysis (QCA), various themes were extracted and employed as the main factors of the research model. Finally, using the fuzzy Delphi technique, the study examined the degree of certainty and accuracy of these factors in two stages and identified 28 factors for modeling Persian vernacular architecture.
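One common fuzzy Delphi formulation can be sketched as follows, assuming it resembles what the study used (the abstract does not give its exact procedure). Each expert's linguistic answer is mapped to a triangular fuzzy number, the panel average is computed, the vertex-method distance of each expert from that average serves as the consensus check, and the defuzzified average decides whether the factor is kept. The linguistic scale, the thresholds (distance 0.2, 75% consensus, acceptance cut 0.5), and the function names are hypothetical choices that are conventional in the fuzzy Delphi literature, not values taken from this study.

```python
import math
from typing import Dict, List, Tuple

# A triangular fuzzy number (lower, most plausible, upper) on a 0-1 scale.
TFN = Tuple[float, float, float]

# Hypothetical five-point linguistic scale; real studies define their own.
SCALE: Dict[str, TFN] = {
    "strongly disagree": (0.0, 0.0, 0.2),
    "disagree": (0.0, 0.2, 0.4),
    "neutral": (0.2, 0.4, 0.6),
    "agree": (0.4, 0.6, 0.8),
    "strongly agree": (0.6, 0.8, 1.0),
}


def average_tfn(ratings: List[TFN]) -> TFN:
    """Arithmetic mean of the experts' triangular fuzzy numbers, component by component."""
    n = len(ratings)
    lower, mid, upper = (sum(r[i] for r in ratings) / n for i in range(3))
    return (lower, mid, upper)


def vertex_distance(a: TFN, b: TFN) -> float:
    """Vertex-method distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3)


def evaluate_factor(answers: List[str], d_max: float = 0.2,
                    min_share: float = 0.75, cut: float = 0.5) -> Dict[str, object]:
    """Screen one candidate factor from the panel's linguistic answers."""
    ratings = [SCALE[a] for a in answers]
    avg = average_tfn(ratings)
    # Share of experts whose rating lies within d_max of the group average (consensus check).
    share = sum(vertex_distance(r, avg) <= d_max for r in ratings) / len(ratings)
    score = sum(avg) / 3  # defuzzified group rating
    return {"avg": avg, "consensus_share": share, "score": score,
            "accepted": share >= min_share and score >= cut}


result = evaluate_factor(["strongly agree", "agree", "agree", "strongly agree"])
print(result["accepted"], round(result["score"], 3))  # True 0.7
```

In a multi-round study, factors that fail the consensus check would be returned to the panel in the next round rather than discarded outright.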
Machine Learning
Morteza Amini; Kiana Ghasemifard
Abstract
The diabetes data set collected by Michael Kahn at Washington University, St. Louis, MO, and available online at the UCI Machine Learning Repository, is rarely used, especially for glucose prediction in diabetic patients. In this paper, we use this data set to study blood glucose range prediction, rather than raw glucose prediction, together with two other important tasks: detecting increases or decreases in glucose and predicting abnormal values, based on regular and NPH insulin doses. Two machine learning approaches commonly used for time series data, LSTM and CNN, are applied together with a promising statistical regression approach, the non-parametric multivariate Gaussian additive mixed model. We observe that, although the LSTM and CNN models are preferable in terms of prediction error, the statistical method performs significantly better at abnormal value detection, which is a critical task for diabetic patients.
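Reframing raw glucose prediction as range, direction, and abnormality prediction turns one regression target into three classification targets. The sketch below shows one way such targets could be derived from a glucose series. The range boundaries (70 and 180 mg/dL) and the rule that "abnormal" means "outside the normal range" are hypothetical assumptions for illustration; the paper's actual ranges and abnormality definition are not given in the abstract, and the insulin-dose inputs are omitted.

```python
from typing import Dict, List, Union

# Hypothetical range boundaries in mg/dL; the paper's actual ranges are not stated here.
LOW_MG_DL = 70.0
HIGH_MG_DL = 180.0


def glucose_range(value: float) -> str:
    """Map a raw glucose reading to a coarse range label."""
    if value < LOW_MG_DL:
        return "low"
    if value > HIGH_MG_DL:
        return "high"
    return "normal"


def make_targets(series: List[float]) -> List[Dict[str, Union[str, bool]]]:
    """Turn a glucose series into per-step classification targets for the next reading."""
    targets = []
    for prev, curr in zip(series, series[1:]):
        label = glucose_range(curr)
        if curr > prev:
            direction = "increase"
        elif curr < prev:
            direction = "decrease"
        else:
            direction = "unchanged"
        # Hypothetical rule: any reading outside the normal range counts as abnormal.
        targets.append({"range": label, "direction": direction, "abnormal": label != "normal"})
    return targets


for t in make_targets([120, 65, 90, 200]):
    print(t)
```

A model trained on these targets is judged by classification quality (for example, how often abnormal readings are caught) rather than by raw prediction error, which is the distinction the abstract draws between the LSTM/CNN models and the statistical method.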
Machine Learning
Seyyed Mousa Khademi; Abbas Shams Vala; Somayyeh Jafari
Abstract
The purpose of this research is to explain the application of business intelligence in managing knowledge assets, using co-word analysis of scientific publications related to "knowledge assets management and business intelligence". This applied research uses content analysis together with co-word analysis, social network analysis, hierarchical clustering, and strategic diagrams. The research population comprises 929 scientific publications on "business intelligence and knowledge management" indexed in the Web of Science database from the 1990s to 2022. Data were analyzed with HistCite, BibExcel, UCINET, and Excel, and the maps were created with VOSviewer and SPSS. The results indicated that the average annual growth rates of publications and of their impact were 28% and 8.9%, respectively. Among the keywords, "big data," "data mining," and "data warehouse" had the highest frequency; "big data," "management," and "system" had the most links; and "design science," "Industry 4.0," and "discovery" received the most citations. Co-word analysis produced eight clusters comprising a total of 138 keywords. In the hierarchical clustering, five clusters, including business intelligence tools in knowledge management, infrastructures and technologies of business intelligence, and business process management through the management of knowledge assets, are considered mature and are positioned at the center of this research field. By identifying the main topics and clusters discussed in business intelligence and knowledge management, this research provides a comprehensive perspective that can be valuable for researchers, educators, policymakers, and organizational managers.
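Co-word analysis starts from counting how often keywords appear and how often pairs of keywords occur in the same publication; those counts feed the clustering and strategic diagrams described above. The sketch below illustrates that counting step on a hypothetical handful of keyword lists. The normalization (lowercasing, de-duplication within a paper) and the definition of "links" as the number of distinct co-occurring keywords are assumptions; it does not reproduce the behavior of HistCite, BibExcel, or VOSviewer.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def co_word_analysis(papers: List[List[str]]) -> Tuple[Counter, Counter, Dict[str, int]]:
    """Return keyword frequency, pair co-occurrence counts, and link counts per keyword."""
    frequency: Counter = Counter()
    co_occurrence: Counter = Counter()
    for keywords in papers:
        # Normalize and de-duplicate keywords within one paper (an assumed preprocessing step).
        unique = sorted({k.strip().lower() for k in keywords})
        frequency.update(unique)
        for a, b in combinations(unique, 2):
            co_occurrence[(a, b)] += 1
    # A keyword's "links" here = number of distinct keywords it co-occurs with.
    links: Dict[str, int] = {k: 0 for k in frequency}
    for a, b in co_occurrence:
        links[a] += 1
        links[b] += 1
    return frequency, co_occurrence, links


# Hypothetical author-keyword lists for three publications.
papers = [
    ["big data", "data mining"],
    ["Big Data", "data warehouse", "data mining"],
    ["business intelligence", "big data"],
]
freq, pairs, links = co_word_analysis(papers)
print(freq.most_common(2))  # [('big data', 3), ('data mining', 2)]
print(pairs[("big data", "data mining")], links["big data"])  # 2 3
```

The co-occurrence counter is effectively a sparse, symmetric keyword-by-keyword matrix, which is the usual input to hierarchical clustering and network maps.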
Machine Learning
Negin Bagherpour; Behrang Ebrahimi
Abstract
Feature selection is crucial for improving the quality of classification and clustering. It aims to enhance machine learning performance and reduce computational costs by eliminating irrelevant or redundant features. However, existing methods often overlook intricate feature relationships and select redundant features. Moreover, dependencies are often hidden or inadequately identified, mainly because traditional algorithms do not adequately capture nonlinear relationships. To address these limitations, new feature selection algorithms are needed that consider intricate feature relationships and capture high-order dependencies, improving the accuracy and efficiency of data analysis. In this paper, we introduce an innovative feature selection algorithm based on an adjacency matrix, applicable to supervised data. The algorithm identifies pertinent features in three steps. First, the correlation between each feature and its corresponding class is measured to eliminate irrelevant features. Second, the algorithm calculates pairwise relationships among the selected features and constructs an adjacency matrix. Third, clustering techniques are applied to the adjacency matrix to group the features into k clusters, where k is the number of desired features, and the most representative feature of each cluster is selected for subsequent analysis. This algorithm provides a systematic approach to identifying relevant features in supervised data, thereby significantly enhancing the efficiency and accuracy of data analysis. By taking into account both linear and nonlinear dependencies between features and detecting them effectively across multiple feature sets, it overcomes the limitations of previous methods.
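The three steps above can be sketched as follows, under several stated assumptions. Absolute Pearson correlation is used both for feature-class relevance and for the adjacency matrix; this is a swapped-in measure that captures only linear dependence, whereas the paper claims to handle nonlinear dependencies too (mutual information could be substituted). The clustering is a simple average-linkage agglomerative procedure chosen for illustration, and "most representative" is taken to mean "most relevant to the class". All function names, the threshold, and the synthetic data are hypothetical.

```python
from typing import List

import numpy as np


def agglomerative_clusters(adj: np.ndarray, k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering that merges the most strongly connected groups."""
    clusters = [[i] for i in range(adj.shape[0])]
    while len(clusters) > k:
        best = (-np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                strength = adj[np.ix_(clusters[a], clusters[b])].mean()
                if strength > best[0]:
                    best = (strength, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def select_features(X: np.ndarray, y: np.ndarray, k: int, relevance_threshold: float = 0.1) -> List[int]:
    """Three-step selection: relevance filter, adjacency matrix, one feature per cluster."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n_features = X.shape[1]
    # Step 1: relevance of each feature to the class (absolute Pearson correlation; assumes no constant columns).
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    kept = [j for j in range(n_features) if relevance[j] >= relevance_threshold]
    if len(kept) <= k:
        return kept
    # Step 2: adjacency matrix of pairwise feature relationships among the kept features.
    adj = np.abs(np.corrcoef(X[:, kept], rowvar=False))
    # Step 3: k clusters, keeping the most class-relevant feature of each as its representative.
    clusters = agglomerative_clusters(adj, k)
    return sorted(max((kept[i] for i in c), key=lambda j: relevance[j]) for c in clusters)


# Synthetic demo: features 0 and 1 are near-duplicates, 2 is relevant, 3 is noise.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n).astype(float)
f0 = y + 0.5 * rng.normal(size=n)
f1 = f0 + 0.05 * rng.normal(size=n)
f2 = -y + 0.5 * rng.normal(size=n)
f3 = rng.normal(size=n)
X = np.column_stack([f0, f1, f2, f3])
print(select_features(X, y, k=2, relevance_threshold=0.3))
```

In the demo, the noise feature is dropped in step 1, the two near-duplicates fall into the same cluster in step 3, and only one of them survives alongside feature 2, which is the redundancy-removal behavior the abstract describes.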