
Feature selection method based on clustering technique and optimization algorithm
International Journal of Nonlinear Analysis and Applications
Article 21, Volume 15, Issue 9, December 2024, Pages 271-287; Full Text (528.82 K)
Article Type: Research Paper
DOI: 10.22075/ijnaa.2023.30515.4421
Authors
Sara Dehghani; Razieh Mlekhosseini*; Karamollah Bagherifard; S. Hadi Yaghoubian
Department of Computer Engineering, Yasuj Branch, Islamic Azad University, Yasuj, Iran
Received: 21 Bahman 1401 (10 February 2023); Revised: 11 Ordibehesht 1402 (1 May 2023); Accepted: 11 Khordad 1402 (1 June 2023)
Abstract
High-dimensional data, despite the opportunities it creates, poses many computational challenges. One such problem is that, in most cases, not all features of the data are important or vital to discovering the knowledge hidden in it; irrelevant features can negatively affect the performance of a classification system. An important technique for overcoming this problem is feature selection, during which a subset of the original features is selected by removing irrelevant and redundant ones. This article presents a hierarchical, wrapper-based algorithm that selects effective features by exploiting the relationships between features together with a clustering technique. The new method, named GCPSO, builds on an optimization algorithm and selects suitable features using feature clustering. The feature clustering approach presented here differs from previous algorithms: instead of traditional clustering models, the final clusters are formed from the graph structure of the features and the relationships between them. Datasets from the UCI repository, chosen for the breadth of their characteristics, were used to evaluate the proposed method, and its efficiency was compared with wrapper-based feature selection methods that employ evolutionary algorithms in the selection process. The results indicate that, on all datasets, the proposed method performs well relative to the other methods in terms of both the optimality of the selected subset and classification accuracy.
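The abstract gives only the outline of GCPSO, so the following Python sketch illustrates the graph-clustering half of the idea rather than the authors' implementation: features become nodes, each feature is linked to its most correlated peers by weighted edges, a generic modularity-based community detection stands in for the paper's clustering step, and one representative feature per cluster is kept. The absolute-Pearson edge weights, the neighbourhood size k, the label-correlation relevance score, and the wine dataset are all assumptions made for the example.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
n_features = X.shape[1]

# Feature graph: nodes are features, edge weights are absolute Pearson
# correlations; each feature is linked to its k most correlated peers.
corr = np.abs(np.corrcoef(X, rowvar=False))
np.fill_diagonal(corr, 0.0)
k = 3  # assumed neighbourhood size, not a value from the paper
G = nx.Graph()
G.add_nodes_from(range(n_features))
for i in range(n_features):
    for j in np.argsort(corr[i])[::-1][:k]:
        G.add_edge(i, int(j), weight=corr[i, j])

# A generic modularity-based community detection stands in for the
# paper's graph-clustering step.
clusters = greedy_modularity_communities(G, weight="weight")

# Keep the single feature per cluster that correlates most with the
# class label (a simple stand-in for GCPSO's optimization-driven choice).
relevance = np.abs([np.corrcoef(X[:, f], y)[0, 1] for f in range(n_features)])
selected = sorted(max(c, key=lambda f: relevance[f]) for c in clusters)

acc = cross_val_score(KNeighborsClassifier(), X[:, selected], y, cv=5).mean()
print(f"selected features {selected}, 5-fold accuracy {acc:.3f}")
```

In the paper the representative-selection step is driven by the optimization algorithm rather than a fixed relevance score; the sketch after the keywords below illustrates that half of the pipeline.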
Keywords
feature selection; optimization algorithms; hierarchical algorithm; graph clustering
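The abstract ties GCPSO to an optimization algorithm without specifying the search procedure. As a point of comparison, below is a minimal sketch of a generic binary particle swarm optimization wrapper of the kind the evaluated baselines use: each particle encodes a candidate feature subset as a bit mask, velocities follow the usual inertia/cognitive/social update, and a sigmoid transfer function re-binarizes positions. The swarm size, coefficients, iteration budget, dataset, and k-NN fitness are all illustrative assumptions, not settings from the paper.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
n_particles, n_features, n_iters = 20, X.shape[1], 30
w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients

def fitness(mask):
    """Wrapper fitness: cross-validated k-NN accuracy on the chosen subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

pos = rng.random((n_particles, n_features)) < 0.5    # boolean bit masks
vel = rng.uniform(-1, 1, (n_particles, n_features))  # real-valued velocities
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = (w * vel
           + c1 * r1 * (pbest.astype(float) - pos)
           + c2 * r2 * (gbest.astype(float) - pos))
    vel = np.clip(vel, -6.0, 6.0)  # keep the sigmoid numerically tame
    # Sigmoid transfer function maps velocities back to bit probabilities.
    pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best subset:", np.flatnonzero(gbest), "accuracy:", round(pbest_fit.max(), 3))
```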