Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

Document Type : Research Article

Authors

Faculty of Mathematics and Computer, Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract

With advances in metagenome data mining, research has increasingly focused on microarrays. Microarrays are datasets with a large number of genes, most of which are usually irrelevant to the output class; hence, gene selection (feature selection) is essential. Removing redundant genes increases the speed and accuracy of classification. After gene selection, the dataset is reduced and differentially abundant genes can be detected more accurately. This, in turn, increases the power to correctly detect genes that are statistically differentially abundant in two or more phenotypes. The method presented in this study is a two-stage method for the functional analysis of metagenomes. The first stage uses a combined filter and wrapper gene selection method based on the ant colony algorithm, in which the information gain ratio, calculated using fuzzy rough sets, serves as the evaluation measure. The set of features selected in the first stage is used as input to the second stage, where the negative binomial distribution is used to detect genes that are statistically differentially abundant in two or more phenotypes. Applying the proposed method to a microarray dataset shows that it increases the accuracy of the classifier and selects a subset of genes with minimum length and maximum accuracy.
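The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a crisp (non-fuzzy) information gain ratio over discretized expression values stands in for the fuzzy rough version, pheromone is held constant rather than updated, and the wrapper classifier is omitted. The function names gain_ratio and ant_select and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Crisp information gain ratio of one discrete feature w.r.t. the class.

    Assumption: this replaces the fuzzy rough gain ratio used in the paper.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Class entropy that remains after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


def ant_select(scores, pheromone, k, rng):
    """One ant picks k distinct genes, each draw weighted by pheromone * score."""
    candidates = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        # The small constant keeps zero-score genes selectable.
        weights = [pheromone[j] * (scores[j] + 1e-9) for j in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return sorted(chosen)


# Toy data (hypothetical): three discretized genes over four samples.
genes = [
    [0, 0, 1, 1],  # gene 0 separates the two classes perfectly
    [0, 1, 0, 1],  # gene 1 is independent of the class
    [1, 1, 1, 1],  # gene 2 is constant
]
labels = [0, 0, 1, 1]

scores = [gain_ratio(g, labels) for g in genes]
pheromone = [1.0] * len(genes)  # uniform start; no update step in this sketch
subset = ant_select(scores, pheromone, k=2, rng=random.Random(0))
print(scores, subset)
```

In a fuller ant colony loop, many ants would build subsets in each iteration, each subset would be scored (for example by classifier accuracy and subset length), and the pheromone of genes in good subsets would be reinforced before the next iteration.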

Highlights

  • Gene selection as a preprocessing phase is very important in the diagnosis of diseases.
  • By applying a two-stage gene selection method, the accuracy of the disease detection process was increased.
  • By detecting the genes that were statistically differentially abundant in different phenotypes, the genes related to healthy or diseased states were identified.

Keywords

