
Using alignment-free methods as preprocessing stage to classification whole genomes | ||
International Journal of Nonlinear Analysis and Applications | ||
Volume 12, Issue 2, Bahman 2021, Pages 1531-1539 | ||
Article Type: Research Paper | ||
DOI: 10.22075/ijnaa.2021.5281 | ||
Authors | ||
Najah Abed Alhadi Shanan*1; Hussein Attya Lafta1; Sura Z. Alrashid2 | ||
1Computer Department, Science College for Women, University of Babylon, Babylon, Iraq | ||
2College of Information Technology, University of Babylon, Babylon, Iraq | ||
Received: 30 Esfand 1399; Revised: 25 Farvardin 1400; Accepted: 28 Ordibehesht 1400 (Solar Hijri calendar) | ||
Abstract | ||
In bioinformatics, genetics is a popular research discipline. Bioinformatics systems depend on the degree of similarity between biological data, which are based on DNA sequences or raw sequencing reads. In the preprocessing stage, several methods are available for measuring similarity between sequences. The most popular are alignment-based and alignment-free methods, which are applied to determine the degree of functional matching between sequences of DNA nucleotides, ribosomal RNA, or proteins. Alignment-based methods pose a great challenge in terms of computational complexity and lengthen the time needed to search for a match, especially when the data are heterogeneous and large; as a result, classification accuracy decreases in the post-processing stage. Alignment-free methods overcome these challenges when measuring the distance between sequences. The dataset consists of 1000 genomes obtained from the National Center for Biotechnology Information (NCBI); after missing and irrelevant values are eliminated, 860 genomes remain, ready to be segmented into words by k-mer analysis, after which the frequency of each word is counted for each query. The size of a word depends on the value of k. In this paper we used k = 3, …, 8, and in each iteration the word frequencies are counted. | ||
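The k-mer step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_counts` and `kmer_profiles` are hypothetical, the toy sequence is invented for illustration, and the choices to use overlapping words and to skip words containing ambiguous bases are assumptions, since the abstract does not specify them.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping words of length k in a nucleotide sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: words with ambiguous bases (e.g. N) are skipped.
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def kmer_profiles(sequence, k_values=range(3, 9)):
    """Build one frequency table per word size, here k = 3, ..., 8."""
    return {k: kmer_counts(sequence, k) for k in k_values}


# Toy sequence for illustration only; a real genome would be read from a file.
profiles = kmer_profiles("ATGCGATGCA")
for k, table in profiles.items():
    print(k, dict(table))
```

Each table maps a word to its number of occurrences in one genome. Turning these tables into fixed-length feature vectors for a classifier would require a shared word vocabulary per k, which this sketch leaves out.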
Keywords | ||
16S RNA; DNA; k-mers | ||