The Use of N-mers Frequency in DNA Sequence Analysis

Khoirul Umam, Rahmat Sagara

Abstract


One method for analyzing DNA sequences is N-mers Frequency. N-mers Frequency is a data mining method for DNA sequences in which a DNA sequence, a string over the alphabet “ACGT”, is converted into numerical data. This paper uses N = 3 because, during protein synthesis, tRNA carries three nucleotide bases (the anticodon) that pair with three nucleotide bases (the codon) on the mRNA strand, and the mRNA is in turn transcribed from the DNA sequence. This study was conducted to determine the accuracy of N-mers Frequency. The accuracy was computed in the following stages: (1) collecting DNA sequence data, (2) computing N-mers Frequency, (3) building the distance matrix, (4) clustering with the K-means++, PAM, AGNES, and DIANA algorithms, (5) computing the accuracy, and (6) drawing conclusions. The accuracy of N-mers Frequency in this study was 100% on a data set of 100 DNA sequences of known type: HPV, Ebola virus, Marburg virus, and Zika virus.
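As an illustration of stages (2) and (3), the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify how the 3-mers are counted or which distance is used, so the overlapping sliding window, the normalization to relative frequencies, and the Euclidean distance are all assumptions made here. The names `kmer_frequencies`, `distance_matrix`, and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_frequencies(sequence, n=3):
    """Relative frequency of every n-mer over ACGT, in lexicographic order.

    Assumption: n-mers are counted with an overlapping sliding window and
    the counts are normalized so that the vector sums to 1.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=n)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[k] / total for k in kmers]


def distance_matrix(vectors):
    """Pairwise Euclidean distances (an assumed metric) between vectors."""
    return [
        [sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) for v in vectors]
        for u in vectors
    ]


# Toy sequences for illustration only; these are not the study's data.
toy = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTTTGGGGGG"]
vectors = [kmer_frequencies(s) for s in toy]
dist = distance_matrix(vectors)
```

With N = 3 each sequence becomes a vector of 4^3 = 64 numbers, so sequences of different lengths can be compared directly. A matrix such as `dist` is the kind of input that distance-based clustering algorithms like PAM, AGNES, and DIANA operate on.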

Keywords


N-mers Frequency; K-means++; Data Mining; DNA Sequence






DOI: https://doi.org/10.34312/jjom.v2i2.4320



Copyright (c) 2020 K. Umam, R. Sagara

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



Jambura Journal of Mathematics (e-ISSN: 2656-1344) by Department of Mathematics Universitas Negeri Gorontalo is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

