Exploring Stemming Techniques in Ambon Malay Languages: A Systematic Literature Review

Vinnesa Patricia Carolina, Ema Utami, Ainul Yaqin

Abstract


Stemming Ambon Malay (Ambonese) text is a significant challenge because of the extensive lexicon involved: the Kamus Besar Bahasa Indonesia (the standard Indonesian dictionary) records approximately 127,000 base words. The difficulty lies in recovering base words from affixed forms, which requires removing prefixes, infixes, suffixes, and their combinations. The quality of this step strongly influences analytical outcomes. Several stemming algorithms have been developed to address this complexity, including Nazief & Adriani, Enhanced Confix Stripping, Sastrawi, and Tala, each offering its own technique for stemming Indonesian. Selecting an appropriate algorithm is crucial for the accuracy and reliability of stemming within an analytical framework. The stemming research reviewed here varied in the methods used. Nazief & Adriani was the most frequently used algorithm, with 17 recorded cases, followed by Enhanced Confix Stripping with 12 cases, Sastrawi with 4 cases, and Tala with 1 case. This diversity reflects the range of choices available when selecting a stemming method. The distribution may, however, also reflect factors such as ongoing research projects, funding availability, or other external conditions that affected research output during the period reviewed. Stemming therefore remains a relevant research topic, with potential for continued growth and for significant contributions to text processing and linguistic research.

Keywords


Stemming Algorithm; Indonesia Regional Language; Ambon Malay Language; Text Processing; Systematic Literature Review






DOI: https://doi.org/10.37905/jji.v6i1.24954


Creative Commons Licence
Jambura Journal of Informatics (JJi) is licensed under a Creative Commons Attribution 4.0 International License.