Performance Analysis of an Optical Character Recognition-Based CNN-LSTM Model for e-KTP Information Extraction by Text Category

Fauzan Ihza Fajar, Selly Anastassia Amellia Kharis, Asmara Iriani Tarigan

Abstract


Optical Character Recognition (OCR) technology plays an important role in automating information extraction from identity documents such as the Indonesian Electronic Identity Card (e-KTP). However, recognizing long text sequences and handling complex character variations remain significant challenges that can lead to high error rates. This study addresses these limitations by exploring a deep learning-based OCR model that integrates a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Connectionist Temporal Classification (CTC) in an end-to-end framework without explicit character segmentation. The CNN extracts visual features, the LSTM captures sequential dependencies, and CTC enables flexible alignment between input images and output text. The main contribution of this study is an analysis of how a CNN-LSTM model with CTC performs when extracting e-KTP information across text categories of differing complexity, namely place and date of birth (TTL), name, and national identification number (NIK). Performance is evaluated using the Character Error Rate (CER). The model achieves its best performance on TTL with a CER of 0.84%, followed by NIK at 1.29% and name at 4.33%, indicating that more complex text patterns are harder to recognize. These findings show that model performance depends on text characteristics, particularly variability and sequence length. Overall, the proposed approach is effective for end-to-end e-KTP information extraction and provides insights for developing more adaptive OCR models.
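The two steps the abstract names on the output side, CTC decoding and CER scoring, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes greedy (best-path) decoding, a blank label at index 0, and CER defined as the Levenshtein distance divided by the reference length; the function names and the sample strings are hypothetical.

```python
from itertools import groupby


def ctc_greedy_collapse(best_path, blank=0):
    """Collapse a per-frame argmax path into a label sequence.

    Assumption: `blank` is the index of the CTC blank label (0 here).
    Consecutive repeats are merged first, then blanks are dropped.
    """
    return [label for label, _ in groupby(best_path) if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical frame-wise labels: a blank between the two 1s keeps both.
    print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
    # Made-up 16-digit string with one substituted digit.
    print(cer("3171234567890123", "3171234567890128"))
```

Under this definition, one wrong character in a 16-character field already gives a CER of 6.25%, so the per-category figures reported above are presumably aggregated over many samples.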

Keywords


Connectionist Temporal Classification; Convolutional Neural Network; e-KTP; Long Short-Term Memory; Optical Character Recognition






DOI: https://doi.org/10.37905/euler.v14i1.37593



Copyright (c) 2026 Fauzan Ihza Fajar, Selly Anastassia Amellia Kharis, Asmara Iriani Tarigan

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Euler : Jurnal Ilmiah Matematika, Sains dan Teknologi (p-ISSN: 2087-9393 | e-ISSN: 2776-3706) by Department of Mathematics, Universitas Negeri Gorontalo, is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.