K-Means++ Optimization Using Principal Component Analysis (PCA) for Clustering Student Graduation Profiles

Herdiesel Santoso; Hana Solikatun

doi:10.61805/fahma.v24i2.202

Authors

Herdiesel Santoso, Information Systems Study Program, STMIK El Rahma
Hana Solikatun, Information Systems Study Program, STMIK El Rahma

Keywords:

Clustering, Academic Data, Data Mining, K-Means++, Principal Component Analysis

Abstract

Timely graduation is a key indicator of student success and institutional effectiveness in higher education. However, clustering student academic records that mix numerical and categorical attributes with the conventional K-Means algorithm often suffers from distance bias and reduced cluster quality due to the curse of dimensionality. This study proposes an optimized K-Means++ approach that combines One-Hot Encoding with Principal Component Analysis (PCA) to improve clustering performance. The model was evaluated on 200 graduate records from STMIK El Rahma Yogyakarta. Reducing the dataset to two principal components substantially improved cluster quality on all three validation metrics: the Silhouette Score increased from 0.3275 to 0.4979 (higher is better), the Davies–Bouldin Index decreased from 1.405 to 0.871 (lower is better), and the Calinski–Harabasz Index increased from 70.448 to 168.035 (higher is better). The optimized model identified two distinct groups: Academically Stable Students (157 students) and At-Risk Working Students (43 students), the latter consisting predominantly of students in part-time employment. These findings can inform the development of data-driven Academic Early Warning Systems (EWS) that help higher education institutions identify students at risk of delayed graduation and target their interventions accordingly.
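The abstract names the pipeline (One-Hot Encoding, PCA down to two components, K-Means++ clustering, then Silhouette, Davies–Bouldin and Calinski–Harabasz validation) but does not show how the pieces fit together. The NumPy sketch below illustrates one way to connect them. It is a minimal illustration under stated assumptions, not the authors' implementation. The features (GPA, semesters to graduate, employment status), the z-score standardization before PCA, the helper names, and the synthetic records are all hypothetical. The metric functions are small re-implementations of the standard definitions, not a library API.

```python
import numpy as np


def one_hot(values):
    """One-hot encode a 1-D array of category labels (columns in sorted category order)."""
    cats = np.unique(values)
    return (values[:, None] == cats[None, :]).astype(float), cats


def standardize(X):
    """Z-score each column; constant columns are only centred."""
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def pca(X, n_components):
    """Project centred data onto its top principal components using SVD.
    Returns the scores and the explained-variance ratio of each kept component."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal axes
    ratio = s**2 / np.sum(s**2)
    return Xc @ Vt[:n_components].T, ratio[:n_components]


def sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)


def kmeans_pp(X, k, rng, n_iter=100):
    """K-Means++ seeding followed by standard Lloyd iterations."""
    centers = X[[rng.integers(len(X))]]  # first centre: a uniformly random point
    for _ in range(1, k):
        d2 = sq_dists(X, centers).min(axis=1)  # D(x)^2 to the nearest chosen centre
        centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])  # an empty cluster keeps its old centre
        if np.allclose(new, centers):
            break
        centers = new
    return sq_dists(X, centers).argmin(axis=1), centers


def silhouette(X, labels):
    """Mean silhouette coefficient (Euclidean); points in singleton clusters score 0."""
    D = np.sqrt(sq_dists(X, X))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better)."""
    ks = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in ks])
    s = np.array([np.linalg.norm(X[labels == c] - cents[i], axis=1).mean()
                  for i, c in enumerate(ks)])  # mean distance of members to their centroid
    M = np.sqrt(sq_dists(cents, cents))
    np.fill_diagonal(M, np.inf)  # makes the i == j ratio 0 so max picks j != i
    return float(((s[:, None] + s[None, :]) / M).max(axis=1).mean())


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index (higher is better)."""
    ks, n, mean = np.unique(labels), len(X), X.mean(axis=0)
    between = sum((labels == c).sum() * ((X[labels == c].mean(axis=0) - mean) ** 2).sum() for c in ks)
    within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum() for c in ks)
    return float(between * (n - len(ks)) / (within * (len(ks) - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic, hypothetical records (NOT the study's data): GPA, semesters, employment.
    gpa = np.concatenate([rng.normal(3.4, 0.2, 150), rng.normal(2.9, 0.25, 50)])
    semesters = np.concatenate([rng.normal(8.2, 0.5, 150), rng.normal(10.5, 1.0, 50)])
    dummies, _ = one_hot(np.array(["none"] * 150 + ["part-time"] * 50))
    X = standardize(np.column_stack([gpa, semesters, dummies]))
    Z, ratio = pca(X, 2)
    print("explained variance of PC1, PC2:", np.round(ratio, 3))
    for name, data in [("one-hot only", X), ("one-hot + PCA(2)", Z)]:
        labels, _ = kmeans_pp(data, 2, np.random.default_rng(0))
        print(f"{name}: sizes={np.bincount(labels)}, sil={silhouette(data, labels):.3f}, "
              f"DBI={davies_bouldin(data, labels):.3f}, CH={calinski_harabasz(data, labels):.3f}")
```

When run as a script, the sketch prints the explained-variance ratio of the two kept components. For both the full encoded space and the PCA projection, it also prints the cluster sizes and the three indices. Because the data are synthetic, these numbers are not comparable to the study's reported scores. For production use, scikit-learn provides tested equivalents: `OneHotEncoder`, `PCA`, `KMeans` (whose default `init` is `"k-means++"`), and the `silhouette_score`, `davies_bouldin_score` and `calinski_harabasz_score` functions in `sklearn.metrics`.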

