Comparison of Logistic Regression and Random Forest for Predicting Insurance Customer Response

Harliana Harliana; Tito Prabowo; Ady Alzhava Nuary

doi:10.61805/fahma.v24i2.214

Authors

Harliana Harliana, Universitas Nahdlatul Ulama Blitar
Tito Prabowo, Universitas Nahdlatul Ulama Blitar
Ady Alzhava Nuary

Keywords:

Consumer response prediction, Logistic Regression, Random Forest, Machine learning, Data-driven marketing

Abstract

Vehicle insurance companies increasingly rely on data-driven marketing strategies to identify prospective customers who are likely to respond positively to insurance offers. However, predicting customer response is challenging because of class imbalance: non-responsive customers substantially outnumber responsive ones. This study compares the performance of Logistic Regression and Random Forest models in predicting customer responses to vehicle insurance offers, using the Synthetic Minority Oversampling Technique (SMOTE) to address the class imbalance. The analysis was conducted on the Vehicle Insurance dataset obtained from Kaggle. Experimental results indicate that Random Forest achieved the best overall performance, with an accuracy of 0.80, a positive-class F1-score of 0.59, and a ROC–AUC of 0.88. In contrast, Logistic Regression achieved a higher positive-class recall (0.98) but a lower precision (0.35), indicating a greater tendency to produce false-positive predictions. Feature importance analysis revealed that Previously_Insured, Vehicle_Damage, and Age were the most influential factors affecting customer responses. These findings suggest that combining Random Forest with SMOTE is an effective approach for handling imbalanced data and improving customer response prediction in vehicle insurance marketing campaigns.
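The abstract relies on two ideas: SMOTE as the remedy for class imbalance, and the precision/recall/F1 trade-off used to contrast the two models. The minimal pure-Python sketch below illustrates both. It is not the authors' code: the function names `smote` and `f1_from_pr`, the neighbour count `k=5`, and the random choice of base samples are assumptions for illustration, since the abstract does not state which SMOTE implementation or settings were used. Each synthetic minority point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic SMOTE-style points built from minority-class vectors.

    Simplified sketch (hypothetical helper): each new point interpolates
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours (Euclidean distance). Not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x; keep the k closest.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        # Random gap in [0, 1): the new point lies on the segment x -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def f1_from_pr(precision, recall):
    """Positive-class F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Usage: oversample a toy minority class and check the reported LR trade-off.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))                    # 6
print(round(f1_from_pr(0.35, 0.98), 2))   # 0.52
```

Plugging in the Logistic Regression figures reported above, the positive-class F1 works out to about 0.52, below the 0.59 reported for Random Forest: a recall of 0.98 does not compensate for a precision of 0.35 because the harmonic mean is pulled toward the smaller value. In practice, oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.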
