Comparison of Logistic Regression, Random Forest, and Perceptron for Classifying Heart Failure Patients
Abstract
According to World Health Organization (WHO) data, heart disease accounts for roughly one-third of all deaths worldwide, about 17.9 million people each year, and it is more prevalent in Asia. Data mining techniques can be applied to biostatistical data to uncover correlation patterns in historical records, and those patterns can then be used to predict disease. Highly accurate algorithms can help medical experts prevent deaths from heart failure. This study compares three algorithms, Logistic Regression, Random Forest, and Perceptron, each combined with one of two standardization methods (Standard Scaler or Robust Scaler), on a dataset with and without SMOTE applied to correct class imbalance. The goal is to identify the combination with the highest accuracy and then to assess the best-performing algorithm using its ROC curve and AUC value. The results show that the Random Forest algorithm with SMOTE and the Standard Scaler is suitable for classifying and predicting heart failure.
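The preprocessing choices named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names `standard_scale`, `robust_scale`, and `smote_like` are hypothetical, the toy values are invented for illustration, and the oversampling step is a simplified SMOTE-style interpolation rather than the full published algorithm.

```python
import math
import random
import statistics


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy feature with one extreme value (illustrative numbers only).
feature = [1, 2, 3, 4, 100]
print(standard_scale(feature))  # the outlier inflates the mean and the std
print(robust_scale(feature))    # [-1.0, -0.5, 0.0, 0.5, 48.5]

# Toy minority class in two dimensions (illustrative points only).
minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote_like(minority_class, n_new=3))
```

The outlier shifts the mean and standard deviation used by the first scaler, while the median and IQR used by the second are unaffected by it, which is one reason the two methods can lead to different classifier results. In practice the library versions (for example scikit-learn's scalers and imbalanced-learn's SMOTE) would normally be used, with oversampling applied to the training split only.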