Episode 34

Published on: 10th Jun 2026

A Comparative Study of Principal Component Analysis with Ensemble Learning for Classification of Medical Data

Abstract

Dimensionality reduction is a critical step in the analysis of medical data, particularly when addressing multicollinearity, noise, and high-dimensional feature spaces, all of which can degrade classification performance. Although principal component analysis (PCA) is the traditional choice, its usefulness for medical datasets is often limited by outliers, corrupted observations, and low interpretability, since each principal component is a linear combination of all original variables. This research compares PCA, robust PCA (RPCA), and sparse PCA (SPCA), each integrated with random forest (RF) and extremely randomized trees (ERT). A simulation study revealed that, while all PCA variants struggle when class separation is low, RPCA and SPCA significantly outperform standard PCA in the presence of outliers. The empirical study used a diabetes dataset that underwent thorough preprocessing, including median imputation, normalization, and the synthetic minority over-sampling technique (SMOTE) to address class imbalance. The RPCA regularization parameter and the SPCA sparsity parameter were selected by cross-validation based on the area under the receiver operating characteristic (ROC) curve (AUC), while the RF and ERT hyperparameters were tuned with a two-stage approach combining random search and grid search. The final empirical results show that the RPCA-ERT model performs best, achieving an accuracy of 0.8954 and a sensitivity of 0.9434, which underscores its effectiveness in handling contaminated medical data.
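The abstract does not say which RPCA formulation the authors used. As a minimal sketch, assuming the widely used principal component pursuit formulation (Candès et al., 2011) solved with an ADMM-style augmented Lagrangian loop, the NumPy code below splits a data matrix into a low-rank part `L` and a sparse outlier part `S`, then runs ordinary PCA on `L` to obtain component scores that could be passed to an RF or ERT classifier. The helper names (`robust_pca`, `shrink`, `svd_threshold`), the default λ = 1/√max(m, n), the penalty heuristic for `mu`, and the synthetic data are illustrative assumptions, not the authors' code or dataset; in the study's setup, `lam` corresponds to the regularization parameter that would be tuned by AUC-based cross-validation.

```python
import numpy as np


def shrink(X, tau):
    """Soft-thresholding: the proximal operator of tau * ||X||_1 (elementwise)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value thresholding: the proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Hypothetical helper: principal component pursuit with a fixed-penalty ADMM loop.

    Solves  min ||L||_* + lam * ||S||_1  subject to  L + S = M,
    where L is low-rank (the underlying structure) and S is sparse (outliers).
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # default weight suggested by Candes et al. (2011)
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # common penalty heuristic (assumption)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)  # update the low-rank part
        S = shrink(M - L + Y / mu, lam / mu)  # update the sparse part
        residual = M - L - S
        Y = Y + mu * residual  # dual ascent step on the multiplier
        if np.linalg.norm(residual, "fro") / norm_M < tol:
            break
    return L, S


# Synthetic example (hypothetical data): 200 "patients" x 50 features lying
# near a rank-3 subspace, with about 5% of entries hit by large outliers.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
mask = rng.random(clean.shape) < 0.05
corrupted = clean + mask * rng.normal(scale=20.0, size=clean.shape)

L, S = robust_pca(corrupted)

# Ordinary PCA on the recovered low-rank part; these scores are the kind of
# reduced features that could then be passed to an RF or ERT classifier.
L_centered = L - L.mean(axis=0)
_, _, Vt = np.linalg.svd(L_centered, full_matrices=False)
scores = L_centered @ Vt[:3].T

print("relative error, corrupted vs clean:", np.linalg.norm(corrupted - clean) / np.linalg.norm(clean))
print("relative error, RPCA L vs clean:", np.linalg.norm(L - clean) / np.linalg.norm(clean))
```

In this formulation, a larger `lam` penalizes the sparse part more heavily, so fewer entries are treated as outliers and more of the data is kept in `L`; a smaller `lam` flags more entries as corruption. A cross-validated choice of this parameter balances that trade-off.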

Show artwork for BIAR BUKU BICARA

About the Podcast

BIAR BUKU BICARA
USIM Journals offers podcasting services for articles published in USIM journals. We will convert your article into a 7-10 minute audiobook that presents the scientific research in layman-friendly language.

Let’s create a podcast for your research article and turn your findings into an engaging, accessible overview that is perfect for sharing on websites and social media, extending the reach and visibility of your research.