SLZY -- Evaluation Report

All evaluations use scan-level metrics on the eval split (80/20 patient-stratified). Metrics may have high variance because the dataset is small.
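The report does not show how the 80/20 patient-stratified split is built. One plausible reading is sketched below: whole patients are assigned to train or eval, and patients are sampled per label so both splits keep a similar class mix. The function name, the `(scan_id, patient_id, label)` tuple layout, and the seed handling are assumptions made for illustration, not the pipeline's actual code.

```python
import random
from collections import defaultdict

def patient_stratified_split(scans, eval_frac=0.2, seed=0):
    """Split scans so that every patient lands entirely in train or in eval.

    `scans` is a list of (scan_id, patient_id, label) tuples (hypothetical schema).
    Patients are grouped by label before sampling, so each label contributes
    roughly `eval_frac` of its patients to the eval split.
    """
    patients_by_label = defaultdict(set)
    for _, patient_id, label in scans:
        patients_by_label[label].add(patient_id)

    rng = random.Random(seed)
    eval_patients = set()
    for label in sorted(patients_by_label):
        patients = sorted(patients_by_label[label])  # sorted for reproducibility
        rng.shuffle(patients)
        n_eval = max(1, round(len(patients) * eval_frac))
        eval_patients.update(patients[:n_eval])

    # Assignment is by patient, so no patient's scans leak across splits.
    train = [s for s in scans if s[1] not in eval_patients]
    eval_split = [s for s in scans if s[1] in eval_patients]
    return train, eval_split
```

Splitting by patient rather than by scan matters because several scans from one patient are correlated; letting them straddle the split would inflate the eval metrics.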

Models Evaluated: 4 tabular + 3 deep learning

Best Scan F1: 0.874, hist_gradient_boosting (fixed)

Eval Scans: scan-level evaluation

Split Mode: Fixed, 80/20 patient-stratified

Leaderboard

All models are ranked by scan-level macro F1. Green = strong, bold = best per metric. Overfit gap = train F1 minus eval F1; a large positive gap signals overfitting. "--" = gap not reported.

Model	Type	Bal. Acc	F1	ROC AUC	PR AUC	Prec	Recall	Spec	Overfit Gap
hist_gradient_boosting (fixed)	tabular	0.912	0.874	0.970	0.991	0.977	0.896	0.929	--
random_forest (fixed)	tabular	0.891	0.838	0.958	0.987	0.976	0.854	0.929	--
xgboost (fixed)	tabular	0.866	0.832	0.949	0.984	0.955	0.875	0.857	--
lightgbm (fixed)	tabular	0.881	0.821	0.961	0.989	0.976	0.833	0.929	--
hierarchical_mil_binary_fixed	dl	0.808	0.712	0.808	0.909	0.971	0.688	0.929	+0.216
direct4ch_binary_fixed	dl	0.717	0.690	0.717	0.861	0.884	0.792	0.643	+0.203
hybrid_fusion_binary_fixed	dl	0.688	0.514	0.688	0.859	1.000	0.375	1.000	+0.189
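The threshold-based columns above all follow from a scan-level confusion matrix. The sketch below assumes the positive class is label 1 and that macro F1 is the unweighted mean of the two per-class F1 scores. The counts in the usage lines (43 TP, 1 FP, 13 TN, 5 FN) are not given in the report; they are illustrative values chosen because they reproduce the rounded hist_gradient_boosting row. Likewise, the train F1 of 0.928 is only what the stated gap definition implies for hierarchical_mil_binary_fixed.

```python
def scan_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion counts.

    Assumes every denominator is non-zero (both classes present and
    at least one prediction of each class).
    """
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fn + fp)  # F1 with the negative class as target
    return {
        "bal_acc": (recall + specificity) / 2,
        "macro_f1": (f1_pos + f1_neg) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }

def overfit_gap(train_f1, eval_f1):
    """Overfit gap as defined in the report: train F1 minus eval F1."""
    return train_f1 - eval_f1

# Illustrative counts (assumption, not from the report).
m = scan_metrics(tp=43, fp=1, tn=13, fn=5)
print({k: round(v, 3) for k, v in m.items()})
# {'bal_acc': 0.912, 'macro_f1': 0.874, 'precision': 0.977, 'recall': 0.896, 'specificity': 0.929}

print(round(overfit_gap(0.928, 0.712), 3))  # 0.216
```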

Model Comparison

Diagnostic curves comparing all models at scan level. ROC AUC measures discrimination; PR AUC is more informative under class imbalance.
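Both summary numbers can be computed directly from per-scan scores. The sketch below uses the rank (Mann-Whitney) form of ROC AUC and a step-wise average precision as the PR AUC; whether the report uses this PR AUC variant is an assumption. It also shows that when the scores are already thresholded 0/1 predictions, ROC AUC collapses to balanced accuracy. In the leaderboard the three DL rows have ROC AUC equal to balanced accuracy, which is the pattern this produces; the report does not say how those rows were scored.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5).

    Assumes both classes are present in y_true.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise PR AUC: sum over distinct thresholds of precision * recall gain."""
    n_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    ap, tp, prev_recall, i = 0.0, 0, 0.0, 0
    while i < len(ranked):
        j = i
        # Consume every scan tied at this score as one threshold step.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / j)  # j scans predicted positive so far
        prev_recall = recall
        i = j
    return ap

# Toy data (hypothetical): four positive scans, two negative scans.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]
hard = [1 if s >= 0.5 else 0 for s in scores]  # thresholded predictions

print(roc_auc(y_true, scores))  # 0.75, uses the full ranking
print(roc_auc(y_true, hard))    # 0.5, equals balanced accuracy (recall 0.5, specificity 0.5)
```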

ROC and Precision-Recall Curves

Confusion Matrices

Calibration & Threshold Analysis

Training Dynamics

Training curves for the DL models. Early stopping triggered = training halted before max epochs because the monitored validation metric stopped improving.

Convergence Summary

model	final_train_loss	best_val_f1	best_epoch	total_epochs	max_epochs	early_stopped	last_is_best
direct4ch_binary_fixed	0.256	NaN	NaN	13	25	True	False
hierarchical_mil_binary_fixed	0.211	NaN	NaN	12	25	True	False
hybrid_fusion_binary_fixed	0.394	NaN	NaN	7	20	True	False
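The summary columns can be derived from a per-epoch validation F1 history with patience-based early stopping. This is a minimal sketch, not the training code behind the table: the function name, the `patience=5` default, and the choice of validation F1 as the monitored metric are assumptions.

```python
def summarize_training(val_f1_per_epoch, max_epochs, patience=5):
    """Replay a validation-F1 history and report convergence fields.

    Stops once `patience` epochs pass without a new best value.
    Assumes finite F1 values; epochs are 1-indexed.
    """
    best_f1, best_epoch, total, stopped = float("-inf"), 0, 0, False
    for epoch, f1 in enumerate(val_f1_per_epoch[:max_epochs], start=1):
        total = epoch
        if f1 > best_f1:
            best_f1, best_epoch = f1, epoch
        elif epoch - best_epoch >= patience:
            stopped = True  # no improvement for `patience` epochs
            break
    return {
        "best_val_f1": best_f1,
        "best_epoch": best_epoch,
        "total_epochs": total,
        "max_epochs": max_epochs,
        "early_stopped": stopped,
        "last_is_best": best_epoch == total,
    }

# Hypothetical history: the best value is at epoch 3, so training stops at epoch 8.
history = [0.50, 0.60, 0.70, 0.65, 0.66, 0.60, 0.62, 0.61, 0.90]
print(summarize_training(history, max_epochs=25))
```

With early stopping of this kind, last_is_best is False whenever the run stops on patience, so the checkpoint from best_epoch is the one to evaluate, not the final weights.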

Attention Analysis

MIL attention weights reveal which channels the model relies on. Higher attention = more influence on the prediction.

Figure: channel_attention_hierarchical_mil_binary_fixed

Figure: attention_dist_hierarchical_mil_binary_fixed

Channel Importance

channel	mean_attention	std	max
height_sensor	0.2609	0.0447	0.3951
height	0.2580	0.0458	0.3746
amplitude_error	0.2481	0.0437	0.3564
phase	0.2330	0.0664	0.3498
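A table like the one above can be produced by aggregating per-scan attention vectors. The sketch below assumes each scan yields one weight per channel and that the weights are normalized to sum to 1, which is consistent with the reported means summing to 1.0000. With four channels, uniform attention would be 0.25 each, so the reported means (0.233 to 0.261) sit close to uniform. The function name and the use of population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def channel_importance(attn_rows, channels):
    """Aggregate per-scan attention weights into mean / std / max per channel.

    `attn_rows` holds one list of weights per scan, ordered like `channels`.
    Rows are returned sorted by mean attention, highest first.
    """
    table = []
    for i, name in enumerate(channels):
        col = [row[i] for row in attn_rows]
        table.append({
            "channel": name,
            "mean_attention": mean(col),
            "std": pstdev(col),  # population std; the report's convention is unknown
            "max": max(col),
        })
    return sorted(table, key=lambda r: r["mean_attention"], reverse=True)

# Hypothetical attention weights for two scans.
channels = ["height_sensor", "height", "amplitude_error", "phase"]
rows = [[0.4, 0.3, 0.2, 0.1],
        [0.3, 0.3, 0.3, 0.1]]
for r in channel_importance(rows, channels):
    print(r["channel"], round(r["mean_attention"], 4), round(r["std"], 4), round(r["max"], 4))
```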