SLZY Evaluation Report

AFM Tear Classification — Scan-Level Evaluation

Generated 2026-04-19 09:13 • Hack Kosice 2026

All evaluations use scan-level metrics on the eval split (80/20 patient-stratified). Because the eval split is small (62 scans), these metrics may have high variance.

Models evaluated: 7 (4 tabular + 3 deep learning)
Best scan F1: 0.874, hist_gradient_boosting (fixed)
Eval scans: 62 (scan-level evaluation)
Split mode: Fixed (80/20 patient-stratified)

Leaderboard

All models are ranked by scan-level macro F1. Overfit gap = train F1 minus eval F1; a large gap signals overfitting. "--" means no gap was reported (all four tabular models).

Model                           Type     Bal. Acc  Macro F1  ROC AUC  PR AUC  Precision  Recall  Specificity  Overfit Gap
hist_gradient_boosting (fixed)  tabular  0.912     0.874     0.970    0.991   0.977      0.896   0.929        --
random_forest (fixed)           tabular  0.891     0.838     0.958    0.987   0.976      0.854   0.929        --
xgboost (fixed)                 tabular  0.866     0.832     0.949    0.984   0.955      0.875   0.857        --
lightgbm (fixed)                tabular  0.881     0.821     0.961    0.989   0.976      0.833   0.929        --
hierarchical_mil_binary_fixed   dl       0.808     0.712     0.808    0.909   0.971      0.688   0.929        +0.216
direct4ch_binary_fixed          dl       0.717     0.690     0.717    0.861   0.884      0.792   0.643        +0.203
hybrid_fusion_binary_fixed      dl       0.688     0.514     0.688    0.859   1.000      0.375   1.000        +0.189
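The F1 column is macro F1, the unweighted mean of the positive-class and negative-class F1 scores, and Bal. Acc is the mean of recall and specificity. A minimal check in Python, assuming the 62 eval scans split into 48 positive and 14 negative scans. That class split is inferred, not stated in the report, but it reproduces every ratio in the hist_gradient_boosting and xgboost rows. The confusion counts are back-computed from those rows, not read from the report's confusion matrices, and the Confusion and scan_metrics names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fn: int
    tn: int
    fp: int


def scan_metrics(c: Confusion) -> dict:
    recall = c.tp / (c.tp + c.fn)           # sensitivity on the positive class
    specificity = c.tn / (c.tn + c.fp)      # recall on the negative class
    precision = c.tp / (c.tp + c.fp)
    f1_pos = 2 * c.tp / (2 * c.tp + c.fp + c.fn)
    f1_neg = 2 * c.tn / (2 * c.tn + c.fn + c.fp)  # negative class treated as "positive"
    return {
        "bal_acc": (recall + specificity) / 2,
        "macro_f1": (f1_pos + f1_neg) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }


# Assumed counts for hist_gradient_boosting (fixed):
# 48 positive scans (43 TP, 5 FN) and 14 negative scans (13 TN, 1 FP).
hgb = scan_metrics(Confusion(tp=43, fn=5, tn=13, fp=1))
for name, value in hgb.items():
    print(f"{name}: {value:.3f}")
# bal_acc: 0.912
# macro_f1: 0.874
# precision: 0.977
# recall: 0.896
# specificity: 0.929
```

The same function with 42 TP, 6 FN, 12 TN and 2 FP reproduces the xgboost row. Positive-class F1 alone would be 0.935 for hist_gradient_boosting; macro F1 is lower because it weights the 14-scan negative class equally.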

Model Comparison

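Diagnostic curves comparing all models at scan level. ROC AUC measures how well a model ranks positive scans above negative ones across all thresholds; PR AUC is more informative under class imbalance. The PR AUC of a random classifier equals the positive-class prevalence, so PR AUC values should be read against that baseline rather than against 0.5.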

ROC and Precision-Recall Curves

[Figure panels: ROC Curves, PR Curves]
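For the three DL models, the ROC AUC column equals Bal. Acc to three decimals (0.808, 0.717, 0.688). That is exactly what ROC AUC reduces to when it is computed from hard 0/1 predictions: the ROC curve then has a single interior point, and its area is (recall + specificity) / 2. It may be worth checking that those models' scores, not their thresholded labels, were passed to the metric. This sketch on synthetic data uses scikit-learn's roc_auc_score, balanced_accuracy_score and average_precision_score; average precision is one common PR AUC estimator, and the report does not say which estimator it used.

```python
import numpy as np
from sklearn.metrics import average_precision_score, balanced_accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
# Synthetic eval set (hypothetical): 48 positive and 14 negative scans.
y_true = np.array([1] * 48 + [0] * 14)
# Continuous scores: positives shifted upward so the classes only partly overlap.
scores = np.clip(rng.normal(loc=0.35 + 0.3 * y_true, scale=0.15), 0.0, 1.0)
y_pred = (scores >= 0.5).astype(int)          # hard labels at a 0.5 threshold

auc_scores = roc_auc_score(y_true, scores)    # threshold-free ranking quality
auc_hard = roc_auc_score(y_true, y_pred)      # collapses to one ROC operating point
bal_acc = balanced_accuracy_score(y_true, y_pred)
pr_auc = average_precision_score(y_true, scores)
prevalence = y_true.mean()                    # expected PR AUC of a random scorer

print(f"ROC AUC (scores): {auc_scores:.3f}")
print(f"ROC AUC (hard labels): {auc_hard:.3f}  balanced accuracy: {bal_acc:.3f}")
print(f"PR AUC: {pr_auc:.3f}  chance level: {prevalence:.3f}")
```

The two middle numbers always match, whatever the scores are, which is why an AUC equal to balanced accuracy is a useful warning sign.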

Confusion Matrices

[Figure: Confusion Matrices]

Calibration & Threshold Analysis

[Figure panels: Calibration, Threshold Analysis]
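The calibration and threshold panels can be reproduced roughly as follows. The report does not state how they were computed, so this is a sketch assuming binary labels and positive-class probabilities. scikit-learn's calibration_curve bins the predictions and compares each bin's mean predicted probability with its observed positive rate (a well-calibrated model lies on the diagonal), and the hypothetical best_threshold helper sweeps a fixed grid and keeps the threshold with the highest macro F1, the leaderboard's ranking metric.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import f1_score


def reliability_table(y_true, y_prob, n_bins=5):
    """(mean predicted probability, observed positive rate) for each non-empty bin."""
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=n_bins, strategy="uniform")
    return [(round(float(p), 3), round(float(f), 3)) for p, f in zip(mean_pred, frac_pos)]


def best_threshold(y_true, y_prob, grid=None):
    """Return (threshold, macro F1) for the grid threshold that maximizes macro F1."""
    y_prob = np.asarray(y_prob)
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    scores = [
        f1_score(y_true, (y_prob >= t).astype(int), average="macro", zero_division=0)
        for t in grid
    ]
    best = int(np.argmax(scores))  # first threshold reaching the maximum
    return float(grid[best]), float(scores[best])


# Hypothetical, perfectly separable scores: negatives at 0.3, positives at 0.7.
y_true = np.array([0] * 10 + [1] * 10)
y_prob = np.array([0.3] * 10 + [0.7] * 10)
print(reliability_table(y_true, y_prob))
print(best_threshold(y_true, y_prob))
```

Picking the threshold on the eval split itself would bias the reported F1 upward; with a split this small, a validation fold is the safer place to choose it.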

Training Dynamics

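DL model training curves. "Early stopped" means training halted before max_epochs because the monitored validation metric stopped improving; it does not by itself show that the model converged. In the convergence summary, best_val_f1 and best_epoch are NaN for all three runs, so the best checkpoint cannot be identified from this log.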

[Figure panels: Training Loss, Training F1]

Convergence Summary

model                          final_train_loss  best_val_f1  best_epoch  total_epochs  max_epochs  early_stopped  last_is_best  final_lr
direct4ch_binary_fixed         0.256             NaN          NaN         13            25          True           False         0.000
hierarchical_mil_binary_fixed  0.211             NaN          NaN         12            25          True           False         0.000
hybrid_fusion_binary_fixed     0.394             NaN          NaN         7             20          True           False         0.000
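The log records early_stopped = True for all three runs but no best_val_f1 or best_epoch, so the stopping rule itself is not visible here. A minimal sketch of patience-based early stopping on a validation score, using a hypothetical EarlyStopping helper with made-up patience and min_delta defaults. It also shows why last_is_best = False is the expected outcome whenever patience triggers the stop: the final `patience` epochs are, by construction, not improvements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopping:
    """Patience-based early stopping on a validation score where higher is better."""
    patience: int = 5
    min_delta: float = 0.0
    best_score: Optional[float] = None
    best_epoch: Optional[int] = None
    bad_epochs: int = 0

    def step(self, epoch: int, score: float) -> bool:
        """Record one epoch's score and return True once training should stop."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            # New best: remember it and reset the patience counter.
            self.best_score, self.best_epoch, self.bad_epochs = score, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def simulate(val_scores, max_epochs=25, patience=5):
    """Replay per-epoch validation scores and report convergence-summary fields."""
    stopper = EarlyStopping(patience=patience)
    epoch, stopped = 0, False
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if stopper.step(epoch, score):
            stopped = True
            break
    return {
        "total_epochs": epoch,
        "early_stopped": stopped,
        "best_epoch": stopper.best_epoch,
        "last_is_best": epoch == stopper.best_epoch,
    }


# Hypothetical validation F1 curve that peaks at epoch 4.
val_f1 = [0.50, 0.60, 0.66, 0.70, 0.69, 0.70, 0.68, 0.67, 0.69, 0.65]
print(simulate(val_f1))
# {'total_epochs': 9, 'early_stopped': True, 'best_epoch': 4, 'last_is_best': False}
```

Logging best_score and best_epoch from a helper like this is what would fill the NaN columns in the convergence summary.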

Attention Analysis

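MIL attention weights show how much weight hierarchical_mil_binary_fixed assigns to each AFM channel when pooling them. Higher attention means more influence on the pooled representation, although attention weights are not a guaranteed measure of causal importance. The four mean weights in the channel importance table sum to 1.0 and all lie close to 0.25, the value uniform attention over four channels would give, so the model relies on all channels roughly equally.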

[Figure panels: channel_attention_hierarchical_mil_binary_fixed, attention_dist_hierarchical_mil_binary_fixed]

Channel Importance

channel          mean_attention  std     min     max
height_sensor    0.2609          0.0447  0.0000  0.3951
height           0.2580          0.0458  0.0000  0.3746
amplitude_error  0.2481          0.0437  0.0000  0.3564
phase            0.2330          0.0664  0.0000  0.3498
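The report does not describe how hierarchical_mil_binary_fixed computes its channel attention. A minimal NumPy sketch of attention-based MIL pooling (Ilse et al., 2018), treating each of the four channels as one instance: a small scoring network produces one logit per channel, a softmax turns the logits into weights that sum to 1, and the weighted sum of channel embeddings becomes the scan-level representation. The parameters W and v are random placeholders, not trained weights, and the helper names are hypothetical.

```python
import numpy as np

CHANNELS = ["height_sensor", "height", "amplitude_error", "phase"]


def attention_pool(embeddings, W, v):
    """Attention-MIL pooling over channels.

    embeddings: (n_channels, d) array, one embedding per channel (instance).
    W: (d, h) and v: (h,) parameters of the scoring network tanh(e @ W) @ v.
    Returns the pooled embedding (d,) and the attention weights (n_channels,).
    """
    logits = np.tanh(embeddings @ W) @ v
    logits = logits - logits.max()                 # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax: weights sum to 1
    return weights @ embeddings, weights


def summarize(all_weights):
    """Per-channel mean/std/min/max over scans, like the Channel Importance table."""
    a = np.asarray(all_weights)
    return {ch: (a[:, i].mean(), a[:, i].std(), a[:, i].min(), a[:, i].max())
            for i, ch in enumerate(CHANNELS)}


rng = np.random.default_rng(0)
d, h = 8, 4
W, v = rng.normal(size=(d, h)), rng.normal(size=h)  # untrained placeholder parameters
weights_per_scan = [attention_pool(rng.normal(size=(len(CHANNELS), d)), W, v)[1]
                    for _ in range(62)]
for ch, (mean, std, lo, hi) in summarize(weights_per_scan).items():
    print(f"{ch:16s} mean={mean:.4f} std={std:.4f} min={lo:.4f} max={hi:.4f}")
```

A softmax never produces an exact zero, so the 0.0000 minima in the table may reflect masked channels or very small weights rounded at four decimals; the report does not say which.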