A Multi-Dataset Benchmark of Multiple Instance Learning for 3D Neuroimage Classification
Ethan Harvey, Dennis Johan Loevlie, Amir Ali Satani, Wansu Chen, David M. Kent, Michael C. Hughes
Problem
This preprint addresses a gap in the literature: how multiple instance learning (MIL) methods compare, in both efficiency and effectiveness, with traditional 3D convolutional neural networks (CNNs) for classifying neuroimaging data. While 3D CNNs have been the standard for CT and MRI scans, they are resource-intensive to train. The authors provide a systematic evaluation of several MIL approaches against 3D CNNs and 3D Vision Transformers (ViTs) across multiple datasets, with the goal of helping resource-constrained practitioners select appropriate models for 3D neuroimage classification.
Method
The authors conduct a comprehensive benchmark of simple MIL, attention-based MIL, 3D CNNs, and 3D ViTs across three CT and four MRI datasets, including two large datasets with over 10,000 scans each. The study evaluates design choices for attention-based MIL, such as encoder type, pooling operation, and architectural configuration. The core technical finding is that simple mean pooling in MIL, with no learnable attention mechanism, can match or outperform more complex models on several tasks. Training is also optimized for speed, reducing training time by 25x compared to 3D CNNs. The authors further analyze per-slice attention quality and use a semi-synthetic dataset with a derivable Bayes optimal classifier to probe the limitations of existing MIL methods.
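To make the contrast between the two pooling strategies concrete, here is a minimal NumPy sketch of the generic MIL pipeline the summary describes: encode each slice of a 3D scan, pool slice features into a bag embedding, then classify. The encoder, weight matrices, and dimensions are illustrative stand-ins (not the authors' implementation); the attention pooling follows the general gated/weighted-sum style common in attention-based MIL.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_slices(scan, W_enc):
    """Encode each 2D slice of a 3D scan into a feature vector.
    Stand-in for a 2D encoder backbone; slices are pre-flattened here.
    scan: (num_slices, slice_dim) -> (num_slices, feat_dim)."""
    return np.maximum(scan @ W_enc, 0.0)  # linear + ReLU

def mean_pool(features):
    """Simple MIL pooling: average slice features, no learned parameters."""
    return features.mean(axis=0)  # (feat_dim,)

def attention_pool(features, V, w):
    """Attention-based MIL pooling: each slice gets a learned scalar
    weight; the bag embedding is the weighted sum of slice features."""
    scores = np.tanh(features @ V) @ w       # (num_slices,)
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                   # softmax over slices
    return alphas @ features, alphas

def classify(embedding, w_clf, b_clf):
    """Linear head with sigmoid -> probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(embedding @ w_clf + b_clf)))

# Toy scan: 32 slices, each flattened to 64 values (illustrative sizes).
num_slices, slice_dim, feat_dim, att_dim = 32, 64, 16, 8
scan  = rng.standard_normal((num_slices, slice_dim))
W_enc = rng.standard_normal((slice_dim, feat_dim)) * 0.1
V     = rng.standard_normal((feat_dim, att_dim)) * 0.1
w_att = rng.standard_normal(att_dim)
w_clf = rng.standard_normal(feat_dim) * 0.1

feats = encode_slices(scan, W_enc)
p_mean = classify(mean_pool(feats), w_clf, 0.0)          # mean-pooling MIL
bag_att, alphas = attention_pool(feats, V, w_att)
p_att = classify(bag_att, w_clf, 0.0)                    # attention MIL
```

Mean pooling has no extra parameters to learn, which is one reason it trains fast and resists overfitting on moderate-sized datasets; the attention variant additionally yields per-slice weights (`alphas`) that can be inspected as slice-level relevance scores.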
Results
The results indicate that simple mean-pooling MIL matches or outperforms recent MIL and 3D CNN alternatives on 4 out of 6 moderate-sized tasks. On the two large datasets, mean pooling remains competitive while substantially reducing training time. The abstract does not report specific performance metrics, but the authors emphasize the practical value of the approach in resource-constrained settings.
Limitations
The authors acknowledge that their analysis is limited to the datasets studied, and that the relative performance of the methods may differ on other types of neuroimaging data or clinical tasks. They also note that while mean pooling is effective, it may miss relevant features that more complex attention mechanisms could capture. Additionally, the study does not examine whether the findings generalize to domains outside neuroimaging.
Why it matters
This work has significant implications for the field of medical imaging, particularly in enhancing the accessibility of advanced classification techniques for practitioners with limited computational resources. By demonstrating that simpler models can achieve competitive performance, the study encourages further exploration of efficient architectures in deep learning for medical applications. The insights gained from the analysis of attention quality and the proposed routes for future improvements could guide subsequent research in refining MIL approaches and developing more effective classifiers for neuroimaging tasks.
Source: arXiv:2604.26807
URL: https://arxiv.org/abs/2604.26807v1