Published on January 27, 2026

Performance of Transformer-Convolutional Neural Network Ensemble for Melanoma Diagnosis Using Segmented 3D Total Body Photography Data

Skin cancer remains one of the most prevalent cancers worldwide, with melanoma representing a significant proportion of newly diagnosed cases. The early detection of melanoma is critical, as timely intervention can dramatically improve patient survival rates. Traditional diagnostic techniques, including dermoscopy and histopathology, require trained personnel and specialized equipment, which are often inaccessible in low-resource settings. This limitation has spurred the adoption of artificial intelligence (AI) in medical imaging, particularly deep learning (DL) models that can analyze complex visual patterns in skin lesions.

A recent study by Dandamudi et al. (2025) investigated the use of a deep learning ensemble model combining ConvNeXt, ResNet, and Swin Transformer architectures to detect melanoma from segmented 3D total body photography (3D-TBP) data. This ensemble, evaluated using a Quadruple Stratified Leak-Free 5-Fold Cross-Validation (QSLF-KF-CV) approach, demonstrated a significant improvement in predictive performance compared to individual models, highlighting the potential of AI-based diagnostic systems for comprehensive skin cancer detection.

Advantages of 3D Total Body Photography

Unlike conventional dermoscopic imaging, which focuses on individual lesions deemed clinically suspicious, 3D-TBP captures all visible lesions across the entire body in a single session. This comprehensive approach mitigates biases inherent in dermoscopy datasets, where benign lesions are underrepresented. By systematically documenting all lesions, 3D-TBP provides a more balanced and clinically relevant dataset for training AI models. Additionally, the standardization of image acquisition across the whole body improves the generalizability of machine learning algorithms, reducing the impact of confounding factors such as lighting variation, skin tone differences, and lesion orientation.

Deep Learning Ensemble Architecture

The study utilized an ensemble of three state-of-the-art architectures: ConvNeXt-Base, ResNet-50, and Swin Transformer-Base, each initialized with ImageNet-1K pretrained weights. These models contribute complementary strengths. ConvNeXt excels at hierarchical feature extraction using convolutional layers, ResNet introduces residual learning to capture subtle patterns, and Swin Transformer leverages attention mechanisms for global context modeling. By integrating these architectures, the ensemble model can extract both fine-grained local features and broader contextual patterns, leading to improved classification performance.
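The combination step implied above can be illustrated as weighted soft voting over each model's predicted melanoma probabilities. This is a minimal sketch under the assumption that the ensemble averages probabilities with per-model weights (the study reports optimized weights of 0.40/0.35/0.25, discussed below); the exact fusion rule is not specified in this summary.

```python
# Hypothetical sketch of weighted soft voting over per-model melanoma
# probabilities; the exact fusion rule used in the study is an assumption.

def ensemble_predict(probs_by_model, weights):
    """Combine per-model probabilities with a weighted average.

    probs_by_model: dict mapping model name -> list of probabilities
    weights: dict mapping model name -> weight (should sum to 1.0)
    """
    names = list(probs_by_model)
    n = len(probs_by_model[names[0]])
    return [
        sum(weights[m] * probs_by_model[m][i] for m in names)
        for i in range(n)
    ]

# Weights reported in the study: Swin 0.40, ConvNeXt 0.35, ResNet 0.25.
weights = {"swin": 0.40, "convnext": 0.35, "resnet": 0.25}
probs = {
    "swin": [0.9, 0.1],
    "convnext": [0.8, 0.2],
    "resnet": [0.7, 0.3],
}
fused = ensemble_predict(probs, weights)  # weighted average per lesion
```

Soft voting with probabilities (rather than hard majority voting on labels) preserves each model's confidence, which matters when the final score is thresholded for a clinical decision.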

Preprocessing and Data Augmentation

To enhance model robustness, extensive preprocessing and data augmentation were applied. Contrast Limited Adaptive Histogram Equalization (CLAHE) improved local contrast while limiting noise amplification. Images were normalized using ImageNet statistics to match the pretrained models’ expected input distribution. Data augmentation included rotations, flips, shifts, scaling, and color adjustments, as well as Gaussian noise and blurring, simulating real-world variability in imaging conditions. Such transformations help prevent overfitting and improve the model’s ability to generalize across diverse datasets.
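Two of the preprocessing steps above can be sketched in NumPy. Note the simplification: CLAHE proper operates on local tiles with a clip limit (e.g. OpenCV's `cv2.createCLAHE`); the global histogram equalization below is only a stand-in to show the idea. The ImageNet mean/std values are the standard published statistics.

```python
import numpy as np

# Simplified sketch of two preprocessing steps: histogram equalization
# (global here; CLAHE adds tiling and a clip limit) and ImageNet
# normalization for pretrained-model input.

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])  # standard ImageNet stats
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def equalize_histogram(gray):
    """Global histogram equalization of a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # scale to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)         # lookup table
    return lut[gray]

def normalize_imagenet(rgb):
    """Scale a uint8 HxWx3 image to [0, 1], then apply ImageNet stats."""
    x = rgb.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD

img = np.random.default_rng(0).integers(0, 256, (8, 8, 3), dtype=np.uint8)
eq = equalize_histogram(img[..., 0])   # contrast-stretched channel
norm = normalize_imagenet(img)         # ready for a pretrained backbone
```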

Handling Class Imbalance

The SLICE-3D TBP dataset used in the study included over 400,000 cropped lesion images, with only a small fraction being malignant. To address this imbalance, the researchers implemented the QSLF-KF-CV strategy. This cross-validation protocol ensured that each fold contained a balanced representation of benign and malignant lesions while preventing patient-level data leakage across folds. Additional considerations included accounting for the number of images per patient and lesion size distribution, ensuring an unbiased evaluation and robust model performance.
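The core of a leak-free protocol is assigning folds at the patient level, so that no patient's images appear in both a training and a validation fold. The sketch below illustrates just that core idea with a single stratification variable (whether a patient has any malignant lesion); the study's QSLF-KF-CV additionally balances images-per-patient and lesion-size distribution, which this simplified illustration omits.

```python
from collections import defaultdict

# Simplified sketch of patient-level, stratified fold assignment.
# The study's QSLF-KF-CV also balances images-per-patient and lesion
# size; this illustration stratifies only on malignancy.

def patient_stratified_folds(samples, k=5):
    """Assign each patient to one fold so no patient spans two folds.

    samples: list of (patient_id, is_malignant) pairs, one per image.
    Returns: dict mapping patient_id -> fold index.
    """
    has_malignant = defaultdict(bool)
    for pid, malignant in samples:
        has_malignant[pid] |= malignant

    folds = {}
    # Stratify: distribute malignant-carrying and benign-only patients
    # separately, round-robin, so every fold sees both groups.
    for stratum in (True, False):
        patients = sorted(p for p, m in has_malignant.items() if m == stratum)
        for i, pid in enumerate(patients):
            folds[pid] = i % k
    return folds

samples = [("p1", True), ("p1", False), ("p2", False),
           ("p3", True), ("p4", False), ("p5", False),
           ("p6", False), ("p7", True), ("p8", False)]
assignment = patient_stratified_folds(samples, k=2)
```

Because the mapping is per patient rather than per image, every image of a given patient lands in the same fold, which is exactly what prevents patient-level leakage.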

Training and Optimization

All models were trained using the Adam optimizer with weight decay to mitigate overfitting, an initial learning rate of 0.001, and a batch size of 32. A learning rate scheduler with warm-up and cosine decay phases ensured stable convergence, while early stopping halted training once validation performance stopped improving. Each model was fine-tuned end-to-end on the SLICE-3D dataset, enabling the ensemble to capture domain-specific features essential for accurate melanoma detection.
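The warm-up plus cosine-decay schedule described above can be written as a small function. The base learning rate (0.001) comes from the study; the warm-up length and total step count below are illustrative assumptions, not values reported by the authors.

```python
import math

# Sketch of a linear warm-up + cosine-decay learning-rate schedule.
# base_lr matches the study; warmup_steps and total_steps are
# illustrative assumptions.

def lr_at_step(step, total_steps, base_lr=1e-3, warmup_steps=100):
    """Linear warm-up to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_step(s, total_steps=1000) for s in range(1000)]
```

The warm-up phase avoids large, destabilizing updates while the pretrained weights adapt to the new domain; the cosine tail lets the model settle into a minimum with progressively smaller steps.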

Ensemble weights were optimized via grid search within the constraints of the cross-validation folds. The final ensemble weights assigned 40% importance to Swin Transformer, 35% to ConvNeXt, and 25% to ResNet, resulting in a final Area Under the Curve (AUC) of 0.9208. In comparison, individual models achieved lower AUCs: ConvNeXt at 0.8722, ResNet at 0.8551, and Swin Transformer at 0.8763. This improvement demonstrates that combining complementary architectures enhances predictive accuracy.
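A grid search over ensemble weights constrained to sum to one can be sketched as follows, with AUC computed from ranks (the Mann-Whitney U formulation). The grid resolution and the synthetic validation data are illustrative assumptions; in the study this search would run inside each cross-validation fold.

```python
import numpy as np

# Sketch of choosing 3-model ensemble weights by grid search on
# validation AUC. AUC is computed from ranks (Mann-Whitney U); the grid
# step and synthetic data are illustrative assumptions.

def auc_score(labels, scores):
    """Rank-based AUC for binary labels (1 = malignant), no tie handling."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def grid_search_weights(labels, preds, step=0.05):
    """preds: (3, n_samples) per-model validation probabilities."""
    best_w, best_auc = None, -1.0
    for w1 in np.arange(0, 1 + 1e-9, step):
        for w2 in np.arange(0, 1 - w1 + 1e-9, step):
            w = np.array([w1, w2, 1.0 - w1 - w2])  # weights sum to 1
            auc = auc_score(labels, w @ preds)
            if auc > best_auc:
                best_w, best_auc = w, auc
    return best_w, best_auc

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
# Three synthetic "models" of decreasing quality.
preds = np.stack([labels * 0.6 + rng.random(200) * 0.4,
                  labels * 0.4 + rng.random(200) * 0.6,
                  labels * 0.2 + rng.random(200) * 0.8])
weights, auc = grid_search_weights(labels, preds)
```

Because AUC depends only on the ranking of scores, a coarse grid is usually sufficient; finer steps change the fused ranking very little.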

Performance Evaluation

The ensemble model’s performance was evaluated using several metrics, including AUC, accuracy, specificity, and F1 score. The robust cross-validation procedure ensured that the model’s performance was not artificially inflated by overlapping patient data. The deep learning ensemble also substantially outperformed conventional machine learning algorithms: XGBoost, LightGBM, and CatBoost achieved AUCs of only 0.777, 0.719, and 0.749, respectively, underscoring the advantage of deep learning for complex, high-dimensional image data.
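The non-AUC metrics listed above all derive from the binary confusion matrix. A minimal sketch, using made-up confusion counts (the study's per-fold counts are not given here):

```python
# Sketch of the reported evaluation metrics computed from a binary
# confusion matrix; the counts below are made-up illustration values.

def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (tn + fp)           # true-negative rate
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "specificity": specificity, "f1": f1}

# Illustrative counts for a heavily imbalanced screening setting.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```

With heavy class imbalance, accuracy alone is misleading (a model predicting "benign" for everything scores high), which is why specificity, F1, and AUC are reported alongside it.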

The ensemble’s superior performance can be attributed to its ability to integrate local, residual, and global features, allowing it to capture diverse diagnostic patterns in melanoma lesions. While ResNet individually underperformed relative to the other models, its inclusion added complementary morphological information, enhancing the ensemble’s overall capability.

Limitations and Future Directions

Despite the high accuracy achieved, the model has limitations. Confounding factors such as lighting variations, camera orientation, and diverse skin tones can affect generalization. Although preprocessing and augmentation strategies mitigate these effects, standardized imaging protocols and more demographically diverse datasets are needed for further improvement. Future research should also focus on explainable AI (XAI) techniques to make model predictions interpretable for clinicians. Methods such as Grad-CAM for CNNs and attention visualization for transformers can highlight features aligned with clinical criteria, improving trust and facilitating clinical adoption.

Moreover, integrating telehealth applications with robust AI models could provide early melanoma detection in remote or underserved areas. Smartphone-based imaging introduces additional variability, but rigorous data augmentation and preprocessing can make AI systems substantially more robust to such input. Enhancing interpretability and explainability is essential to ensure that predictions align with clinical reasoning and regulatory standards.

Clinical Implications

The findings from this study have significant implications for dermatology practice. By leveraging 3D-TBP datasets and deep learning ensembles, clinicians can obtain more accurate and automated melanoma diagnoses. This approach reduces reliance on highly specialized personnel for initial assessments, potentially accelerating diagnosis and intervention. Furthermore, as AI systems improve in accuracy and interpretability, they may serve as decision-support tools that complement rather than replace human expertise, ultimately enhancing patient outcomes.

Conclusion

The ensemble of ConvNeXt, ResNet, and Swin Transformer architectures demonstrates a robust and effective strategy for melanoma detection using segmented 3D-TBP data. The use of extensive preprocessing, data augmentation, and a rigorous cross-validation framework ensures reliable generalization, outperforming individual deep learning models and conventional machine learning algorithms. While limitations related to imaging variability and demographic diversity remain, future integration with explainable AI methods and broader datasets promises to further improve clinical applicability. This study highlights the potential of deep learning ensembles in medical imaging and emphasizes the importance of combining complementary architectures to enhance diagnostic accuracy.

References

  1. Armstrong, B. K., & Kricker, A. (1995). The epidemiology of UV induced skin cancer. Journal of Photochemistry and Photobiology B: Biology, 39(1), 5-15.
  2. Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115-118.
  3. Jones, O., et al. (2019). Limitations of conventional dermoscopy in resource-limited settings. Journal of Dermatological Science, 95(3), 145-152.
  4. Kashani-Sabet, M., et al. (2023). Advances in AI for skin cancer diagnosis. Journal of the American Academy of Dermatology, 88(1), 20-30.
  5. Kurtansky, S., et al. (2024). SLICE-3D: A comprehensive 3D total body photography dataset for skin lesion analysis. Medical Image Analysis, 82, 102-115.
  6. Mahdavi, M., et al. (2021). Electrical impedance spectroscopy for rapid cancer detection. Biosensors and Bioelectronics, 173, 112-119.
  7. Pham, H., et al. (2021). Deep learning in dermatology: CNN applications for skin lesion classification. Computers in Biology and Medicine, 130, 104-123.
  8. Selvaraju, R. R., et al. (2017). Grad-CAM: Visual explanations from deep networks. Proceedings of the IEEE International Conference on Computer Vision, 618-626.
  9. Tschandl, P., et al. (2020). Human-computer collaboration for skin lesion analysis. Nature Medicine, 26(8), 1229-1234.
  10. Wen, Q., et al. (2022). Bias in dermoscopic datasets: Underrepresentation of benign lesions. Journal of Biomedical Informatics, 125, 103-110.

Disclaimer

This blog is intended for informational purposes only and does not constitute medical advice. AI-based diagnostic tools are designed to assist clinicians and should not replace professional medical evaluation. Clinical decisions should always involve consultation with qualified healthcare professionals.
