
MIRAGE

This repository contains the code for the paper "MIRAGE: A multimodal foundation model and benchmark for comprehensive retinal OCT image analysis", submitted to npj Digital Medicine.

MIRAGE is a multimodal foundation model for comprehensive retinal OCT/SLO image analysis. It is trained on a large-scale multimodal dataset and is designed to perform a wide range of tasks, including disease staging, diagnosis, and layer and lesion segmentation. MIRAGE is based on the MultiMAE architecture and is pre-trained using a multi-task learning strategy. Built on a ViT backbone, the model is available in two sizes: MIRAGE-Base and MIRAGE-Large. This repository provides the model weights and the code to run inference.
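For orientation, the two sizes presumably follow the standard ViT-Base and ViT-Large backbone configurations. The sketch below lists those standard hyperparameters for reference only; the exact MIRAGE configurations are defined in the repository code and may differ.

# Standard ViT backbone hyperparameters, shown for orientation only;
# the actual MIRAGE configurations live in this repository's code.
VIT_CONFIGS = {
    "base":  {"embed_dim": 768,  "depth": 12, "num_heads": 12},  # ~86M params
    "large": {"embed_dim": 1024, "depth": 24, "num_heads": 16},  # ~307M params
}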

Important

All scripts and code are intended to run on Linux systems.

Overview


Overview of the proposed model (MIRAGE) and other general (DINOv2) and domain-specific (MedSAM, RETFound) foundation models. In contrast to existing unimodal foundation models, our approach utilizes multimodal self-supervised learning to train a Vision Transformer on a large dataset of paired multimodal retinal images, including optical coherence tomography (OCT), scanning laser ophthalmoscopy (SLO), and automatically generated labels for retinal layers. We evaluated the model on a comprehensive benchmark consisting of 19 tasks from 14 publicly available datasets and two private datasets, covering both OCT and SLO classification and segmentation tasks. Statistical significance was calculated using the Wilcoxon signed-rank test across all datasets. Our foundation model, MIRAGE, significantly outperforms state-of-the-art foundation models across all task types.
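The comparisons underlying this claim are paired: each model is scored on the same datasets, so a paired non-parametric test applies. A minimal sketch of such a test using SciPy (the scores below are made-up placeholders, not results from the paper):

from scipy.stats import wilcoxon

# Placeholder per-dataset scores for two models (illustration only).
mirage_scores   = [0.91, 0.88, 0.93, 0.85, 0.90]
baseline_scores = [0.87, 0.86, 0.90, 0.84, 0.88]

# Paired, non-parametric test over matched per-dataset results.
stat, p_value = wilcoxon(mirage_scores, baseline_scores)
print(f"Wilcoxon statistic={stat}, p={p_value:.4f}")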

TODO

  • Basic code to load the model and run inference
  • Model weights
  • Downstream classification datasets
  • Downstream segmentation datasets
  • Classification tuning code
  • Classification evaluation code
  • Segmentation tuning code
  • Segmentation evaluation code
  • Detailed documentation
  • Quick start script
  • Pretraining code

Quick start

For a quick start, use the provided prepare_env.py script to create a new Python environment, install the required packages, and download the model weights and the datasets.

Important

The script will download the model weights and the datasets, which are large files. Make sure you have enough disk space and a stable internet connection.

In addition, it will install Python 3.10.16 (from source) in the same folder if it detects that the system Python version is not 3.10.*.

./prepare_env.py

Tip

Run the script with the -h or --help flag to see the available options.

Requirements

Note

The code has been tested with PyTorch 2.5.1 (CUDA 11.8) and Python 3.10.10.

pip

Create a new Python environment and activate it:

python -m venv venv  # if not already created
source venv/bin/activate

Install the required packages:

pip install -r requirements.txt
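
To check that the installation works, you can run a generic PyTorch sanity check (not part of the repository's scripts):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"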

Model weights

The model weights are available in the Model weights release on GitHub.

Model          Link
-----------    -------
MIRAGE-Base    Weights
MIRAGE-Large   Weights
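
The checkpoints are regular PyTorch files. A minimal sketch for inspecting one after download (the file name and key layout here are assumptions; see the release assets and mirage_wrapper.py for the actual ones):

import torch

# Hypothetical file name; use the actual asset name from the release.
ckpt = torch.load("MIRAGE-Base.pth", map_location="cpu")

# Checkpoints often nest weights under a key such as "model".
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} tensors in the checkpoint")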

Inference

The script mirage_wrapper.py provides a simple pipeline to load the model and run inference on a single sample. This sample is already included in the repository (_example_images/) and consists of a triplet of OCT, SLO, and layer segmentation images.

To run the inference, simply execute the script:

python mirage_wrapper.py

Check the code for more details.
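
The overall pattern is the standard PyTorch inference loop. A hedged sketch of what such a pipeline looks like (the model constructor and the input dictionary layout are assumptions here; mirage_wrapper.py is the authoritative reference):

import torch

# Hypothetical constructor; the actual model is built by the code in this
# repository (see mirage_wrapper.py).
model = build_mirage_model(weights="MIRAGE-Base.pth")  # placeholder name
model.eval()

# Assumed input: one tensor per modality, shaped (batch, channels, H, W).
sample = {
    "oct": torch.randn(1, 1, 512, 512),
    "slo": torch.randn(1, 1, 512, 512),
}

with torch.no_grad():  # no gradients needed at inference time
    outputs = model(sample)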

Evaluation benchmark

We provide all the publicly available datasets used in the benchmark with the data splits. See docs/segmentation_benchmark.md for more details on the segmentation benchmark, and docs/classification_benchmark.md for the classification benchmark.

Pretraining

Although we do not provide the pretraining data due to privacy concerns, we provide the code to pretrain MIRAGE on a multimodal dataset. Please check docs/pretraining.md for more details.

Tuning

We provide the code to fine-tune MIRAGE and other state-of-the-art foundation models for OCT segmentation tasks. Please check docs/segmentation_tuning.md for more details.

We also provide the code to fine-tune the models for OCT and SLO classification tasks. More information can be found in docs/classification_tuning.md.
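
As a rough illustration of the usual recipe for tuning such foundation models, the generic PyTorch sketch below freezes the pretrained encoder and trains a new task head (this is not the repository's actual tuning code; all tensors are dummies):

import torch
from torch import nn

# Stand-in for features extracted by a frozen pretrained encoder; in a real
# run these would come from OCT/SLO images passed through MIRAGE.
features = torch.randn(8, 768)       # dummy batch of feature vectors
labels = torch.randint(0, 4, (8,))   # dummy labels, e.g., 4 disease stages

# Train only the new classification head on top of the frozen encoder.
head = nn.Linear(768, 4)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

optimizer.zero_grad()
loss = criterion(head(features), labels)
loss.backward()
optimizer.step()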

Questions and issues

If you have any questions or find problems with the code, please open an issue on GitHub.

Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation 📝:

@article{morano2025mirage,
  title={{MIRAGE}: A multimodal foundation model and benchmark for comprehensive retinal {OCT} image analysis},
  author={José Morano
  and Botond Fazekas
  and Emese Sükei
  and Ronald Fecso
  and Taha Emre
  and Markus Gumpinger
  and Georg Faustmann
  and Marzieh Oghbaie
  and Ursula Schmidt-Erfurth
  and Hrvoje Bogunović},
  journal={Preprint},
  year={2025}
}

License

This project is licensed under CC-BY-NC 4.0. See LICENSE for details.

Acknowledgements

The MIRAGE code is mainly based on MultiMAE, and also builds on timm, DeiT, DINO, MoCo-v3, BEiT, MAE-priv, MAE, mmsegmentation, MONAI, and RETFound. We thank the authors for making their code available.