DINO-Foresight: Looking into the Future with DINO (NeurIPS 2025)

DINO-Foresight is a novel framework that operates in the semantic feature space of pretrained Vision Foundation Models (VFMs) to predict future dynamics. It trains a masked feature transformer in a self-supervised manner to forecast the evolution of VFM features over time, enabling various scene understanding tasks through off-the-shelf, task-specific heads.
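
To make the masked-forecasting idea above concrete, here is a minimal, dependency-free sketch. It is not the repository's implementation: the helper names (`build_forecast_mask`, `apply_mask`, `masked_l1_loss`), the all-zero mask token, and the choice of an L1 loss are assumptions made for illustration. It only shows the general pattern of hiding future-frame feature tokens and scoring predictions on the hidden positions.

```python
# Illustrative only: hypothetical helper names, not the DINO-Foresight API.

def build_forecast_mask(num_context_frames, num_future_frames, tokens_per_frame):
    """Return one boolean per token; True marks a future-frame token to predict."""
    visible = [False] * (num_context_frames * tokens_per_frame)
    hidden = [True] * (num_future_frames * tokens_per_frame)
    return visible + hidden


def apply_mask(tokens, mask, mask_token):
    """Replace every masked token with a placeholder feature vector."""
    return [list(mask_token) if m else list(t) for t, m in zip(tokens, mask)]


def masked_l1_loss(predicted, target, mask):
    """Mean absolute error over masked tokens only (an assumed loss choice)."""
    total, count = 0.0, 0
    for p, t, m in zip(predicted, target, mask):
        if m:
            total += sum(abs(a - b) for a, b in zip(p, t))
            count += len(t)
    return total / count if count else 0.0


# Toy clip: 2 context frames + 1 future frame, 2 tokens per frame, 3-dim features.
tokens = [[float(i)] * 3 for i in range(6)]
mask = build_forecast_mask(2, 1, 2)
model_input = apply_mask(tokens, mask, mask_token=[0.0, 0.0, 0.0])
```

In a real setup the token lists would be VFM feature maps and a transformer would produce the predictions; here `model_input` stands in for a trivial prediction so the loss can be computed end to end.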

Paper

DINO-Foresight: Looking into the Future with DINO (https://arxiv.org/abs/2412.11673)

Project Page

https://dino-foresight.github.io

Code

The official implementation can be found on GitHub: https://github.com/Sta8is/DINO-Foresight

Sample Usage

The model is built with PyTorch. You can set up the environment and install dependencies as follows:

conda create -n dinof python=3.11
conda activate dinof
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121   
git clone https://github.com/Sta8is/DINO-Foresight
cd DINO-Foresight
pip install -r requirements.txt
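
After installation, a quick sanity check can confirm that the pinned PyTorch packages were actually picked up by the active environment. The script below is a small sketch that uses only the Python standard library; the `check_pins` helper and the `PINNED` table are names introduced here for illustration and are not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip install command above.
PINNED = {"torch": "2.2.0", "torchvision": "0.17.0", "torchaudio": "2.2.0"}


def check_pins(pins):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # CUDA wheels may report versions such as '2.2.0+cu121'; compare the public part.
        public = found.split("+", 1)[0]
        report[name] = "ok" if public == wanted else f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_pins(PINNED).items():
        print(f"{name}: {status}")
```

Run it inside the `dinof` environment; any line that does not read `ok` suggests the wrong environment is active or the install step did not complete.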

For detailed usage examples and model training, refer to the provided demos and the GitHub repository.

Demo

We provide 2 quick demos.

Citation

If you find DINO-Foresight useful in your research, please consider starring ⭐ us on GitHub and citing 📚 our paper!

@inproceedings{karypidis2025dinoforesight,
title={{DINO}-Foresight: Looking into the Future with {DINO}},
author={Efstathios Karypidis and Ioannis Kakogeorgiou and Spyros Gidaris and Nikos Komodakis},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://arxiv.org/abs/2412.11673}
}

Acknowledgements

Our code is partially based on Maskgit-pytorch, a PyTorch implementation of MaskGIT by ValeoAI. We also thank the authors of DINOv2, DPT, DepthAnythingV2, and LOTUS for their work and open-source code.
