Visualization of RoMa estimated warps between large-baseline image pairs.

The animation is produced by linearly interpolating the warp, \(W(t) = (1-t)\,x^{\mathcal{A}} + t\,\hat{x}^{\mathcal{B}}\), where \(x^{\mathcal{A}}\) are coordinates in image \(\mathcal{A}\) and \(\hat{x}^{\mathcal{B}}\) are their estimated correspondences in image \(\mathcal{B}\).
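As a minimal sketch, the interpolation above is just a per-pixel linear blend between the source coordinates and the estimated target coordinates (the function name and toy values below are illustrative, not from the RoMa codebase):

```python
import numpy as np

def interpolate_warp(x_a, x_b_hat, t):
    """Linearly interpolate from source coordinates x_a in image A
    toward their estimated correspondences x_b_hat in image B.

    x_a, x_b_hat: (N, 2) arrays of pixel coordinates; t in [0, 1].
    Implements W(t) = (1 - t) * x_a + t * x_b_hat.
    """
    return (1.0 - t) * x_a + t * x_b_hat

# Toy example: two points, five animation frames from t=0 to t=1.
x_a = np.array([[0.0, 0.0], [10.0, 20.0]])
x_b_hat = np.array([[4.0, 8.0], [14.0, 16.0]])
frames = [interpolate_warp(x_a, x_b_hat, t) for t in np.linspace(0.0, 1.0, 5)]
```

At `t=0` the frame coincides with the source coordinates, and at `t=1` with the estimated warp, giving a smooth morph in between.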

We propose RoMa, a dense feature matcher that is significantly more robust than previous approaches.

Our approach consists of four main contributions (see figure above):

- We use a foundation model (DINOv2) instead of training from scratch, leading to more robust matches.
- We use a specialized ConvNet for fine features.
- We propose a Transformer match decoder that predicts anchor probabilities instead of coordinate regression.
- We use improved loss formulations, tailored separately to the coarse matching and fine refinement stages.
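To illustrate the third contribution, here is a hedged sketch of decoding a match coordinate from anchor probabilities rather than regressing it directly: the decoder outputs a distribution over a fixed set of anchor positions, and a coordinate is recovered as the probability-weighted anchor. The function and toy values are illustrative assumptions, not RoMa's actual implementation:

```python
import numpy as np

def decode_anchor_probs(probs, anchors):
    """Decode a coordinate from anchor probabilities.

    probs:   (K,) non-negative scores over K anchor positions.
    anchors: (K, 2) fixed anchor coordinates tiling the image.
    Returns the (2,) probability-weighted (expected) coordinate.
    """
    probs = probs / probs.sum()  # normalize to a valid distribution
    return probs @ anchors       # expectation over anchor positions

# Toy example: 4 anchors on a unit square, most mass on the (1, 1) corner.
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
probs = np.array([0.1, 0.1, 0.1, 0.7])
coord = decode_anchor_probs(probs, anchors)  # pulled toward (1, 1)
```

Predicting a distribution over anchors lets the decoder express multi-modal match hypotheses, which a single regressed coordinate cannot.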

```bibtex
@article{edstedt2023roma,
title={{RoMa: Robust Dense Feature Matching}},
author={Edstedt, Johan and Sun, Qiyu and Bökman, Georg and Wadenbäck, Mårten and Felsberg, Michael},
journal={arXiv preprint arXiv:2305.15404},
year={2023}
}
```