RoMa: Robust Dense Feature Matching

1Linköping University    2East China University of Science and Technology    3Chalmers University of Technology   
Visualization of RoMa estimated warps between large-baseline image pairs.
The animation is done by parameterizing \(W(t) = (1-t) x^{\mathcal{A}} + t \hat{x}^{\mathcal{B}}\).
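The interpolation above can be sketched in a few lines of NumPy. This is an illustrative snippet (the function name `interpolate_warp` and the example coordinates are ours, not from the paper): at \(t=0\) it returns the source coordinates \(x^{\mathcal{A}}\), and at \(t=1\) the estimated matches \(\hat{x}^{\mathcal{B}}\).

```python
import numpy as np

def interpolate_warp(x_A, x_B_hat, t):
    """Linear interpolation W(t) = (1 - t) * x^A + t * x_hat^B between
    source coordinates and their estimated matches in the other image."""
    return (1.0 - t) * x_A + t * x_B_hat

# Hypothetical example: two matched pixel coordinates, animated over 5 frames
x_A = np.array([[10.0, 20.0], [30.0, 40.0]])      # coordinates in image A
x_B_hat = np.array([[12.0, 18.0], [28.0, 44.0]])  # estimated matches in image B

frames = [interpolate_warp(x_A, x_B_hat, t) for t in np.linspace(0.0, 1.0, 5)]
```

Rendering each frame in sequence produces the morphing animation shown above.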


We propose a significantly more robust dense feature matcher than previous approaches.


Our approach consists of four main contributions (see figure above):

  1. We use a foundation model (DINOv2) instead of training from scratch, leading to more robust matches.
  2. We use a specialized ConvNet for fine features.
  3. We propose a Transformer match decoder that predicts anchor probabilities instead of coordinate regression.
  4. We propose an improved loss formulation: regression-by-classification at the coarse stage, followed by robust regression at the fine stage.
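To make contribution 3 concrete, here is a minimal sketch of decoding a match from anchor probabilities rather than direct coordinate regression. This is our own illustration, not the paper's exact decoder: the decoder outputs per-anchor logits over a fixed set of anchor coordinates, and a match coordinate is recovered as the probability-weighted mean of the anchors.

```python
import numpy as np

def decode_match(logits, anchor_coords):
    """Sketch of anchor-based match decoding: softmax the per-anchor
    logits, then take the expected coordinate under that distribution."""
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    return p @ anchor_coords  # (K,) @ (K, 2) -> (2,) expected coordinate

# Four hypothetical anchors at the corners of a unit square
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
logits = np.zeros(4)  # uniform probabilities
coord = decode_match(logits, anchors)
```

With uniform logits the decoded coordinate is the anchor centroid; peaked logits move it toward the most probable anchor. Predicting a distribution over anchors lets the model express multimodal match hypotheses that a single regressed coordinate cannot.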


    @article{edstedt2023roma,
        title={{RoMa: Robust Dense Feature Matching}},
        author={Edstedt, Johan and Sun, Qiyu and Bökman, Georg and Wadenbäck, Mårten and Felsberg, Michael},
        journal={arXiv preprint arXiv:2305.15404},
        year={2023}
    }