Intra-operative 2D-3D registration of X-ray images with pre-operatively acquired CT scans is a crucial procedure in orthopedic surgery. Anatomical landmarks pre-annotated in the CT volume can be detected in X-ray images to establish 2D-3D correspondences, which are then used for registration. However, registration often fails at certain view angles because of poor landmark visibility. We propose a novel method to address this issue by detecting arbitrary landmark points in X-ray images. Our approach represents 3D points as distinct subspaces, formed by feature vectors (referred to as ray embeddings) corresponding to intersecting rays. Establishing 2D-3D correspondences then becomes a task of finding ray embeddings that are close to a given subspace, essentially performing an intersection test. Unlike conventional methods for landmark estimation, our approach eliminates the need for manually annotating fixed landmarks. We trained our model on synthetic images generated from the CTPelvic1K CLINIC dataset, which contains 103 CT volumes, and evaluated it on the DeepFluoro dataset, which comprises real X-ray images. Experimental results demonstrate the superiority of our method over conventional methods.
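The geometric fact behind the ray-subspace idea is that the pixels where a 3D point projects in different views are exactly the pixels whose back-projected rays pass through that point. The minimal NumPy sketch below projects one landmark into four views, back-projects the hit pixels, and recovers the landmark as the least-squares intersection of those rays. It assumes a pinhole model with the X-ray source as the projection center. The intrinsics `K`, the orbiting `look_at_pose` geometry, the landmark coordinates, and all helper names are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal length and principal point in pixels).
K = np.array([[1000.0, 0.0, 128.0],
              [0.0, 1000.0, 128.0],
              [0.0, 0.0, 1.0]])

def look_at_pose(angle_deg, distance=1000.0):
    """Hypothetical C-arm-like view: the source orbits the z-axis at `distance`
    and looks at the origin. Returns (R, t) with x_cam = R @ x_world + t."""
    a = np.deg2rad(angle_deg)
    center = distance * np.array([np.cos(a), np.sin(a), 0.0])
    forward = -center / np.linalg.norm(center)        # optical axis points at the origin
    right = np.cross([0.0, 0.0, 1.0], forward)
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])              # rows: camera axes in world coordinates
    return R, -R @ center

def project(X, R, t):
    """Pixel coordinates of world point X."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def back_project(uv, R, t):
    """Ray (origin, unit direction) in world coordinates through pixel uv."""
    origin = -R.T @ t                                 # source (camera) position
    d = R.T @ np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    return origin, d / np.linalg.norm(d)

def intersect_rays(origins, dirs):
    """Least-squares point closest to all rays."""
    A, b = np.zeros((3, 3)), np.zeros(3)
    for c, d in zip(origins, dirs):
        P = np.eye(3) - np.outer(d, d)                # projector orthogonal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

# Project one hypothetical landmark into four views, back-project the hit pixels,
# and recover the landmark as the common intersection of the four rays.
X = np.array([12.0, -30.0, 45.0])
rays = []
for angle in (0, 30, 60, 90):
    R, t = look_at_pose(angle)
    rays.append(back_project(project(X, R, t), R, t))
X_hat = intersect_rays([o for o, _ in rays], [d for _, d in rays])
```

Every ray in `rays` passes through `X`, so `X_hat` matches `X` up to floating-point error. Rays from different views are what make the intersection well defined.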
Estimating correspondences for arbitrary landmark points between 2D and 3D representations is challenging because of the mismatch in spatial dimensions and the transmissive nature of X-rays, which can associate multiple 3D points with a single 2D projection. To address this, we employ a pixel-wise feature extractor and pre-render DRR (digitally reconstructed radiograph) templates to represent a 3D point as a subspace. The pixel-wise features are referred to as ray embeddings, because we associate each feature vector with its back-projected ray. Since a 3D point can be represented by a collection of rays that intersect at it, a set of ray embeddings can be associated with a 3D point if their underlying rays intersect at that point. The 2D projection of a 3D point can then be identified by evaluating how close the ray embeddings of the query image are to the subspace representing that 3D point.
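One way to make "closeness to a subspace" concrete is as follows. Stack the ray embeddings that a 3D point produces in the pre-rendered DRR templates and take an orthonormal basis of their span. Then score every pixel of the query image by the norm of its embedding's component orthogonal to that basis. The minimal NumPy sketch below uses random vectors in place of learned features. The SVD basis, the fixed `rank`, the residual-norm score, and the function names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def point_subspace(template_embeddings, rank):
    """Orthonormal basis (D x rank) for the span of one 3D point's ray embeddings.

    template_embeddings: (N, D), one row per DRR template in which the point
    is visible, taken at the pixel where the point projects."""
    # The leading right-singular vectors span the dominant directions of the rows.
    _, _, vt = np.linalg.svd(template_embeddings, full_matrices=False)
    return vt[:rank].T

def subspace_distance(query_embeddings, basis):
    """Distance of each unit-normalized query embedding (M, D) from the subspace."""
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    residual = q - (q @ basis) @ basis.T              # component orthogonal to the subspace
    return np.linalg.norm(residual, axis=1)

def locate(query_map, basis):
    """Pixel (row, col) of an (H, W, D) embedding map closest to the subspace."""
    H, W, D = query_map.shape
    dist = subspace_distance(query_map.reshape(-1, D), basis).reshape(H, W)
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return (int(row), int(col)), dist

# Toy example: random embeddings, with one query pixel lying in the point's subspace.
rng = np.random.default_rng(0)
D = 16
true_basis, _ = np.linalg.qr(rng.normal(size=(D, 3)))
templates = rng.normal(size=(10, 3)) @ true_basis.T   # 10 template rays, all in the span
basis = point_subspace(templates, rank=3)
query = rng.normal(size=(8, 8, D))
query[5, 2] = true_basis @ rng.normal(size=3)
pixel, dist = locate(query, basis)
```

Here `pixel` is `(5, 2)`, the only query embedding planted inside the subspace. A pixel whose embedding lies near the subspace plays the role of a ray that passes through the 3D point, which is the intersection test described above.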
Once the correspondences are established, a perspective-n-point (PnP) solver [1] with marginalizing sample consensus (MAGSAC) [2] is used to obtain an initial pose estimate. This estimate serves as the initialization for DiffDRR [3], a gradient-based optimization module that refines the pose. The core idea is to provide the pose estimator with a large number of correspondences between 3D landmarks and their 2D projections.
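The robust initial pose step can be illustrated with a self-contained NumPy sketch. It uses two simpler techniques in place of the ones named above: a Direct Linear Transform (DLT) PnP solver and classic RANSAC with a fixed pixel threshold. MAGSAC instead marginalizes over the noise scale rather than committing to a single threshold, and the DiffDRR refinement is not shown. The function names, the 6-point minimal sample, and the 2-pixel threshold are hypothetical choices.

```python
import numpy as np

def reproject(X, R, t, K):
    """Project (N, 3) world points to (N, 2) pixels."""
    x = (K @ (R @ X.T + t[:, None])).T
    return x[:, :2] / x[:, 2:3]

def dlt_pnp(X, x, K):
    """Estimate (R, t) from >= 6 non-coplanar 3D-2D pairs via the DLT."""
    xn = np.linalg.solve(K, np.c_[x, np.ones(len(x))].T).T   # normalized image coordinates
    A = []
    for Xh, (u, v, _) in zip(np.c_[X, np.ones(len(X))], xn):
        A.append(np.r_[Xh, np.zeros(4), -u * Xh])             # m1.Xh - u * m3.Xh = 0
        A.append(np.r_[np.zeros(4), Xh, -v * Xh])             # m2.Xh - v * m3.Xh = 0
    M = np.linalg.svd(np.asarray(A))[2][-1].reshape(3, 4)     # null vector = [R|t] up to scale
    U, S, Vt = np.linalg.svd(M[:, :3])
    R, lam = U @ Vt, S.mean()                         # nearest rotation and the scale
    if np.linalg.det(R) < 0:                          # the null vector had a negative sign
        R, lam = -R, -lam
    return R, M[:, 3] / lam

def ransac_pnp(X, x, K, iters=500, thresh=2.0, seed=0):
    """Classic RANSAC around dlt_pnp; returns (R, t, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(X), size=6, replace=False)       # minimal sample for the DLT
        R, t = dlt_pnp(X[idx], x[idx], K)
        err = np.linalg.norm(reproject(X, R, t, K) - x, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    R, t = dlt_pnp(X[best], x[best], K)               # refit on the consensus set
    return R, t, best
```

A large number of correspondences helps here because the consensus set, not any single match, determines the refit. A handful of wrong matches is simply left out of `best`.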
[1] Lu, X.X.: A review of solutions for perspective-n-point problem in camera pose estimation. J. Phys. Conf. Ser. 1087(5), 052009 (2018).
[2] Barath, D., Matas, J., Noskova, J.: MAGSAC: Marginalizing sample consensus. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10197-10205 (2019).
[3] Gopalakrishnan, V., Golland, P.: Fast auto-differentiable digitally reconstructed radiographs for solving inverse problems in intraoperative imaging. In: Clinical Image-Based Procedures, pp. 1-11. Lecture Notes in Computer Science, Springer Nature Switzerland, Cham (2023).
@InProceedings{Shrestha_2024_ACCV,
author = {Shrestha, Pragyan and Xie, Chun and Yoshii, Yuichi and Kitahara, Itaru},
title = {RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace},
booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
month = {December},
year = {2024},
pages = {665-681}
}