Take a shot! Natural language control of intelligent robotic X-ray systems in surgery

Benjamin D Killeen; Shreayan Chaudhary; Greg Osgood; Mathias Unberath

doi:10.1007/s11548-024-03120-3

Int J Comput Assist Radiol Surg. 2024 Apr 15. doi: 10.1007/s11548-024-03120-3. Online ahead of print.

Authors

Benjamin D Killeen¹, Shreayan Chaudhary², Greg Osgood³, Mathias Unberath²

Affiliations

¹ Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD, 21218, USA. killeen@jhu.edu.
² Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD, 21218, USA.
³ Department of Orthopaedic Surgery, Johns Hopkins University, Baltimore, MD, 21287, USA.

PMID: 38619790
DOI: 10.1007/s11548-024-03120-3

Abstract

Purpose: The expanding capabilities of surgical systems bring with them increasing complexity in the interfaces that humans use to control them. Robotic C-arm X-ray imaging systems, for instance, often require manipulation of independent axes via joysticks, while higher-level control options hide inside device-specific menus. The complexity of these interfaces hinders "ready-to-hand" use of high-level functions. Natural language offers a flexible, familiar interface through which surgeons can express their desired outcome rather than remember the steps necessary to achieve it, enabling direct access to task-aware, patient-specific C-arm functionality.

Methods: We present an English-language voice interface for controlling a robotic X-ray imaging system with task-aware functions for pelvic trauma surgery. Our fully integrated system uses a large language model (LLM) to convert natural spoken commands into machine-readable instructions. This enables low-level commands, such as "Tilt back a bit" to increase the angular tilt, as well as patient-specific directions, such as "Go to the obturator oblique view of the right ramus," which rely on automated image analysis.
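As a rough illustration of the "machine-readable instructions" step, the sketch below applies a JSON instruction to a simplified C-arm pose. Everything in it is an assumption made for illustration: the JSON schema, the `CArmPose` fields, the step sizes in `STEP_DEG`, the `apply_instruction` helper, and the placeholder pose stored for the named view. The abstract does not specify the actual instruction format, and no LLM is called here; `llm_output` stands in for what a model might emit.

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CArmPose:
    """Simplified C-arm pose; both field names are illustrative assumptions."""
    tilt_deg: float = 0.0
    rotation_deg: float = 0.0


# Assumed step sizes for qualitative amounts; not taken from the paper.
STEP_DEG = {"a bit": 5.0, "a lot": 15.0}


def apply_instruction(pose, instruction_json, named_views):
    """Apply one machine-readable instruction and return the new pose."""
    msg = json.loads(instruction_json)
    if msg["action"] == "adjust":
        # Low-level command: nudge a single axis by a signed step.
        field = msg["axis"] + "_deg"
        delta = msg["sign"] * STEP_DEG[msg["amount"]]
        return replace(pose, **{field: getattr(pose, field) + delta})
    if msg["action"] == "goto_view":
        # Patient-specific command: look up a pose that image analysis
        # is assumed to have computed beforehand.
        return named_views[msg["view"]]
    raise ValueError(f"unknown action: {msg['action']!r}")


# Hypothetical model output for the spoken command "Tilt back a bit."
llm_output = '{"action": "adjust", "axis": "tilt", "sign": 1, "amount": "a bit"}'

# Placeholder angles; a real system would derive these from patient images.
views = {"obturator oblique, right ramus": CArmPose(tilt_deg=-20.0, rotation_deg=35.0)}

new_pose = apply_instruction(CArmPose(), llm_output, views)
```

Constraining the model to a small, validated schema like this one means that an unrecognized action fails loudly with an error instead of moving the device.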

Results: We evaluated our system with 212 prompts provided by an attending physician, for which the system performed satisfactory actions 97% of the time. To test the fully integrated system, we conducted a real-time study in which an attending physician placed orthopedic hardware along desired trajectories through an anthropomorphic phantom, interacting with the X-ray system solely via voice.

Conclusion: Voice interfaces offer a convenient, flexible way for surgeons to manipulate C-arms based on desired outcomes rather than device-specific processes. As LLMs grow increasingly capable, so too will their applications in supporting higher-level interactions with surgical assistance systems.

Keywords: Autonomous imaging; Image-guided surgery; Large language models; Machine learning; Speech-to-text.
