Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre. Previous works rely on conventional aggregation modules (e.g., dilated convolution, convolutional LSTM), which make use of only the local context. In this paper, we propose a novel framework, STswinCL, that explores the complementary intra- and inter-video relations to boost segmentation performance by progressively capturing the global context. We first develop a hierarchical Transformer to capture intra-video relations, including richer spatial and temporal cues from neighboring pixels and previous frames. A joint space-time window shift scheme is proposed to efficiently aggregate these two cues into each pixel embedding...
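The joint space-time window shift described above can be illustrated with a minimal sketch: cyclically shift a video feature map along the time, height, and width axes, then partition it into non-overlapping 3D windows so that attention within each window mixes spatial and temporal neighbors. This is an assumption-laden illustration in NumPy (the function name `shift_window_partition` and the window/shift sizes are hypothetical), not the authors' implementation:

```python
import numpy as np

def shift_window_partition(x, window=(2, 4, 4), shift=(1, 2, 2)):
    """Sketch of a joint space-time shifted-window partition.

    x: video features of shape (T, H, W, C); T, H, W must be
    divisible by the corresponding window sizes.
    Returns an array of shape (num_windows, t*h*w, C), where each
    row groups the tokens that would attend to each other.
    """
    t, h, w = window
    # Cyclic shift along time/height/width (analogous to torch.roll
    # in shifted-window attention) so window borders move each layer.
    x = np.roll(x, shift=[-s for s in shift], axis=(0, 1, 2))
    T, H, W, C = x.shape
    # Partition into non-overlapping (t, h, w) space-time windows.
    x = x.reshape(T // t, t, H // h, h, W // w, w, C)
    windows = x.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, t * h * w, C)
    return windows

feat = np.random.rand(4, 8, 8, 16)      # (T, H, W, C) toy video features
wins = shift_window_partition(feat)
print(wins.shape)                        # (8, 32, 16)
```

Each of the 8 windows holds 2x4x4 = 32 tokens spanning both time and space; applying self-attention per window, and alternating shifted with unshifted partitions across layers, lets information propagate globally at sub-quadratic cost.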
Several approaches have been introduced to understand surgical scenes through downstream tasks like ...
Purpose: Surgery scene understanding with tool-tissue interaction recognition and automatic report g...
Surgical videos captured from microscopic or endoscopic imaging devices are rich but complex sources...
Surgical context inference has recently garnered significant attention in robot-assisted surgery as ...
In the medical field, due to their economic and clinical benefits, there is a growing interest in mi...
Recent advancements in surgical computer vision applications have been driven by fully-supervised me...
Producing manual, pixel-accurate, image segmentation labels is tedious and time-consuming. This is o...
Automated video-based assessment of surgical skills is a promising task in assisting young surgical ...
Current semantic segmentation methods focus only on mining “local” context, i.e., dependencies betwe...
A major obstacle to building models for effective semantic segmentation, and particularly video sema...
PURPOSE: Surgical workflow estimation techniques aim to divide a surgical video into temporal segmen...
Recent advancements in surgical computer vision applications have been driven by fully-supervised me...
Self-supervised learning has witnessed great progress in vision and NLP; recently, it also attracted...
This paper presents a deep learning framework for medical video segmentation. Convolution neural net...
Automatic surgical phase recognition plays a vital role in robot-assisted surgeries. Existing method...