Purpose: Real-time surgical tool tracking is a core component of the future intelligent operating room (OR), because it is instrumental in analyzing and understanding surgical activities. Current methods for surgical tool tracking in videos must be trained on data in which the spatial positions of the tools are manually annotated. Generating such training data is difficult and time-consuming. Instead, we propose to use only binary presence annotations to train a tool tracker for laparoscopic videos.

Methods: The proposed approach consists of a CNN + Convolutional LSTM (ConvLSTM) neural network trained end to end, but weakly supervised on binary tool presence labels only. We use the ConvLSTM to model the temporal dependencies in...
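
The idea of training on presence labels alone can be illustrated with a minimal sketch. It assumes, hypothetically, that the network emits one spatial activation map per tool class and that a presence score is obtained by spatial max pooling; the text above does not specify this mechanism, and the names `spatial_max` and `presence_loss` are illustrative only. The CNN and ConvLSTM that would produce the maps are omitted.

```python
import math

def spatial_max(heatmap):
    """Return (peak_value, (row, col)) of a 2-D map given as a list of lists."""
    best, pos = -math.inf, (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best:
                best, pos = value, (r, c)
    return best, pos

def bce_with_logit(logit, label):
    """Numerically stable binary cross-entropy for one logit and a 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def presence_loss(heatmaps, labels):
    """Mean BCE between max-pooled map logits and binary presence labels.

    Assumption (not from the source): one map per tool class, pooled by max.
    """
    losses = []
    for heatmap, label in zip(heatmaps, labels):
        logit, _ = spatial_max(heatmap)
        losses.append(bce_with_logit(logit, label))
    return sum(losses) / len(losses)

# Hypothetical 2x2 activation maps for two tool classes in one frame.
maps = [
    [[-2.0, -1.0], [3.0, -2.5]],   # class 0: strong response at (1, 0)
    [[-4.0, -3.0], [-3.5, -5.0]],  # class 1: no strong response
]
labels = [1, 0]  # class 0 present, class 1 absent

loss = presence_loss(maps, labels)
peak, position = spatial_max(maps[0])
```

Under these assumptions, only the 0/1 labels enter the loss, while the location of each map's peak (`position` here) could serve as a position estimate for a tool that is predicted present.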