Multiple-object tracking is a task within the field of computer vision. As the name stated, the task consists of tracking multiple objects in the video, an algorithm that completes such task are called trackers. Many of the existing trackers require supervision, meaning that the location and identity of each object which appears in the training data must be labeled. The procedure of generating these labels, usually through manual annotation of video material, is highly resource-consuming. On the other hand, different from well-known labeled Multiple-object tracking datasets, there exist a massive amount of unlabeled video with different objects, environments, and video specifications. Using such unlabeled video can therefore contribute to c...