This repository is the official implementation of SiamABC, a single-object tracker designed for efficient tracking under adverse visibility conditions.
The Feature Extraction Block uses a readily available backbone to process the frames. The RelationAware Block exploits representational relations between the dual-template and the dual-search-region through our losses, where the dual-template and dual-search-region representations are obtained via our learnable FMF layer. The Heads Block learns lightweight convolution layers to infer the bounding box and the classification score through standard tracking losses. During inference, the tracker adapts to every instance through our Dynamic Test-Time Adaptation framework.
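To make the data flow above easier to follow, here is a minimal, purely illustrative PyTorch skeleton. Every module name, shape, and operation below (including the simple stand-in used for the FMF fusion and the template/search interaction) is a hypothetical placeholder, not the actual SiamABC implementation; the real architecture, losses, and Dynamic Test-Time Adaptation live in the code of this repository.

```python
# Illustrative sketch only: mirrors the block structure described above with
# placeholder modules. It is NOT the SiamABC implementation.
import torch
import torch.nn as nn


class FMFPlaceholder(nn.Module):
    """Stand-in for the learnable FMF layer: fuses two feature maps into one."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        return self.fuse(torch.cat([feat_a, feat_b], dim=1))


class TrackerSketch(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Feature Extraction Block: any readily available backbone (dummy conv here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU()
        )
        # RelationAware Block: dual-template / dual-search-region fusion via the FMF stand-in.
        self.fmf_template = FMFPlaceholder(channels)
        self.fmf_search = FMFPlaceholder(channels)
        # Heads Block: lightweight convolutions for the classification score and the box.
        self.cls_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.box_head = nn.Conv2d(channels, 4, kernel_size=3, padding=1)

    def forward(self, templates, searches):
        z = self.fmf_template(self.backbone(templates[0]), self.backbone(templates[1]))
        x = self.fmf_search(self.backbone(searches[0]), self.backbone(searches[1]))
        corr = z.mean(dim=(2, 3), keepdim=True) * x  # placeholder template/search interaction
        return self.cls_head(corr), self.box_head(corr)


if __name__ == "__main__":
    net = TrackerSketch()
    templates = [torch.randn(1, 3, 128, 128) for _ in range(2)]  # dual-template crops
    searches = [torch.randn(1, 3, 256, 256) for _ in range(2)]   # dual-search-region crops
    cls_score, bbox = net(templates, searches)
    print(cls_score.shape, bbox.shape)
```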
The training code is tested on Linux systems.
conda create -n SiamABC python=3.7
conda activate SiamABC
pip install -r requirements.txt
The SiamABC models are available in the `assets` folder. Run the following command:
python realtime_test.py --initial_bbox=[416, 414, 61, 97] --video_path=assets/penguin_in_fog.mp4 --output_path=outputs/penguin_in_fog.mp4
If you wish to try various models, please refer to the `assets/S_Tiny` folder for S-Tiny models and the `assets/S_Small` folder for S-Small models. We provide several models in each of them so that you may choose the best one for your sequence. Additionally, if you wish to tune specific hyperparameters for a given dataset, please refer to the code released by Ocean.
Training is done similarly to the FEAR framework. We use the GOT-10k, LaSOT, COCO2017, and TrackingNet train sets for training. As explained in the FEAR framework, you can create a CSV annotation file for each of the training datasets.
The annotation file for each dataset should have the following format:
- `sequence_id: str` - unique identifier of the video file
- `track_id: str` - unique identifier of the scene inside the video file
- `frame_index: int` - index of the frame inside the video
- `img_path: str` - location of the frame image, relative to the root folder containing all datasets
- `bbox: Tuple[int, int, int, int]` - bounding box of the object in the format `x, y, w, h`
- `frame_shape: Tuple[int, int]` - width and height of the image
- `dataset: str` - label identifying the dataset (example: `got10k`)
- `presence: int` - presence of the object (example: `0/1`)
- `near_corner: int` - whether the bounding box touches the borders of the image (example: `0/1`)
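As a concrete illustration, the sketch below writes a single annotation row with the columns listed above using pandas. The values, the file name `got10k.csv`, and the use of pandas are placeholders for illustration; they are not necessarily the exact format expected by the training code, so please follow the FEAR conventions when building the real files.

```python
# Illustrative only: build one annotation row with the columns listed above.
# All values are made-up placeholders.
from pathlib import Path

import pandas as pd

row = {
    "sequence_id": "GOT-10k_Train_000001",
    "track_id": "GOT-10k_Train_000001_obj0",
    "frame_index": 0,
    "img_path": "got10k/train/GOT-10k_Train_000001/00000001.jpg",
    "bbox": (347, 443, 429, 272),   # x, y, w, h
    "frame_shape": (1920, 1080),    # width, height
    "dataset": "got10k",
    "presence": 1,
    "near_corner": 0,
}

Path("train_csv").mkdir(exist_ok=True)
pd.DataFrame([row]).to_csv("train_csv/got10k.csv", index=False)
```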
Place all of your CSV files in a folder named `train_csv`.
We could not provide the CSV annotations because some datasets have license restrictions; however, please email the corresponding author and we will gladly provide those files. Alternatively, you can use the code provided in `core/dataset_utils` to create those CSV files for each of the datasets before starting the training.
Please modify the config file `core/config/dataset/full_train.yaml` and set `visual_object_tracking_datasets` to the path where the `train_csv` directory is stored.
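For illustration, the relevant entry might look like the sketch below; only the key name comes from this README, while the example path and the surrounding structure of `full_train.yaml` are placeholders to adapt to your setup.

```yaml
# Illustrative excerpt of core/config/dataset/full_train.yaml; the actual layout may differ.
visual_object_tracking_datasets: /data/tracking   # directory that contains train_csv/
```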
Now you are one step away from training the network:
CUDA_VISIBLE_DEVICES=<> python3 train.py
@inproceedings{zaveri2025siamabc,
title={Improving Accuracy and Generalization for Efficient Visual Tracking},
author={Zaveri, Ram and Patel, Shivang and Gu, Yu and Doretto, Gianfranco},
booktitle={Winter Conference on Applications of Computer Vision},
year={2025},
organization={IEEE/CVF}
}
- We thank the authors of FEAR for the base code and pysot-toolkit for providing the evaluation kit for object trackers.