International Journal on Smart Sensing and Intelligent Systems, ISSN 1178-5608, Exeley Inc. DOI: 10.21307/ijssis-2020-001

Robust single target tracking using determinantal point process observations

Hernández, S.* and Sallis, P.

Laboratorio de Procesamiento de Información Geoespacial, Universidad Católica del Maule, Av. San Miguel, 3605, Talca, Chile
Auckland University of Technology, Auckland, New Zealand
*E-mail: shernandez@ucm.cl
The efficiency and robustness of modern visual tracking systems are largely dependent on the object detection system at hand. Bernoulli and multi-Bernoulli filters have been proposed for visual tracking without explicit detections (image observations). However, these previous approaches do not fully exploit discriminative features for tracking. In this paper, we propose a novel Bernoulli filter with determinantal point process observations. The proposed observation model can select groups of detections with high detection scores and low correlation among the observed features, thus achieving a robust filter.
Keywords: visual tracking, Bernoulli filter.
<p>Visual tracking is a challenging computer vision task with applications in human-computer interaction, video surveillance and crowd monitoring, among others. Modern visual tracking systems may use complex object detection schemes for estimating the current state of a target in any particular video frame. However, this approach does not fully exploit the temporal structure of the estimation problem. Visual tracking can also be thought of as a dynamic model with observed features and latent states representing the position/velocity of an object (<xref ref-type="bibr" rid="ref016">Maggio and Cavallaro, 2011</xref>). In this context, the generative model for visual tracking requires not only the correct specification of the model and its parameters but also the ability to capture the variations of the system (<xref ref-type="bibr" rid="ref026">Wang et al., 2015</xref>).</p>
<p>The Bernoulli filter is a powerful algorithm that allows objects to appear and disappear, using features extracted from the image as observations (<xref ref-type="bibr" rid="ref025">Vo et al., 2010</xref>). Similar approaches for visual tracking, also known in the literature as track-before-detect, have been proposed. Nevertheless, these methods rely on unreliable background subtraction operations or require the likelihood function to be in a <italic>separable</italic> form (<xref ref-type="bibr" rid="ref007 ref006">Hoseinnezhad et al., 2012, 2013</xref>).</p>
<p>Current state-of-the-art trackers are based on either correlation filters (<xref ref-type="bibr" rid="ref001">Bolme et al., 2010</xref>), deformable parts models (<xref ref-type="bibr" rid="ref004">Hare et al., 2016</xref>) or convolutional neural networks (<xref ref-type="bibr" rid="ref015">Li et al., 2018</xref>). These trackers learn a discriminative model from a single frame and then update the model using new frames. Furthermore, tracking performance can be increased by using more discriminative features such as HOG (<xref ref-type="bibr" rid="ref005">Henriques et al., 2015</xref>; <xref ref-type="bibr" rid="ref024">Solis Montero et al., 2015</xref>; <xref ref-type="bibr" rid="ref027">Xu et al., 2019</xref>). On the other hand, even though Bernoulli filters have proven useful in complex tracking scenarios, it is still hard to exploit such features to increase their performance.</p>
<sec id="sec1-1">
<title>Related works</title>
The Bernoulli filter is a specialized version of the PHD filter (Mahler, 2003), focused on single target tracking. While the original PHD filter is based on a Poisson point process, several extensions have been proposed to cope with non-Poisson distributions. In particular, the Cardinalized PHD filter allows estimating the number of targets using arbitrary distributions and provides improved estimates (Mahler, 2007). The multi-Bernoulli and Poisson multi-Bernoulli mixture filters also allow approximating the cardinality distribution and are especially well suited when the mean of the multi-target posterior is higher than the variance (García-Fernández et al., 2018). All of these methods rely on first-order or second-order moments but assume that targets behave independently of each other. Therefore, the authors in Privault and Teoh (2019) propose a second-order filter that accounts for interaction between the targets. The method is based on determinantal point processes (DPP) that take into consideration the correlation among the targets through a kernel function. In Jorquera et al. (2017), the authors propose a determinantal point process for pruning the components of the Gaussian mixture PHD filter. More recently, the authors in Jorquera et al. (2019) compared the PHD filter using determinantal point process observations with other methods for visual multi-target tracking.
The contributions of this paper are twofold. First, the third section provides introductory notions of the Bernoulli filter and then we derive a novel Bernoulli filter using determinantal point process observations (B-DPP filter) for single target tracking in the fourth section. Second, in the fifth section we derive a Sequential Monte Carlo implementation of the B-DPP filter using a truncated likelihood, which can outperform other discriminative trackers in several scenarios.
Point processes for visual object tracking
A point process is a random pattern of points in a possibly multi-dimensional space (Kingman, 1993). A simple point process can be defined in one dimension, usually time, and can be used to describe the random times at which events occur, with no coincident points.
Bernoulli point process
The problem of performing joint detection and estimation of multiple objects has a natural interpretation as a dynamic point process, where the stochastic intensity of the model is a space-time function $\lambda(x)$, with $x \in \mathbb{R}^d$ denoting the state space of the target. If we let $B = B_1 \cup B_2 \cup \cdots \cup B_k$ represent the union of disjoint video frames $B_i$, the corresponding number of objects on each image can be written as $N(B_1), N(B_2), \ldots, N(B_k)$. The Bernoulli point process for a single object that can randomly appear or disappear takes the form:

$$p(N(B_1)=n_1,\ldots,N(B_k)=n_k) = \frac{n!}{n_1!\cdots n_k!}\prod_i^k \left(\frac{\lambda(x_i)}{\Lambda(B)}\right)^{n_i} = \frac{n!}{n_1!\cdots n_k!}\prod_i^k p(x_i)^{n_i},$$

where $n_i$ can take the value 1 or 0, $n = \sum_i n_i$ and $\Lambda(B) = \int_B \lambda(x)\,dx$. Every subset $B_i$ can contain at most one target $x$ with probability $q$, therefore we can characterize the distribution of the point process $X = \{x\}$ using the following relationship:

$$p(X) = \begin{cases} 1-q & \text{if } X=\emptyset \\ q\,p(x) & \text{if } X=\{x\}. \end{cases} \tag{1}$$
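The set density in Equation (1) is simple enough to check numerically. The following is a minimal sketch, assuming a hypothetical standard-normal spatial density $p(x)$ (not a model used in the paper):

```python
import numpy as np

def bernoulli_set_density(X, q, spatial_pdf=None):
    """Bernoulli set density of Equation (1).

    X is the realization of the point process: an empty list (no target)
    or a one-element list [x]. q is the probability of existence and
    spatial_pdf(x) evaluates the spatial density p(x).
    """
    if len(X) == 0:
        return 1.0 - q
    if len(X) == 1:
        return q * spatial_pdf(X[0])
    return 0.0  # a Bernoulli process never holds more than one point

# Hypothetical standard-normal spatial density:
pdf = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
p_empty = bernoulli_set_density([], 0.3)           # 1 - q = 0.7
p_single = bernoulli_set_density([0.0], 0.3, pdf)  # q * p(0)
```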
Determinantal point process
In recent years, deep learning approaches have demonstrated outstanding performance in several visual tracking benchmarks (Kristan et al., 2019). These trackers are mostly based on features extracted from a convolutional neural network and an objective loss that minimizes a localization error (Li et al., 2018). However, the detection process is not perfect, and false positives and false negatives are still encountered after ranking the top proposals from the convolutional features.
In order to develop a stochastic approach for the single-object observation model, a discrete DPP can be used to capture probabilistic relationships using a kernel matrix $K : \mathcal{Z} \times \mathcal{Z} \mapsto \mathbb{R}$ that measures the similarity among different detections (Lee et al., 2016). Therefore, instead of considering independent detections in a particular frame, the DPP likelihood specifies the joint probability over all $2^n$ subsets of $\mathcal{Z}$ with distribution:

$$p(Z \subseteq \mathcal{Z}) = \det(K_Z), \quad \forall Z \subseteq \mathcal{Z}, \tag{2}$$

where $Z$ is a random subset of $\mathcal{Z}$ and $K_Z \equiv [K_{i,j}]$ for all $i, j \in Z$. Furthermore, the product density can also be written in terms of a positive definite matrix $L = K(I - K)^{-1}$, such that the probability mass function of $Z$ can be written as:

$$p(Z) = \frac{\det(L_Z)}{\det(I+L)}, \tag{3}$$

where $I$ is the identity matrix and $L_Z$ is a sub-matrix of $L$ indexed by the elements of $Z$.
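Equations (2)-(3) can be sanity-checked with a small kernel: the $2^n$ subset probabilities $\det(L_Z)/\det(I+L)$ must sum to one. A NumPy sketch with an arbitrary positive definite $L$ (the values are illustrative only):

```python
import itertools
import numpy as np

def dpp_pmf(L, Z):
    """Probability mass of a subset Z under the L-ensemble, Equation (3)."""
    idx = list(Z)
    det_LZ = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
    return det_LZ / np.linalg.det(np.eye(len(L)) + L)

# Arbitrary positive definite kernel over two detections:
L = np.array([[1.0, 0.4],
              [0.4, 2.0]])

# Probabilities over all 2^2 subsets sum to one:
total = sum(dpp_pmf(L, Z)
            for r in range(3)
            for Z in itertools.combinations(range(2), r))
print(round(total, 10))  # 1.0
```

The off-diagonal entry $L_{01}$ penalizes selecting both detections together, which is exactly the repulsion the observation model exploits.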
Bernoulli filter
A model for joint detection and estimation of multiple objects can be achieved through the conditional expectation of the posterior point process (random finite set) under transformations (Ristic et al., 2013).
Let $X_k = \{x\}$ be a Bernoulli point process and $Z_k = \{z_1, z_2, \ldots, z_m\}$ a DPP observed at frame $k$. The result of superposition, translation and thinning transformations is also a Bernoulli point process $X_k \sim p(X_k|X_{k-1})$ (Kingman, 1993). The predicted point process can be written as the linear superposition of a $\pi_s$-thinned point process with Markov translation $f(x|x')$ and a $\pi_b$ Bernoulli birth process. The predicted expected number of targets $N_{k|k-1}$ for a single target with probability of survival $\pi_s(x)$ and spontaneous birth can be written as:

$$N_{k|k-1} = N^s_{k|k-1} + N^b_{k|k-1}, \tag{4}$$

where:

$$N^s_{k|k-1} = \pi_s \int f(x|x')\,p_{k-1|k-1}(\{x'\})\,dx, \qquad N^b_{k|k-1} = \pi_b \int p_b(x|\emptyset)\,p_{k-1|k-1}(\emptyset)\,dx.$$
The filtering density of a Bernoulli point process is completely specified by the pair $(p_{k|k-1}, q_{k|k-1})$, which is obtained by:

$$p_{k|k-1}(X') = \begin{cases} 1-q_{k-1|k-1} & \text{if } X'=\emptyset \\ q_{k-1|k-1} & \text{if } |X'|=1. \end{cases} \tag{5}$$
Using Equation (5), the probability of existence $q_{k|k-1}$ can be written as:

$$q_{k|k-1} = \pi_b(1-q_{k-1|k-1}) + \pi_s\,q_{k-1|k-1}. \tag{6}$$
And the density of the predicted Bernoulli point process:

$$q_{k|k-1}\,p_{k|k-1}(\{x\}) = \pi_b(1-q_{k-1|k-1})\,p_b(x) + \pi_s\,q_{k-1|k-1}\int f(x|x')\,p_{k-1|k-1}(\{x'\})\,dx'. \tag{7}$$
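For the probability of existence, the prediction step of Equation (6) reduces to a scalar recursion. A minimal sketch (the previous existence probability is hypothetical; $\pi_b$ and $\pi_s$ follow the settings later reported in Table 1):

```python
def predict_existence(q_prev, pi_b, pi_s):
    """Predicted probability of existence, Equation (6)."""
    return pi_b * (1.0 - q_prev) + pi_s * q_prev

# Hypothetical previous existence probability of 0.5, with the
# birth/survival parameters used in the experiments:
q_pred = predict_existence(0.5, pi_b=0.1, pi_s=0.99)
print(round(q_pred, 3))  # 0.545
```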
If we let $Z_k$ be the observations that contain both false detections and target-originated measurements, the update equation considers the probability of observing the target with probability of detection $\pi_d$ under clutter (e.g. false positives). From (Mahler, 2003, 2007), the multi-target likelihood function for the standard measurement model (Poisson distributed clutter with density $\kappa_p(Z_k) = e^{-\lambda}\prod_i \lambda f_c(z_i)$ and Bernoulli probability of detection $\pi_d$) can be written as:

$$p(Z_k|X_k) = \kappa_p(Z_k)\,(1-\pi_d)^{|X_k|}\sum_\sigma \prod_i \frac{\pi_d\,p(z_{\sigma_i}|x_i)}{(1-\pi_d)\,\lambda f_c(z_{\sigma_i})}. \tag{8}$$
The likelihood term in Equation (8) considers all possible locations and location-to-track associations $\sigma$. For a single target most of the terms cancel, and the likelihood becomes:

$$p(Z_k|\{x\}) = \kappa_p(Z_k)\left[(1-\pi_d) + \pi_d\sum_{z\in Z_k}\frac{p(z|x)}{\lambda f_c(z)}\right]. \tag{9}$$
The Bayes update equation takes the form:

$$p(X_k|Z_k) = \frac{p(Z_k|X_k)\,p(X_k|Z_{1:k-1})}{p(Z_k|Z_{1:k-1})}. \tag{10}$$
The denominator of Equation (10) can be written as:

$$p(Z_k|Z_{1:k-1}) = \kappa_p(Z_k)\left\{1-q_{k|k-1} + q_{k|k-1}\left[(1-\pi_d)^{M_k} + \sum_{Z\subseteq Z_k}\psi_k\,\frac{\int\prod_i p(z_i|x)\,p_{k|k-1}(x)\,dx}{\prod_j \lambda f_c(z_j)}\right]\right\}, \tag{11}$$

where $M_k = |Z_k|$ is the number of measurements in frame $k$ and:

$$\psi_k = \frac{M_k!}{(M_k-|Z|)!}\,\pi_d^{|Z|}(1-\pi_d)^{M_k-|Z|}.$$
The updated Bernoulli point process can be derived as follows:

$$q_{k|k} = \frac{1-\Delta_k}{1-q_{k|k-1}\Delta_k}\,q_{k|k-1}, \tag{12}$$

where:

$$\Delta_k = 1-(1-\pi_d)^{M_k} - \sum_{Z\subseteq Z_k}\psi_k\,\frac{\int\prod_i p(z_i|x)\,p_{k|k-1}(x)\,dx}{\prod_j \lambda f_c(z_j)}, \tag{13}$$

and:

$$p_{k|k}(x) = \frac{\left[(1-\pi_d)^{M_k} + \sum_{Z\subseteq Z_k}\psi_k\,\frac{\prod_i p(z_i|x)}{\prod_j \lambda f_c(z_j)}\right]p_{k|k-1}(x)}{1-\Delta_k}. \tag{14}$$
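Equation (12) is likewise a scalar update that can be checked directly. A minimal sketch ($\Delta_k$ values are hypothetical here; computing $\Delta_k$ itself requires the integrals of Equation (13)):

```python
def update_existence(q_pred, delta_k):
    """Updated probability of existence, Equation (12)."""
    return (1.0 - delta_k) / (1.0 - q_pred * delta_k) * q_pred

# With delta_k = 0 the measurement carries no evidence and the
# existence probability is unchanged; larger delta_k shrinks it:
print(update_existence(0.5, 0.0))        # 0.5
print(update_existence(0.5, 0.9) < 0.5)  # True
```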
Determinantal filter
Let $X_k = \{x\}$ be a Bernoulli point process and $Z_k = \{z_1, z_2, \ldots, z_m\}$ a DPP observed at frame $k$. The result of superposition, translation and thinning transformations is also a Bernoulli point process $X_k \sim p(X_k|X_{k-1})$ (Ristic et al., 2013). The predicted point process can be written as the linear superposition of a $\pi_s$-thinned point process with Markov translation $f(x|x')$ and a $\pi_b$ Bernoulli birth process. In order to measure the quality of the observations, we introduce a random variable $L$ such that $p(L|Z) \propto \det(L(Z))$, where $L(Z)$ is a positive definite kernel matrix that depends on the observed features $Z$. The $L(Z)$ kernel can be written as a Gram matrix:

$$L_{ij}(Z) = g_x(z_i)\,\phi(z_i)^T\phi(z_j)\,g_x(z_j) \tag{15}$$
$$= g_x(z_i)\,S_{ij}(Z)\,g_x(z_j). \tag{16}$$
The function $g_x(z_i) = \sum_c p(z_i|c)\,p(c|x)$ models the quality of the item $z_i$, and $S(Z)$ the diversity of the set $Z$. If we let $W$ be a subset of detections arising from the target (Reuter et al., 2013):

$$\eta(W|\{x\}) \approx \begin{cases} 1-\pi_d & \text{if } W=\emptyset \\ \pi_d\,g_x(w_1)\det(S_{w_1}) & \text{if } |W|=1 \\ |W|!\,\pi_d^m\prod_i g_x^2(w_i)\det(S_W) & \text{if } |W|=m. \end{cases} \tag{17}$$
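The quality-diversity decomposition of Equations (15)-(16) can be sketched directly in NumPy. Here the feature vectors and quality scores are toy values, not outputs of the paper's LBP pipeline:

```python
import numpy as np

def qd_kernel(quality, features):
    """Gram kernel L_ij = g_x(z_i) S_ij(Z) g_x(z_j), Equations (15)-(16).

    quality[i] plays the role of g_x(z_i); features[i] is a descriptor
    phi(z_i), normalized so that S has unit diagonal.
    """
    phi = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = phi @ phi.T                     # diversity (similarity) matrix
    g = np.asarray(quality, dtype=float)
    return g[:, None] * S * g[None, :]  # quality-weighted Gram matrix

# Two near-duplicate detections (rows 0 and 1) and one distinct one:
feats = np.array([[1.0, 0.0],
                  [0.99, 0.1],
                  [0.0, 1.0]])
L = qd_kernel([0.9, 0.8, 0.7], feats)

# The principal minor of the two near-duplicates is close to zero,
# so the DPP strongly discourages selecting both:
print(np.linalg.det(L[np.ix_([0, 1], [0, 1])]) < 0.05)  # True
```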
The DPP $Z$ can be treated as the union of two independent sets $Z = C \cup W$, where $C = \{c_1, \ldots, c_m\}$ represents clutter. The clutter density becomes:

$$\kappa_d(C) = |C|!\prod_i f_c^2(c_i)\det(S_C). \tag{18}$$
The likelihood function for the standard measurement model using determinantal observations becomes:

$$p(Z|\{x\}) = \sum_{W\subseteq Z}\eta(W|\{x\})\,\kappa_d(Z\setminus W), \tag{19}$$

$$p(Z_k|\{x\}) = \kappa_d(Z_k)\Bigg[(1-\pi_d)^{M_k} + \sum_{Z\subseteq Z_k}\frac{|Z|!\,(M_k-|Z|)!}{M_k!}\,\pi_d^{|Z|}(1-\pi_d)^{M_k-|Z|}\prod_i\left[\frac{g_x(z_i)}{f_c(z_i)}\right]^2\frac{\det(S_Z)\det(S_{Z_k\setminus Z})}{\det(S_{Z_k})}\Bigg]. \tag{20}$$
Now, we want to derive the posterior distribution for the Bernoulli point process given DPP observations:

$$p(Z_k) = \kappa_d(Z_k)\Bigg[(1-q_{k|k-1}) + q_{k|k-1}\Bigg((1-\pi_d)^{M_k} + \sum_{Z\subseteq Z_k}\Xi_k\,\frac{\int\prod_i g_x^2(z_i)\,p_{k|k-1}(x)\,dx}{\prod_j f_c^2(z_j)}\Bigg)\Bigg], \tag{21}$$

where:

$$\Xi_k = \frac{|Z|!\,(M_k-|Z|)!}{M_k!}\,\pi_d^{|Z|}(1-\pi_d)^{M_k-|Z|}\,\frac{\det(S_Z)\det(S_{Z_k\setminus Z})}{\det(S_{Z_k})}.$$
The updated Bernoulli point process can now be derived as follows:

$$q_{k|k} = \frac{1-\breve{\Delta}_k}{1-q_{k|k-1}\breve{\Delta}_k}\,q_{k|k-1}, \tag{22}$$

where:

$$\breve{\Delta}_k = 1-(1-\pi_d)^{M_k} - \sum_{Z\subseteq Z_k}\Xi_k\,\frac{\int\prod_i g_x^2(z_i)\,p_{k|k-1}(x)\,dx}{\prod_j f_c^2(z_j)}, \tag{23}$$

and:

$$p_{k|k}(x) = \frac{\left[(1-\pi_d)^{M_k} + \sum_{Z\subseteq Z_k}\Xi_k\,\frac{\prod_i g_x^2(z_i)}{\prod_j f_c^2(z_j)}\right]p_{k|k-1}(x)}{1-\breve{\Delta}_k}. \tag{24}$$
Approximated Bernoulli determinantal filter
In practice, it is difficult to store and compute the power set with all possible configurations of $Z_k$ in the likelihood term (see Equation (20)). An approximation can be constructed by truncating the likelihood and focusing only on the most likely elements. Let $Z_k^* = \arg\max_{Z\subseteq Z_k}\eta(Z|\{x\})$ be the subset of $Z_k$ whose elements are detections arising from the target. The truncated likelihood becomes:

$$p(Z_k^*|\{x\}) = |Z_k^*|!\prod_i g_x^2(z_i)\det(S_{Z_k^*}). \tag{25}$$
DPPs have been proposed in the literature as an alternative to other object refinement techniques such as non-maximum suppression (Lee et al., 2016). These methods operate over object proposals and eliminate redundant detections. For DPPs, mode finding can be tackled with a greedy algorithm (Kulesza and Taskar, 2011).
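A sketch of such a greedy mode-finding step is given below. The stopping rule based on the acceptance ratio ε (cf. Table 2) is an assumption made for this illustration; Kulesza and Taskar (2011) describe the general greedy scheme:

```python
import numpy as np

def greedy_dpp_mode(L, eps=0.7):
    """Greedy mode finding for a DPP over detection indices.

    At each step the detection that most increases det(L_Y) is added;
    a candidate is accepted while the determinant gain exceeds the
    acceptance ratio eps times the current determinant (an assumed
    stopping rule for this sketch).
    """
    n = len(L)
    selected = []
    current_det = 1.0  # determinant of the empty principal minor
    while True:
        best_det, best_i = -np.inf, None
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            d = np.linalg.det(L[np.ix_(idx, idx)])
            if d > best_det:
                best_det, best_i = d, i
        if best_i is None or best_det < eps * current_det:
            break
        selected.append(best_i)
        current_det = best_det
    return selected

# Two weakly correlated detections are both kept...
print(greedy_dpp_mode(np.array([[2.0, 0.1], [0.1, 1.5]])))  # [0, 1]
# ...while a strongly correlated (redundant) pair is pruned to one:
print(greedy_dpp_mode(np.array([[2.0, 1.9], [1.9, 2.0]])))  # [0]
```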
Then, by using the truncated likelihood from Equation (25), the Sequential Monte Carlo algorithm for the Bernoulli filter can be used to estimate the single-target posterior (Ristic, 2013).
Experimental results
In order to demonstrate the advantages of the proposed model updating approach over other discriminative approaches, we evaluate the tracking results on six challenging video sequences from the Visual Object Tracking 2014 (VOT) data set^{1}. The proposed SMC implementation uses local binary patterns (LBP) as observed features and a simple observation model $p(z_i|c) \propto \exp(-D_k^2/(2\sigma_o^2))$, with $D_k = \mathrm{dist}[z_c, z_k]$ and $z_c$ being a reference LBP histogram (Czyz et al., 2007). The state $x_k$ is configured as a 4-dimensional rectangle comprising the top-left position, width and height of the target. The dynamic model uses a random walk and the parameters of the model are held fixed for all sequences. The B-DPP filter is implemented in C++ using the OpenCV library. The parameters for the B-DPP filter are determined empirically and shown in Table 1.
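The observation model above is easy to reproduce. A minimal sketch, using a Euclidean distance between histograms as one possible choice of dist[z_c, z_k] (the paper does not fix the distance, and real LBP histograms would come from an image-processing pipeline such as OpenCV):

```python
import numpy as np

def observation_likelihood(hist_obs, hist_ref, sigma_o=20.4):
    """Unnormalized p(z|c) proportional to exp(-D^2 / (2 * sigma_o^2)).

    hist_obs is the LBP histogram of the candidate region, hist_ref the
    reference histogram z_c; sigma_o follows Table 1.
    """
    D = np.linalg.norm(np.asarray(hist_obs) - np.asarray(hist_ref))
    return np.exp(-D ** 2 / (2.0 * sigma_o ** 2))

# Identical histograms yield the maximal (unnormalized) likelihood:
print(observation_likelihood([0.2, 0.8], [0.2, 0.8]))  # 1.0
```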
Table 1. Particle Bernoulli-DPP filter parameters.

Parameter                                         Value
Number of particles (N)                           100
Uniform birth probability (π_b)                   0.1
Uniform survival probability (π_s)                0.99
Newborn particles (N_b)                           0
Standard deviation for observation model (σ_o)    20.4
Covariance matrix for dynamic model (σ_x)         3.0 × I
The parameter setting for the greedy mode finding algorithm is described in Table 2.
Table 2. Greedy mode finding parameters.

Parameter               Value
Acceptance ratio (ε)    0.7
The sequence jogging is a challenging example containing full occlusions, rotations and background clutter. Figure 1 shows one frame of the sequence and the estimates using the proposed approach and other state-of-the-art methods.
Frame 85 of the jogging sequence. At each frame, a greedy mode finding step is performed using Algorithm 1. Rectangles represent ground-truth, state estimates and DPP observations.
The Bernoulli DPP filter maintains a balance between the observed features and the quality of the observations (see Figure 1). The observation model uses a simple histogram comparison and no template update is performed, so the model is not robust to object deformation or rotation. Even so, as seen in Figure 1, the Bernoulli-DPP tracker achieves good performance in cases such as full occlusion, where the other discriminative tracking methods fail. Performance is measured using the widely used precision and success metrics^{2}.
The precision metric describes the percentage of frames whose center location error is below a given threshold. Table 3 shows the overall precision metric averaged over all sequences on five different runs for each one of the algorithms.
Table 3. Average precision (th = 20).

Sequence      DPP     KCF     sKCF    Struck
Ball          0.309   0.289   0.246   0.372
Bolt          0.083   0.017   0.017   0.026
Diving        0.073   0.082   0.087   0.091
Gymnastics    0.710   0.425   0.425   0.435
Jogging       0.707   0.231   0.231   0.228
Polarbear     0.946   0.857   0.916   0.844

th = threshold.
The success measure accounts for bounding box overlap. Table 4 shows the number of success frames whose overlap is above some threshold, averaged over the sequences on five different runs. Quantitative analysis shows improved performance for the proposed approach when compared to the discriminative trackers in six different video sequences.
Table 4. Average success (th = 0.5).

Sequence      DPP     KCF     sKCF    Struck
Ball          0.206   0.211   0.201   0.128
Bolt          0.031   0.011   0.011   0.017
Diving        0.183   0.110   0.114   0.151
Gymnastics    0.560   0.415   0.420   0.425
Jogging       0.205   0.225   0.225   0.225
Polarbear     0.749   0.747   0.760   0.712

th = threshold.
Figure 2 shows the precision metric against the location error threshold for all six tested sequences. The red line indicates the best performing method among the four algorithms. Since the bolt and jogging sequences contain background clutter (the background near the target has a similar appearance to the target), the proposed Bernoulli DPP tracker reduces redundant observations and improves precision.
Overall precision plots for the visual tracking sequences.
Figure 3 shows the ratio of frames whose tracked bounding box overlaps the ground-truth box by more than a given threshold. The success metric can be associated with the tracker's ability to maintain long-term tracks. Since the Bernoulli DPP filter accounts for missed detections, the proposed approach improves the area under the curve of the success metric in 67% of the tested sequences.
Overall success plots for the visual tracking sequences.
Conclusions
In this paper, a novel algorithm for joint detection and tracking of a single object in video has been presented. The proposed approach takes into account both the detection score and the similarity of the observed features. A Bayesian filter based on a Bernoulli point process then estimates the state of the target from a diverse subset of object proposals. Experimental evaluations on 6 of the 25 sequences of the data set show results comparable to other state-of-the-art techniques for visual tracking. We only considered a simple observation model (distance to a reference LBP histogram), which might hinder the performance of this approach on the overall data set: this observation model is not robust to scale and rotation changes, and no model updating strategies were considered in this paper. Nevertheless, the performance of the proposed model is expected to increase when using a more complex observation model (such as deep learning features), model updating, and ensemble post-processing techniques for combining the output from different tracking schemes.
This work was supported by CONICYT/FONDECYT grant, project Robust Multi-Target Tracking using Discrete Visual Features, code 11140598.
References

Bolme, D. S., Beveridge, J. R., Draper, B. A. and Lui, Y. M. 2010. Visual object tracking using adaptive correlation filters. 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 2544–2550.
Czyz, J., Ristic, B. and Macq, B. 2007. A particle filter for joint detection and tracking of color objects.
García-Fernández, A. F., Williams, J. L., Granström, K. and Svensson, L. 2018. Poisson multi-Bernoulli mixture filter: direct derivation and implementation.
Hare, S., Golodetz, S., Saffari, A., Vineet, V., Cheng, M.-M., Hicks, S. L. and Torr, P. H. 2016. Struck: structured output tracking with kernels.
Henriques, J. F., Caseiro, R., Martins, P. and Batista, J. 2015. High-speed tracking with kernelized correlation filters.
Hoseinnezhad, R., Vo, B. N. and Vo, B. T. 2013. Visual tracking in background subtracted image sequences via multi-Bernoulli filtering.
Hoseinnezhad, R., Vo, B.-N., Vo, B.-T. and Suter, D. 2012. Visual tracking of numerous targets via multi-Bernoulli filtering of image data.
Jorquera, F., Hernández, S. and Vergara, D. 2017. Multi target tracking using determinantal point processes. In Mendoza, M. and Velastin, S. (Eds).
Jorquera, F., Hernández, S. and Vergara, D. 2019. Probability hypothesis density filter using determinantal point processes for multi object tracking.
Kingman, J. F. C. 1993.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Zajc, L. Č., et al. 2019. The sixth visual object tracking VOT2018 challenge results. In Leal-Taixé, L. and Roth, S. (Eds).
Kulesza, A. and Taskar, B. 2011. Learning determinantal point processes. Proceedings of the Twenty-Seventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-11), AUAI Press, Corvallis, OR, pp. 419–427.
Lee, D., Cha, G., Yang, M.-H. and Oh, S. 2016. Individualness and determinantal point processes for pedestrian detection. In Leibe, B., Matas, J., Sebe, N. and Welling, M. (Eds).
Li, B., Yan, J., Wu, W., Zhu, Z. and Hu, X. 2018. High performance visual tracking with Siamese region proposal network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971–8980.
Li, P., Wang, D., Wang, L. and Lu, H. 2018. Deep visual tracking: review and experimental comparison.
Maggio, E. and Cavallaro, A. 2011.
Mahler, R. 2007. PHD filters of higher order in target number.
Mahler, R. P. S. 2003. Multitarget Bayes filtering via first-order multitarget moments.
Mahler, R. P. S. 2007.
Privault, N. and Teoh, T. 2019. Second order multi-object filtering with target interaction using determinantal point processes. Tech. Rep. arXiv:1906.06522 [math.PR], ArXiv, June.
Reuter, S., Wilking, B., Wiest, J., Munz, M. and Dietmayer, K. 2013. Real-time multi-object tracking using random finite sets.
Ristic, B. 2013.
Ristic, B., Vo, B. T., Vo, B. N. and Farina, A. 2013. A tutorial on Bernoulli filters: theory, implementation and applications.
Solis Montero, A., Lang, J. and Laganiere, R. 2015. Scalable kernel correlation filter with sparse feature integration. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December, pp. 587–594.
Vo, B.-N., Vo, B.-T., Pham, N.-T. and Suter, D. 2010. Joint detection and estimation of multiple objects from image observations.
Wang, N., Shi, J., Yeung, D.-Y. and Jia, J. 2015. Understanding and diagnosing visual tracking systems. The IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December, pp. 3101–3109, doi: 10.1109/ICCV.2015.355.
Xu, T., Feng, Z.-H., Wu, X.-J. and Kittler, J. 2019. Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking.