Computer Vision

The raw data gathered in the experiments consist of three image sequences, one from each camera, for each flocking or swarming event. Each sequence is analyzed with a range of computer vision methods and algorithms to extract the required information, namely the 3D position of each individual within the aggregation.

Data of this kind are a demanding benchmark for 3D tracking because they are particularly hard to track. First, the spatial resolution is low, so animals appear as objects without any recognizable feature, which rules out any feature-based tracking method from the outset. Second, optical occlusions are frequent: they occur when two or more targets come close to each other in 3D space, or in the 2D image space of one or more cameras. During an occlusion the individual targets are no longer distinguishable, and the identities of the occluded objects are mixed for several frames. Occlusions introduce ambiguities that can result in fragmented trajectories or identity switches, depending on the tracking approach used.

Broadly speaking, the post-processing of the images can be split into two steps: (1) image segmentation, and (2) tracking and 3D reconstruction.

Image Segmentation

The goal of this step is to separate the individual organisms within an aggregation from the rest of the scene: in other words, to determine what belongs to the background so that the objects of interest stand out. For starlings, the background consists of clouds and sky, with occasional intruders such as airplanes and other birds. For midges, the background is more complicated, because the typical setting is a park with trees, shrubs and grass, and intruders include people and unidentified flying objects (pollen, seeds, other large insects, etc.). Our image segmentation procedure combines well-known image processing methods such as background subtraction, thresholding, morphological operations, and flood-fill/watershed.
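A minimal sketch of such a segmentation chain, written in plain Python so that it runs without image libraries. It assumes grayscale frames stored as lists of rows, a static camera (so that a per-pixel median over several frames approximates the background), and a single global threshold; the function names and the `min_area` parameter are hypothetical. Morphological operations are reduced here to a minimum-area filter, and watershed splitting of touching blobs is omitted, so this illustrates the listed techniques rather than reproducing our pipeline.

```python
from collections import deque
from statistics import median


def estimate_background(frames):
    """Per-pixel median over a stack of frames (each frame is a list of rows)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(w)] for r in range(h)]


def segment(frame, background, threshold, min_area=1):
    """Return the barycenters (row, col) of connected foreground blobs.

    Hypothetical helper: background subtraction, thresholding,
    flood-fill labeling, and a minimum-area filter.
    """
    h, w = len(frame), len(frame[0])
    # Background subtraction followed by a fixed global threshold.
    mask = [[abs(frame[r][c] - background[r][c]) > threshold for c in range(w)]
            for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    centers = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Flood fill with 4-connectivity collects the pixels of one blob.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            # Drop tiny blobs (noise): a crude stand-in for a morphological opening.
            if len(pixels) >= min_area:
                n = len(pixels)
                centers.append((sum(p[0] for p in pixels) / n,
                                sum(p[1] for p in pixels) / n))
    return centers
```

The output is exactly the kind of frame-by-frame list of barycenters that feeds the tracking stage. Note that two birds touching in the image would be reported here as a single barycenter; splitting such blobs is what a watershed step is for.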

Tracking and 3D Trajectory Reconstruction

The result of the image segmentation procedure is a frame-by-frame list of the barycenter location of each identified object in each camera. The challenge is then to track the individuals by matching object identities both in space (across cameras) and in time. 3D tracking algorithms fall into two classes: tracking-reconstruction (TR) and reconstruction-tracking (RT). In the TR approach, each object is first tracked in the 2D image space of each camera, and these 2D tracks are then matched across cameras to reconstruct the three-dimensional trajectories. In the RT approach, objects are instead first matched across cameras to reconstruct their 3D positions, and are then tracked directly in 3D space.
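In both classes, a 3D position is ultimately obtained by intersecting the viewing rays of detections matched across cameras. A minimal sketch for two cameras, assuming that calibration has already turned each 2D barycenter into a ray (an origin at the camera center plus a direction): it uses the midpoint method, which returns the point halfway along the shortest segment joining the two rays, together with that segment's length. The function name is hypothetical, and the midpoint method is a simple stand-in, not necessarily the reconstruction used in our software.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(p1, d1, p2, d2):
    """Midpoint of the shortest segment between rays p1 + s*d1 and p2 + t*d2.

    Returns (point, gap), where gap is the length of that segment.
    """
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    # Ray parameters that minimize the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [pi + s * di for pi, di in zip(p1, d1)]
    q2 = [pi + t * di for pi, di in zip(p2, d2)]
    point = tuple((x + y) / 2 for x, y in zip(q1, q2))
    return point, math.dist(q1, q2)
```

Because noisy rays rarely intersect exactly, the gap acts as a matching residual: a large gap suggests that the two detections do not belong to the same individual. With three cameras, the same idea extends to the least-squares point closest to all rays.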

GReTA – Global and Recursive Tracking Algorithm

Our current 3D tracking method, GReTA, belongs to the tracking-reconstruction class and was developed to be robust to severe occlusions. To ensure robustness, we adopt a global optimization approach that works on all objects and frames at once. To make this practical and scalable, we employ a divide-and-conquer strategy that reduces the computational complexity of the problem by orders of magnitude.
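To illustrate why a global cost helps with ambiguous matches, the toy sketch below links detections between just two frames, comparing a greedy rule (accept the cheapest remaining pair first) with an exhaustive search for the one-to-one assignment of minimum total squared distance. It is not GReTA, which optimizes over many frames and cameras at once; the function names and the cost function are assumptions made for this example.

```python
from itertools import permutations


def cost(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def greedy_link(prev, curr):
    """Repeatedly accept the cheapest remaining (object, detection) pair."""
    pairs = sorted((cost(p, q), i, j)
                   for i, p in enumerate(prev) for j, q in enumerate(curr))
    used_i, used_j, match = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_i and j not in used_j:
            match[i] = j
            used_i.add(i)
            used_j.add(j)
    return match


def global_link(prev, curr):
    """Exhaustively find the one-to-one assignment with minimum total cost."""
    best = min(permutations(range(len(curr)), len(prev)),
               key=lambda perm: sum(cost(prev[i], curr[j]) for i, j in enumerate(perm)))
    return dict(enumerate(best))
```

With objects at (1, 0) and (0, 0) and detections at (0.9, 0) and (2.5, 0), the greedy rule takes the short link from the first object to (0.9, 0) and then forces the second object to jump to (2.5, 0), while the global assignment swaps the pair and reaches a lower total cost (3.06 versus 6.26). The exhaustive search also shows why scalability matters: the number of candidate assignments grows factorially with the number of objects, which is why practical global methods rely on polynomial-time algorithms (such as the Hungarian method) or on decomposing the problem into smaller pieces.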

We tested our algorithm on experimental data of bird flocks and insect swarms.

GReTA was also tested on two public multi-view multi-object tracking benchmarks of bats proposed by Z. Wu, where it outperformed existing tracking methods, producing high-quality results with negligible identity switches. More details about GReTA can be found here.

The main limitation of GReTA, inherent to the TR approach, is the computational resources it requires: the number of 2D tracks created in each camera grows exponentially with the number of occlusions and with the duration of the experiment. To overcome this limitation, we are now working on a novel 3D tracking method, SpaRTA (Spatiotemporal Reconstruction Tracking Algorithm), which belongs to the reconstruction-tracking class. SpaRTA resolves optical occlusions by working directly in 3D space, and it overcomes identity ambiguities using information on the volumes occupied by the objects, namely by reconstructing each 3D object as a dense cloud of 3D points. More details about a preliminary version of SpaRTA can be found here.
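The idea of representing objects by the volumes they occupy can be illustrated with shape-from-silhouette (space carving): a 3D point is kept only if it projects onto foreground in every camera. The sketch below assumes three idealized orthographic cameras aligned with the coordinate axes and a coarse integer voxel grid; the camera model, the names `PROJECTIONS` and `carve`, and the carving rule are assumptions for illustration, not a description of SpaRTA.

```python
from itertools import product

# Hypothetical orthographic cameras: each maps a voxel (x, y, z) to a pixel.
PROJECTIONS = {
    "cam_z": lambda x, y, z: (x, y),  # looks along the z axis
    "cam_x": lambda x, y, z: (y, z),  # looks along the x axis
    "cam_y": lambda x, y, z: (x, z),  # looks along the y axis
}


def carve(masks, size):
    """Keep voxels whose projection falls on foreground in every camera.

    masks maps each camera name to the set of foreground pixels it segmented;
    the grid spans 0..size-1 along each axis.
    """
    return {
        (x, y, z)
        for x, y, z in product(range(size), repeat=3)
        if all(PROJECTIONS[cam](x, y, z) in masks[cam] for cam in PROJECTIONS)
    }
```

For example, if two targets at (0, 0, 0) and (0, 0, 3) overlap as a single blob in `cam_z`, the carved cloud still contains two separate points, because `cam_x` and `cam_y` see the targets apart. With many objects, intersecting silhouettes can also produce spurious "ghost" points, so carving alone does not resolve every ambiguity.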