Multi-Modal Detection & Classification Pipeline
This project started in 2020.
This project contains several examples of using the TrackEverything package, which is an open-source framework for detection and classification workflows. The examples cover proximity, occlusion, and live video classification scenarios using detection models, tracking algorithms, and statistical decision logic.
Overview
You can find all the models and test videos here.
Example 1
The Detection Model
This example uses a head detection model from the AVAuco/ssd_head_keras GitHub repository for detecting heads. I modified the files to be compatible with TF2.2. The model was trained using the Hollywood Heads dataset as positive samples and a subsample of the EgoHands dataset as negative samples. According to its authors, the model was developed using Pierluigi Ferrari’s Keras implementation of SSD as the primary source and replicates their original MatConvNet version. In custom_get_detection_array I use the model to return all the heads detected in a frame with a score of at least detection_threshold=0.4. I then filter out redundant overlapping detections using the default Non-maximum Suppression (NMS) method.
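The two filtering steps described above, score thresholding followed by NMS, can be sketched in plain NumPy. This is only an illustration of the general technique, not the code used by TrackEverything or by custom_get_detection_array: the function name filter_detections, the [x1, y1, x2, y2] box layout, and the iou_threshold=0.5 value are assumptions made for the example.

```python
import numpy as np


def filter_detections(boxes, scores, score_threshold=0.4, iou_threshold=0.5):
    """Keep boxes scoring >= score_threshold, then apply greedy NMS.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    Returns indices into the original arrays, best score first.
    Hypothetical helper; iou_threshold is an assumed value.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)

    # Step 1: drop low-confidence detections.
    candidates = np.flatnonzero(scores >= score_threshold)
    # Step 2: visit the survivors from highest to lowest score.
    order = candidates[np.argsort(-scores[candidates])]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        union = areas[best] + areas[rest] - inter
        iou = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
        # Step 3: discard boxes that overlap the best one too much.
        order = rest[iou <= iou_threshold]
    return keep
```

Lowering score_threshold lets more weak detections through and leaves more work for NMS, while lowering iou_threshold merges neighbouring heads more aggressively, which may matter in crowded frames.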

The Classification Model
After the detection model returns the heads, I pass each one through a classification model to determine the probability of the target condition. I used a classification model from the chandrikadeb7/Face-Mask-Detection GitHub repository. It is based on the MobileNetV2 architecture, which is computationally efficient enough for prototyping on embedded hardware such as Raspberry Pi or Google Coral.
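Passing detected heads to a classifier usually means cropping each box out of the frame and turning the crops into one input batch. The sketch below shows that step under stated assumptions: the helper name crop_and_preprocess, the 224x224 input size, and the scaling to [-1, 1] (the convention of the Keras MobileNetV2 preprocessing) are illustrative and may not match what the referenced model expects. It uses a NumPy nearest-neighbour resize to stay self-contained; in practice a call such as cv2.resize would be the more likely choice.

```python
import numpy as np


def crop_and_preprocess(frame, boxes, size=224):
    """Cut each box out of the frame and prepare a classifier batch.

    frame: (H, W, 3) uint8 image; boxes: iterable of [x1, y1, x2, y2] pixels.
    Returns a float32 array of shape (M, size, size, 3) scaled to [-1, 1].
    Hypothetical helper; size and scaling are assumptions.
    """
    height, width = frame.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Clamp to the frame so partially visible heads do not break slicing.
        x1, x2 = max(0, int(x1)), min(width, int(x2))
        y1, y2 = max(0, int(y1)), min(height, int(y2))
        if x2 <= x1 or y2 <= y1:
            continue  # skip boxes with no area inside the frame
        crop = frame[y1:y2, x1:x2]
        # Nearest-neighbour resize using plain NumPy indexing.
        rows = np.arange(size) * crop.shape[0] // size
        cols = np.arange(size) * crop.shape[1] // size
        resized = crop[rows][:, cols]
        # Map pixel values from 0..255 to -1..1.
        crops.append(resized.astype(np.float32) / 127.5 - 1.0)
    if not crops:
        return np.empty((0, size, size, 3), dtype=np.float32)
    return np.stack(crops)
```

The resulting batch could then be handed to the classifier in a single call, which is generally cheaper than classifying the heads one at a time. Note that boxes dropped as empty are not represented in the batch, so the caller needs to keep track of which detections were kept.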

Results for Example 1
I only tested it on one video I found online, but the results are fair and the settings could be tuned much further. The head detection is very rudimentary and has a lot of misses and partial matches.
Example 2
The Detection Model
This example uses a face detection model from OpenCV for detecting faces. OpenCV ships out of the box with pre-trained Haar cascades that can be used for face detection, as well as a deep learning-based face detector that has been part of OpenCV since OpenCV 3.3. In custom_get_detection_array I use OpenCV to return all the faces detected in a frame with a score of at least detection_threshold=0.12. I then filter out redundant overlapping detections using the default Non-maximum Suppression (NMS) method.
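Assuming the deep learning-based detector is the one in use (the score threshold suggests so, but the text does not say), its raw output is an array of shape (1, 1, N, 7) where each row holds [image_id, class_id, confidence, x1, y1, x2, y2] with corners normalised to the 0..1 range. The sketch below converts such an array into pixel boxes and scores. The function name parse_ssd_output is hypothetical, and the array would normally come from the network's forward pass rather than being built by hand.

```python
import numpy as np


def parse_ssd_output(output, frame_width, frame_height, detection_threshold=0.12):
    """Convert a (1, 1, N, 7) SSD output into pixel boxes and scores.

    Each row is [image_id, class_id, confidence, x1, y1, x2, y2] with the
    box corners normalised to the 0..1 range. Hypothetical helper.
    """
    rows = np.asarray(output, dtype=float).reshape(-1, 7)
    # Keep only detections at or above the confidence threshold.
    rows = rows[rows[:, 2] >= detection_threshold]
    scale = np.array([frame_width, frame_height, frame_width, frame_height])
    # Scale normalised corners to pixels and clip boxes to the frame edges.
    boxes = np.clip(rows[:, 3:7] * scale, 0, scale).round().astype(int)
    return boxes, rows[:, 2]
```

A threshold as low as 0.12 keeps many weak candidates, so the NMS step that follows does most of the work of removing duplicates.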
The Classification Model
I used the same classification model as in Example 1.
Results for Example 2
The results are fair and better than in Example 1, mainly because the face detector is better. The classification model is not very good and has a lot of misses, but tuning the detector’s parameters can improve the results.
Example 3
The Detection Model
This example uses a combined detection and classification model from the PureHing/face-mask-detection-tf2 GitHub repository for face and head detection. The lightweight SSD model uses a MobileNet/RFB backbone and provides classification output, so a second model is not required. In detection_vars.py I use the model to detect heads with a score of at least DETECTION_THRESHOLD=0.4 and then filter overlapping detections using Non-maximum Suppression. I also receive classification scores from the model and pass them into the detector.

Results for Example 3
I tested it on the same video I found online, and the results are very good, the best so far. Changing the settings could yield even better results.
- Programmed in Python.
- Over 3.3K lines of code. This figure may include comment lines and some modified library files.
- TrackEverything, TensorFlow, OpenCV, NumPy, Pillow & SciPy used in Python.
| Lang/Lib/Pro | Version |
|---|---|
| Python | 3.8.1 |
| TrackEverything | 1.7.2 |
| TensorFlow | 2.2.0 |
| OpenCV | 4.2.0.34 |
| NumPy | 1.18.4 |
| Pillow | 7.1.2 |
| SciPy | 1.4.1 |

| Property | Value |
|---|---|
| Type | Python Scripts |
| Input | Camera/Video Feed |
| Output | Persons in Frame Classification |
| Special Components | Camera |