Video as a Sensor

Felipe Felix Arias, Daniel Carmody, Richard Sowers,
Jayati Singh, Kevin So
University of Illinois at Urbana-Champaign


We propose a methodology for identifying scenes and compound objects, such as cyclists (person + bicycle), in order to track human behavior on roadways, assess risk, and detect areas that may need additional law enforcement or traffic signs.



Tracking Cyclists

As a case study, we develop a method for tracking cyclists. Most object detection frameworks fail to label cyclists when the bicycle or the person leaves the frame. We instead use a graphical model that assigns probability values to an object's presumed past detections by storing and processing information from previous frames. Below, the output of an enhanced version of the You Only Look Once (YOLO) object detection neural network is shown on the left, and the same detections passed as inputs to our model are shown on the right.
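To make the idea of carrying evidence across frames concrete, here is a minimal Python sketch. It is not the graphical model described above: it swaps in a much simpler rule, assumed purely for illustration, in which a cyclist belief is set to 1 when a person box and a bicycle box overlap, decays by a fixed factor while only one of the two is visible, and resets when neither is detected. The names `Box`, `iou`, `track_cyclist`, and the parameters `decay` and `min_iou` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box with a class label (hypothetical structure)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def track_cyclist(frames, decay=0.8, min_iou=0.1):
    """Return one cyclist belief per frame.

    Assumed rule (a stand-in, not the authors' model):
    - person and bicycle overlap  -> belief = 1.0
    - only one of the two visible -> belief decays by `decay`
    - neither visible             -> belief = 0.0
    """
    belief = 0.0
    beliefs = []
    for boxes in frames:
        persons = [b for b in boxes if b.label == "person"]
        bikes = [b for b in boxes if b.label == "bicycle"]
        paired = any(iou(p, k) >= min_iou for p in persons for k in bikes)
        if paired:
            belief = 1.0
        elif persons or bikes:
            belief *= decay
        else:
            belief = 0.0
        beliefs.append(belief)
    return beliefs


# Example: the bicycle leaves the frame after the first frame,
# then the person disappears as well.
person = Box("person", 10, 10, 50, 100)
bicycle = Box("bicycle", 15, 50, 60, 110)
example_frames = [[person, bicycle], [person], [person], []]
beliefs = track_cyclist(example_frames)
```

In this toy example the belief stays high for a few frames after the bicycle is lost instead of dropping to zero immediately, which is the behavior a frame-by-frame detector alone cannot provide.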


Detecting Risk and Weak Supervision

We developed two pipelines: a weak supervision pipeline that generates labels for training data through computer vision, pose, and cross-temporal heuristics, and one that assigns risk to object detections based on risk metrics. We used naïve and advanced heuristics for risk, such as the object’s relative position and time to collision, to generate a risk percentage that combines all heuristics. A matrix completion procedure tracks metrics such as how often the heuristics agree and which one can be trusted most often; the resulting probabilistic labels can be used to train neural networks that outperform those trained with majority vote. The video at the top of this page shows our jaywalker detector; below are the person detections that our model judged to be the riskiest in the validation set of the Waymo Open Dataset.
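As a rough illustration of how heuristics such as relative position and time to collision could be combined into a single risk percentage, consider the Python sketch below. The specific formulas, the 5-second horizon, the lane half-width, and the weights are all assumptions chosen for the example and are not taken from our pipeline; the function names are hypothetical.

```python
import math


def time_to_collision(distance_m: float, closing_speed_mps: float) -> float:
    """Seconds until contact; infinite if the object is not approaching."""
    if closing_speed_mps <= 0:
        return math.inf
    return distance_m / closing_speed_mps


def ttc_risk(ttc_s: float, horizon_s: float = 5.0) -> float:
    """Map time to collision onto [0, 1]: 1 at contact, 0 at or beyond the horizon."""
    return max(0.0, 1.0 - ttc_s / horizon_s)


def position_risk(lateral_offset_m: float, lane_half_width_m: float = 1.8) -> float:
    """Naive heuristic: 1 when directly ahead, falling to 0 at the lane edge."""
    return max(0.0, 1.0 - abs(lateral_offset_m) / lane_half_width_m)


def risk_percentage(scores, weights) -> float:
    """Weighted average of heuristic scores in [0, 1], expressed as a percentage."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return 100.0 * sum(s * w for s, w in zip(scores, weights)) / total


# Example: a pedestrian 10 m ahead, closing at 5 m/s, 0.9 m off the lane center.
ttc = time_to_collision(distance_m=10.0, closing_speed_mps=5.0)
scores = [ttc_risk(ttc), position_risk(0.9)]
risk = risk_percentage(scores, weights=[2.0, 1.0])  # trust time to collision more
```

Here the weights are fixed by hand. In a weak supervision setting they would instead be estimated from how often the heuristics agree with one another, which is the role the matrix completion procedure plays above.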

Website adapted from Unnat Jain, Jingxiang Lin, Richard Zhang and Deepak Pathak.