Automatic association of chats and video tracks for activity learning and recognition in aerial video surveillance

Riad I Hammoud; Cem S Sahin; Erik P Blasch; Bradley J Rhodes; Tao Wang

doi:10.3390/s141019843

Sensors (Basel). 2014 Oct 22;14(10):19843-60. doi: 10.3390/s141019843.

Authors

Riad I Hammoud¹, Cem S Sahin², Erik P Blasch³, Bradley J Rhodes⁴, Tao Wang⁵

Affiliations

¹ BAE Systems, Burlington, MA 01803, USA. riad.hammoud@baesystems.com.
² BAE Systems, Burlington, MA 01803, USA. cem.sahin@baesystems.com.
³ Air Force Research Lab, Rome, NY 13441, USA. erik.blasch.1@us.af.mil.
⁴ BAE Systems, Burlington, MA 01803, USA. brad.rhodes@baesystems.com.
⁵ BAE Systems, Burlington, MA 01803, USA. tao.wang@baesystems.com.

Abstract

We describe two advanced video analysis techniques: video-indexed by voice annotations (VIVA) and multi-media indexing and explorer (MINER). VIVA utilizes analyst call-outs (ACOs) in the form of chat messages (voice-to-text) to associate labels with video target tracks, to designate spatial-temporal activity boundaries and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets and target trajectories obscured by natural and man-made clutter. MINER includes: (1) a fusion of graphical track and text data using probabilistic methods; (2) an activity pattern learning framework to support querying an index of activities of interest (AOIs) and targets of interest (TOIs) by movement type and geolocation; and (3) a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training data to index a large archive of full-motion videos (FMV). VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets, affording an improvement over tracking from video data alone and leading to 84% detection with modest misdetection/false alarm results due to the complexity of the scenario. The novel use of ACOs and chat messages in video tracking paves the way for user interaction, correction and preparation of situation awareness reports.
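
The chat-to-track association summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' VIVA or MINER implementation. It assumes each chat call-out carries a timestamp and an approximate location, that each track is summarized by a lifetime and a mean position, and that a product of Gaussian kernels over time gap and distance is an adequate stand-in for the probabilistic fusion. The names `Track`, `ChatMessage`, `association_score`, `associate`, and the parameters `sigma_t`, `sigma_d`, and `min_score` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    t_start: float  # seconds
    t_end: float    # seconds
    x: float        # mean position in a local frame, metres (assumed)
    y: float


@dataclass
class ChatMessage:
    t: float        # time of the call-out, seconds
    x: float        # approximate location named in the call-out (assumed)
    y: float
    label: str      # activity label extracted from the chat text


def association_score(msg, trk, sigma_t=5.0, sigma_d=50.0):
    """Illustrative likelihood that a call-out refers to a track.

    The time gap is zero when the message falls inside the track lifetime.
    The Gaussian kernels and their widths are assumptions for this sketch.
    """
    dt = max(trk.t_start - msg.t, 0.0, msg.t - trk.t_end)
    dist = math.hypot(msg.x - trk.x, msg.y - trk.y)
    time_term = math.exp(-0.5 * (dt / sigma_t) ** 2)
    space_term = math.exp(-0.5 * (dist / sigma_d) ** 2)
    return time_term * space_term


def associate(messages, tracks, min_score=0.1):
    """Attach each call-out label to its best-scoring track, if good enough."""
    labels = {}
    if not tracks:
        return labels
    for msg in messages:
        best = max(tracks, key=lambda trk: association_score(msg, trk))
        if association_score(msg, best) >= min_score:
            labels.setdefault(best.track_id, []).append(msg.label)
    return labels
```

Call-outs that match no track closely enough in time and space are left unassigned rather than forced onto the nearest track, which keeps spurious labels out of any downstream training set.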

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Artificial Intelligence
Humans
Image Interpretation, Computer-Assisted*
Learning
Pattern Recognition, Automated*
Video Recording*