Dataset Modality Annotated Description Potential Capability/Uses Access Source
AI City Challenge 2021 HD Color Video Yes 5 datasets corresponding to vehicle counting, vehicle re-id, vehicle tracking, traffic anomaly detection, and natural language-based vehicle retrieval. Time-synchronized video feeds from several traffic cameras spanning major travel arteries of the city. Most of these feeds are high resolution 1080p feeds at 10 fps. The vantage point of these cameras is for traffic and transportation purposes; faces and license plates have been redacted. The data set has been expanded to include natural language (NL) descriptions: 3 human annotations per vehicle track for use in training, validation, and testing. Each NL annotation provides a description of the vehicle to track/retrieve, including attributes like vehicle type and color, motions like turns, and relationships to other vehicles in the scene. Multi-Class Vehicle Counting, Multi-Camera Vehicle Re-ID, Multi-Camera Vehicle Tracking, Traffic Anomaly Detection, Natural Language-Based Vehicle Retrieval Public upon Request https://www.aicitychallenge.org/2021-data-and-evaluation/
ALERT Airport Re-ID Dataset Color Video Yes Video data from six cameras installed past the central security checkpoint at an active commercial airport within the United States. Each of the videos was temporally segmented into 40 short clips, each with a duration of about 5 minutes. All the people appearing in each of these short clips were automatically detected and tracked using existing person detection and tracking algorithms. Subsequently, all the detections and tracks were manually annotated. Person Re-ID, Multi-Target Multi-Camera Person Re-ID, Multi-Target Multi-Camera Person Tracking Public upon Request https://alert.northeastern.edu/transitioning-technology/alert-datasets/alert-airport-re-identification-dataset/
ALERT Automated Threat Recognition Dataset Computed Tomography Yes 188 CT scans of packed luggage, bins, and stream-of-commerce items, with some benign materials (clay, saline, and rubber) to serve as proxies for explosive threat materials. Anomaly Detection, Threat Detection, Material Discrimination, Semantic Segmentation Public upon Request https://alert.northeastern.edu/transitioning-technology/alert-datasets/
ALERT CT Dataset Computed Tomography Yes CT Scans of packed luggage, with threat simulants to serve as proxies for explosive materials. Approximately 900 objects were placed in luggage and scanned to produce 62 luggage datasets to span the spectrum of packing, density, arrangement, orientation, and size difficulty. Anomaly Detection, Threat Detection, Material Discrimination, Semantic Segmentation Public upon Request https://alert.northeastern.edu/transitioning-technology/alert-datasets/
ALERT Reconstruction Initiative Dataset Computed Tomography Yes Projection and image data corresponding to scans of objects of interest in the presence of various amounts of clutter, using a medical CT scanner. Reconstructed images using filtered back-projection that match the images obtained on the medical scanner were used to generate the database. Anomaly Detection, Threat Detection, Material Discrimination, Semantic Segmentation Public upon Request https://alert.northeastern.edu/transitioning-technology/alert-datasets/
Benchmarking IR Dataset for Surveillance with Aerial Intelligence (BIRDSAI) LWIR Imagery Yes A long-wave thermal infrared dataset containing nighttime images of animals and humans in Southern Africa. The dataset allows for benchmarking of algorithms for automatic detection and tracking of humans and animals with both real and synthetic videos. Anomaly Detection, Object Detection, Classification, Multi-Target Person Tracking Public https://sites.google.com/view/elizabethbondi/dataset
CASIA Gait Databases Multiple Modalities Available Yes Four datasets of gait data for multiple subjects. Datasets A and B capture multiple views of subjects relative to the imaging plane, with 20 and 124 subjects respectively. Dataset C includes thermal IR imagery for 153 subjects. Dataset D contains video and foot pressure pad data for 88 subjects walking indoors. Gait Estimation, Gait Recognition Public http://www.cbsr.ia.ac.cn/english/Gait%20Databases.asp
CityPersons Dataset HD Color Imagery Yes Subset of images from the Cityscapes dataset containing humans in city environments. Annotations are available for: pedestrians, people operating vehicles, sitting, holding unusual postures, and groups of people. Person Re-ID, Multi-Target Person Re-ID Public https://github.com/cvgroup-njust/CityPersons
CUHK Color Imagery Yes One of the largest Person Re-ID datasets, with 13,164 images of 1,360 individuals captured across 6 non-overlapping surveillance cameras. Images come with both manually labeled and automatically generated bounding boxes for each pedestrian, and are labeled with unique IDs. Person Re-ID, Multi-Target Multi-Camera Person Re-ID Public https://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html
DHS Passenger Screening Kaggle Dataset mmW HD-AIT Scans Yes HD mmW scans of airline passengers of varying clothing, BMI, and gender, with a number of 'threat' simulants distributed across the body. Data types include 3D scans, angular scan sequences, projected scan views, and raw scanner data. Anomaly Detection, Threat Detection Public upon Request https://www.kaggle.com/c/passenger-screening-algorithm-challenge/data
DOTA-v1.5 Overhead HD Color Imagery Yes DOTA-v1.5, an updated version of DOTA-v1.0, contains 0.4 million annotated object instances within 16 categories. Both versions use the same aerial images, but DOTA-v1.5 has revised and updated the object annotations: many small object instances of about 10 pixels or fewer that were missed in DOTA-v1.0 have been additionally annotated. The categories of DOTA-v1.5 are also extended. The object categories in DOTA-v1.5 include: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, small vehicle, large vehicle, helicopter, roundabout, soccer ball field, swimming pool and container crane. Object Detection, Classification Public https://captain-whu.github.io/DOAI2019/dataset.html
Duke MTMC (Multi-Target Multi-Camera) HD Color Video Yes The main dataset contains ID annotations and timestamps for pedestrians on 8 video feeds around Duke campus. Extension dataset MTMC-attribute includes 23 human-level attributes (gender, backpack, shoe color, upper-body clothing color, etc.). Person Re-ID, Human Pose Estimation, Multi-Target Multi-Camera Person Re-ID, Multi-Target Multi-Camera Person Tracking Public via secondhand sources
EPFL Multi-Camera HD Color Video Yes HD Video sequences from synchronized camera streams capturing a common scene with frame overlap. 5 different scenes are documented, with varying lighting and clutter, but all video sequences are taken from 2m above the ground. Person Re-ID, Multi-Camera Multi-Target Person Re-ID, Multi-Target Multi-Camera Person Tracking Public https://www.epfl.ch/labs/cvlab/data/data-pom-index-php/
Face Recognition Databases listed at face-rec.org Multiple Modalities Available Yes Exhaustive collection of the most commonly used face datasets. Collection includes facial video datasets from surveillance cameras, multi-camera recordings of facial expressions and speech, mugshot identification datasets, facial feature and landmark annotations, etc. Facial Recognition, Person Re-ID, Person Tracking Public https://www.face-rec.org/databases/
FLIR Thermal Dataset for Algorithm Training (ADAS) HD Color Video, HD Thermal Video Yes Day and night vehicle-based video in IR and color, with annotations for the following classes: person, car, bicycle, dog, other vehicle. Multi-Target Person Tracking, Multi-Target Vehicle Tracking Public https://www.flir.com/oem/adas/adas-dataset-form/
GRIMA Database of X-ray images X-ray Imagery Yes X-ray imagery of castings, welds, packed baggage, nature, and settings. Baggage dataset contains 8,150 images of different containers and baggage with labeled instances of threats (handguns, shuriken, razor blades) from different points of view. Anomaly Detection, Threat Detection, Material Discrimination, Semantic Segmentation Public https://domingomery.ing.puc.cl/material/gdxray/
IJB-C and IJB-S collected by IARPA Color Video Yes Surveillance video face datasets. The database consists of still images, frames, and videos of celebrities and Internet personalities collected from the web. There are 3531 subjects in the database, and the database is designed to have no overlap with the popular face recognition benchmarks, such as University of Oxford's VGG-Face dataset and the CASIA-WebFace dataset. Facial Recognition, Face Detection Need IRB https://www.nist.gov/itl/iad/ig/ijb-c-dataset-request-form
IJB-MDF collected by IARPA Multispectral Imagery Yes Multispectral Face dataset containing color imagery and SWIR imagery for surveillance cameras in 4 discrete locations. Annotations include subject labels for each imaging modality. Facial Recognition, Face Detection Need IRB https://ieeexplore.ieee.org/abstract/document/9186007
iLIDS-VID Video Re-ID Dataset Color Video Yes This dataset was created from the pedestrians observed in two non-overlapping camera views from the i-LIDS Multiple-Camera Tracking Scenario (MCTS) dataset which was captured at an airport arrival hall under a multi-camera CCTV network. It comprises 600 image sequences of 300 distinct individuals, with one pair of image sequences from two camera views for each person. Each image sequence has variable length ranging from 23 to 192 image frames, with an average number of 73. Multi-Target Multi-Camera Person Tracking, Multi-Target Multi-Camera Person Re-ID Public https://xiatian-zhu.github.io/downloads_qmul_iLIDS-VID_ReID_dataset.html
INRIA Aerial Image Labeling Dataset Overhead HD Color Video Yes Aerial imagery of buildings, with pixel-wise binary annotations for the 'building' label. Imagery includes a wide range of density of urban settlements, including large cities and remote villages. Anomaly Detection, Semantic Segmentation Public https://project.inria.fr/aerialimagelabeling/
KAIST Multispectral Pedestrian Dataset Color, Thermal Image Pairs Yes The KAIST Multispectral Pedestrian Dataset consists of 95k color-thermal pairs (640x480, 20Hz) taken from a vehicle. All the pairs are manually annotated (person, people, cyclist) for a total of 103,128 dense annotations and 1,182 unique pedestrians. The annotation includes temporal correspondence between bounding boxes, as in the Caltech Pedestrian Dataset. Multi-Target Person Tracking Public https://soonminhwang.github.io/rgbt-ped-detection/
MIT Traffic Dataset Color Video Yes The MIT traffic data set is for research on activity analysis and crowded scenes. It includes a 90-minute traffic video sequence recorded by a stationary camera. The scene size is 720 by 480 pixels, and the video is divided into 20 clips. Multi-Target Vehicle Tracking, Anomaly Detection, Multi-Target Person Tracking Public http://www.ee.cuhk.edu.hk/~xgwang/MITtraffic.html
Multi-Object Tracking (MOT) 17 Dataset HD Color Video Yes Single-camera HD Video in varied pedestrian environments, captured at varying camera angles. Multi-Target Person Tracking, Multi-Target Vehicle Tracking Public https://motchallenge.net/data/MOT17/
Multiview Extended Video with Activities (MEVA) Dataset HD Color Video, Paired IR-EO cameras, Overhead Color Video Yes The KF1 data was collected over a total of three weeks at the Muscatatuck Urban Training Center (MUTC) with a team of over 100 actors performing in various scenarios. The fields of view, both overlapping and non-overlapping, capture person and vehicle activities in indoor and outdoor environments. There were multiple realistic scenarios with a variety of scripted and non-scripted activities. Multi-Class Multi-Target Vehicle Tracking, Multi-Class Multi-Target Person Tracking, Multi-Class Activity Classification, Multi-Class Interactions, Multi-Class Anomaly Detection Public https://mevadata.org/
NIST Color Facial Recognition Technology (FERET) HD Color Imagery Yes Set of 11,338 facial images of 994 subjects at various angles and poses, with documented subject ID and attributes (glasses, facial hair, etc.). Facial Recognition, Person Re-ID Public upon Request https://www.nist.gov/itl/products-and-services/color-feret-database
OU-ISIR Gait Database, Multi-View Large Population (MVLP) Dataset HD Color Video Yes The data was collected in conjunction with an experience-based long-run exhibition of video-based gait analysis at a science museum. Informed consent was obtained from all the subjects in this dataset. The dataset consists of 10,307 subjects (5,114 males and 5,193 females with various ages, ranging from 2 to 87 years) from 14 view angles, ranging 0°-90°, 180°-270°. Gait images of 1,280 x 980 pixels at 25 fps are captured by seven network cameras (Cam1-7) placed at intervals of 15-deg azimuth angles along a quarter of a circle whose center coincides with the center of the walking course. Its radius is approximately 8 m and height is approximately 5 m. Gait Estimation, Gait Recognition Public upon Request http://www.am.sanken.osaka-u.ac.jp/BiometricDB/GaitMVLP.html
PandaSet LIDAR for Self-Driving Car Dataset Multiple Modalities Available Yes Dataset of multiple camera and LIDAR modalities and annotations for a vehicle traveling between a given source and destination. Multi-Target Vehicle Tracking, Object Recognition, Path Planning Public upon Request https://scale.com/open-datasets/pandaset#data-collection
RadarScenes Radar Point Cloud Yes Collection of zoned Radar point clouds generated from sensors mounted on a vehicle driving through a city. For these zones across multiple sensors, eleven different object classes are labeled: car, large vehicle, truck, bus, train, bicycle, motorized two-wheeler, pedestrian, pedestrian group, animal, and other. Multi-Target Vehicle Tracking, Object Recognition, Path Planning Public https://radar-scenes.com/dataset/about/
Semantic KITTI Dataset LiDAR Yes A dataset of dense annotations of every point for LiDAR scans taken from a vehicle navigating a road. The dataset contains 28 classes, including classes distinguishing non-moving and moving objects. Overall, the classes cover traffic participants, but also functional classes for ground, such as parking areas and sidewalks. Multi-Target Vehicle Tracking, Object Recognition, Path Planning Public http://www.semantic-kitti.org/dataset.html#download
Stanford Drone Dataset Overhead HD Color Video Yes UAV video imagery of 8 unique overhead scenes on Stanford campus. Annotations of 6 target types: bicyclist, pedestrian, skateboarder, cart (golf cart), car, bus. Multi-Class Multi-Target Vehicle Tracking, Multi-Class Multi-Target Person Tracking Public https://cvgl.stanford.edu/projects/uav_data/
Tracking Any Object (TAO) HD Color Video Yes Contains 2,907 high resolution videos captured in diverse environments, which are half a minute long on average, annotated with 833 categories. Multi-Target Person Tracking, Multi-Target Vehicle Tracking Public https://motchallenge.net/data/TAO_Challenge/
UG2+ Datasets Overhead HD Color Video Yes Uncontrolled UAV videos extracted from YouTube, containing 37 ImageNet superclasses. Video artifacts/problems include shaking, camera blur, annotations occlusion, etc. Unique objects are given a track ID for tracking between frames. Anomaly Detection, Multi-Target Person Tracking, Multi-Target Vehicle Tracking Public http://www.ug2challenge.org/dataset18.html#
VIRAT HD Color Video Yes Surveillance video imagery in cluttered, diverse scenes. Annotations include class categories for scene elements, activities, and complex events involving single-object activities and two-object interactions. Multi-Class Multi-Target Vehicle Tracking, Multi-Class Multi-Target Person Tracking, Multi-Class Activity Classification, Multi-Class Interactions, Multi-Class Anomaly Detection Public https://viratdata.org/
Waymo Open Dataset Multiple Modalities Available Yes High resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. Currently contains lidar and camera data from 1,000 segments of 20s each, collected at 10Hz (200,000 frames) in diverse geographies and conditions; labels for 4 object classes (Vehicles, Pedestrians, Cyclists, Signs); 12M 3D bounding box labels with tracking IDs on lidar data; and 1.2M 2D bounding box labels with tracking IDs on camera data. Multi-Target Vehicle Tracking, Multi-Target Person Tracking, Object Recognition, Classification Public https://waymo.com/open/
xView Overhead HD Color Imagery Yes Diverse and representative collection of overhead imagery, labeled with 60 classes. The aerial imagery is complex and often cluttered, covering varying scenes. Object Detection, Classification Public upon Request http://xviewdataset.org/
xView 2 Overhead HD Color Imagery Yes Dataset of aerial imagery of damage following natural disasters. Annotated polygons and damage scores for each building, giving particular attention to on-the-ground changes between pre-disaster and post-disaster imagery. With over 850,000 building polygons from six different types of natural disaster around the world, covering a total area of over 45,000 square kilometers, the xBD dataset is one of the largest and highest quality public datasets of annotated high-resolution satellite imagery. Building Damage Classification Public upon Request https://xview2.org/