SynWTS is a high-fidelity synthetic dataset built as a Digital Twin of the Woven Traffic Safety (WTS) dataset. It is developed for the 2026 AI City Challenge (Track 2) to advance Sim2Real research in transportation safety understanding.

Dataset Summary

Participants in the Sim2Real challenge must train models exclusively on this synthetic data and evaluate them on real-world video. SynWTS geometrically matches the real-world test locations and focuses on pedestrian-involved incidents, providing multi-view 1080p video, structured temporal captions, and complex Visual Question Answering (VQA) pairs.

Key Features

Sim2Real Benchmark: Specifically designed to bridge the gap between NVIDIA Isaac Sim environments and real-world traffic scenarios.
Multi-View Perception: Synchronized views from overhead infrastructure cameras and vehicle-ego perspectives.
Temporal Segmentation: Scenarios are partitioned into five safety-critical phases: Pre-recognition, Recognition, Judgment, Action, and Avoidance.
Structured Annotations: Descriptions cover four pillars: Location, Attention, Behavior, and Context.

Dataset Structure

Directory Layout

data/
├── videos/
│   └── {split}/{scenario}/{view}/*.mp4
├── annotations/
│   ├── caption/
│   │   └── {split}/{scenario}/{view}/{scenario}_caption.json
│   ├── bbox_annotated/
│   │   ├── pedestrian/{split}/{scenario}/{view}/{scenario}_{camera_id}_bbox.json
│   │   └── vehicle/{split}/{scenario}/overhead_view/{scenario}_{camera_id}_bbox.json
│   └── vqa/
│       └── {split}/{scenario}/{view}/{scenario}.json

{split} = train | val | test

{view} = overhead_view | vehicle_view | environment

{camera_id} = {camera_ip_address}_{direction_id} | vehicle_view
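As a rough illustration of how the layout above can be navigated, the sketch below builds a caption path and lists the videos of one split. The helper names caption_path and list_videos and the scenario name scenario_001 are hypothetical; only the directory pattern and the split and view values come from the layout above.

```python
from pathlib import Path

# Placeholder values taken from the layout description above.
SPLITS = ("train", "val", "test")
VIEWS = ("overhead_view", "vehicle_view", "environment")


def caption_path(root: Path, split: str, scenario: str, view: str) -> Path:
    """Build annotations/caption/{split}/{scenario}/{view}/{scenario}_caption.json."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    return (root / "annotations" / "caption" / split / scenario / view
            / f"{scenario}_caption.json")


def list_videos(root: Path, split: str) -> list[Path]:
    """Return every .mp4 under videos/{split}/{scenario}/{view}/, sorted."""
    return sorted((root / "videos" / split).glob("*/*/*.mp4"))


if __name__ == "__main__":
    # "scenario_001" is a made-up scenario name used only for illustration.
    print(caption_path(Path("data"), "train", "scenario_001", "overhead_view"))
```

Validating the split and view up front is a design choice for this sketch: a misspelled placeholder fails immediately instead of silently producing a path that does not exist.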

Data Fields & Samples

1. Fine-Grained Captions

Captions are generated from a checklist of 170+ traffic items. Each event phase has separate captions for the pedestrian and the vehicle. The annotations are the same as in the WTS dataset, with updates only to details that could not be simulated in the current version.

Sample (from overhead_view_caption.json):

{
    "id": 765,
    "event_phase": [
        {
            "labels": ["4"],
            "caption_pedestrian": "The pedestrian was a male in his 30s walking slowly... He was standing close behind a vehicle... Although he almost noticed the vehicle, he seemed unaware of it.",
            "caption_vehicle": "The vehicle was on the left side of the pedestrian and was close to them... The vehicle slightly collided with the pedestrian while moving at a speed of 0 km/h.",
            "start_time": "8.993",
            "end_time": "14.903"
        }
    ]
}
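The sample above stores start_time and end_time as strings, so they need converting before any arithmetic. The following is a minimal sketch, assuming every record has the same shape as the sample; the function name phase_durations is hypothetical, and the captions are shortened to the opening words of the sample.

```python
import json

# One record with the same shape as the sample above (captions shortened).
SAMPLE = json.loads("""
{
    "id": 765,
    "event_phase": [
        {
            "labels": ["4"],
            "caption_pedestrian": "The pedestrian was a male in his 30s walking slowly...",
            "caption_vehicle": "The vehicle was on the left side of the pedestrian...",
            "start_time": "8.993",
            "end_time": "14.903"
        }
    ]
}
""")


def phase_durations(record: dict) -> list[tuple[str, float]]:
    """Return (first label, duration in seconds) for each event phase."""
    durations = []
    for phase in record["event_phase"]:
        start = float(phase["start_time"])  # times are stored as strings
        end = float(phase["end_time"])
        durations.append((phase["labels"][0], round(end - start, 3)))
    return durations


if __name__ == "__main__":
    for label, seconds in phase_durations(SAMPLE):
        print(f"phase label {label}: {seconds} s")
```

The sketch keeps the label as the raw string from the file, since the mapping from label values to phase names is not stated here.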

2. Visual Question Answering (VQA)

Includes multiple-choice questions covering position, distance, visibility, and actions.

Sample (from vqa-vehicle_view.json):

{
    "question": "What is the action taken by vehicle?",
    "a": "Swerved to the left to avoid",
    "b": "Swerved to the right, but could not avoid",
    "c": "Tried sudden braking but could not avoid",
    "d": "Collided with the pedestrian",
    "correct": "d"
}
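A VQA item like the sample above can be turned into a text prompt and scored by comparing a predicted option letter with the correct field. This is a sketch only: the helper names format_question and accuracy are hypothetical, and plain accuracy is an assumed metric, not necessarily the one used by the challenge.

```python
# One item copied from the sample above.
SAMPLE_ITEM = {
    "question": "What is the action taken by vehicle?",
    "a": "Swerved to the left to avoid",
    "b": "Swerved to the right, but could not avoid",
    "c": "Tried sudden braking but could not avoid",
    "d": "Collided with the pedestrian",
    "correct": "d",
}


def format_question(item: dict) -> str:
    """Render the question followed by one lettered line per available option."""
    options = [f"{key.upper()}. {item[key]}"
               for key in ("a", "b", "c", "d") if key in item]
    return item["question"] + "\n" + "\n".join(options)


def accuracy(items: list[dict], predictions: list[str]) -> float:
    """Fraction of predicted letters matching 'correct' (case-insensitive)."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    hits = sum(pred.strip().lower() == item["correct"]
               for item, pred in zip(items, predictions))
    return hits / len(items)


if __name__ == "__main__":
    print(format_question(SAMPLE_ITEM))
    print(accuracy([SAMPLE_ITEM], ["D"]))
```

Skipping missing option keys lets the same formatter handle items with fewer than four choices, should any exist in the data.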

Technical Specifications & Limitations

Digital Twin Characteristics

Environmental Fidelity: Roads and buildings are a close geometric match to real-world WTS locations.
No 3D Gaze: Unlike the original WTS, 3D gaze and head bounding boxes are not included due to simulation constraints.
Character Dynamics: Poses are simulated and may not perfectly replicate real-world physics.
Object Limitations: Characters do not hold hand-held objects (umbrellas, phones) that may appear in the real-world test set. Labels/VQA have been adjusted accordingly.

Release Schedule

Initial Release: 80 scenarios (Current)
Mid-May Update: ~150 scenarios (Expected May 15, 2026)
Final Dataset: ~240 scenarios total.

Team & Credits

Santa Clara University

Dhanishtha Patil, Ridham Kachhadiya, Andrew Vattuone, and David C. Anastasiu

NVIDIA

Haoquan Liang, Jiajun Li, Yuxing Wang, and Thomas Tang

Woven by Toyota

Ashutosh Kumar and Quan Kong

Point of Contact

For questions regarding the SynWTS dataset or the AI City Challenge Track 2, please contact:

David C. Anastasiu, Email: danastasiu@scu.edu

Citation

Please cite the original WTS paper and the 2026 AI City Challenge:

@article{kong2024wts,
  title={WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding},
  author={Kong, Quan and Kumar, Ashutosh and others},
  journal={arXiv preprint arXiv:2407.15350},
  year={2024}
}

Stay tuned for an updated citation to our dataset paper.

