AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published Mar 25 • 27
Are Vision-Language Models Truly Understanding Multi-vision Sensor? Paper • 2412.20750 • Published Dec 30, 2024 • 20