Vision - a Ambroser53 Collection

Ambroser53 's Collections

active learning

RL

Vision

updated Jul 22, 2024

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Paper • 2401.13313 • Published Jan 24, 2024 • 5
BAAI/Bunny-v1_0-4B

Text Generation • 4B • Updated Jun 24, 2024 • 13 • 10
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103
Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published May 30, 2024 • 37
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 62
VoCo-LLaMA: Towards Vision Compression with Large Language Models

Paper • 2406.12275 • Published Jun 18, 2024 • 31
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Paper • 2406.13923 • Published Jun 20, 2024 • 25
Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 96
ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27, 2024 • 51
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17, 2024 • 19