Ambroser53's Collections Vision
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Paper
• 2401.13313
• Published
• 5
Text Generation
• 4B • Updated
• 13
• 10
What matters when building vision-language models?
Paper
• 2405.02246
• Published
• 103
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper
• 2405.20204
• Published
• 37
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper
• 2401.09417
• Published
• 62
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Paper
• 2406.12275
• Published
• 31
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Paper
• 2406.13923
• Published
• 25
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Paper
• 2406.14491
• Published
• 96
ColPali: Efficient Document Retrieval with Vision Language Models
Paper
• 2407.01449
• Published
• 51
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Paper
• 2407.12594
• Published
• 19