| --- |
| language: en |
| datasets: |
| - laion2b |
| --- |
| |
| # OpenFlamingo-9B (Deprecated) |
|
|
| **This early checkpoint was part of an initial release. It has since been deprecated in favor of [other checkpoints](https://huggingface.co/openflamingo/OpenFlamingo-9B-vitl-mpt7b) as part of the OpenFlamingo v2 release. However, it is possible to continue using this older checkpoint in the new codebase.** |
|
|
| ---- |
|
|
| [Blog post](https://laion.ai/blog/open-flamingo/) | [Code](https://github.com/mlfoundations/open_flamingo) | [Demo](https://7164d2142d11.ngrok.app) |
|
|
| OpenFlamingo is an open source implementation of DeepMind's [Flamingo](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) models. |
| OpenFlamingo-9B is built off of [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14) and [LLaMA-7B](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/). Before using this model please familiarize yourself with our [terms and conditions](https://github.com/mlfoundations/open_flamingo/blob/main/TERMS_AND_CONDITIONS.md). |
|
|
| ## Model Details |
| We freeze the pretrained vision encoder and language model, and then we train connecting Perceiver modules and cross-attention layers, following the original Flamingo paper. |
|
|
| Our training data is a mixture of [LAION 2B](https://huggingface.co/datasets/laion/laion2B-en) and a large interleaved image-text dataset called Multimodal C4, which will be released soon. |
|
|
| The current model is an early checkpoint of an ongoing effort. This checkpoint has seen 5 million interleaved image-text examples from Multimodal C4. |
|
|
| ## Uses |
| OpenFlamingo-9B is intended to be used **for academic research purposes only.** Commercial use is prohibited, in line with LLaMA's non-commercial license. |
|
|
| ### Bias, Risks, and Limitations |
| This model may generate inaccurate or offensive outputs, reflecting biases in its training data and pretrained priors. |
|
|
| In an effort to mitigate current potential biases and harms, we have deployed a content filter on model outputs in the OpenFlamingo demo. We continue to red-team the model to understand and improve its safety. |
|
|
| ## Evaluation |
| We've evaluated this checkpoint and report validation performance for two vision-language tasks: COCO captioning and VQAv2. Results are displayed below. |
|
|
| **COCO (CIDEr)** |
|
|
| |0-shot|4-shot|8-shot|16-shot|32-shot| |
| |--|--|--|--|--| |
| |65.52|74.28|79.26|81.84|84.52| |
|
|
|
|
| **VQAv2 (VQA accuracy)** |
|
|
| |0-shot|4-shot|8-shot|16-shot|32-shot| |
| |---|---|---|---|---| |
| |43.55|44.05|47.5|48.87|50.34| |