Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
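As a hedged illustration of the kind of prompting such repositories cover, the sketch below sends one image plus a text instruction to an OpenAI vision model through the official `openai` Python client (v1+ API). The model name, image URL, and prompt wording are assumptions for illustration, not taken from any repository listed here.

```python
# Minimal sketch: prompting a vision-capable model with an image URL.
# Assumptions: the `openai` v1+ Python client and a vision-capable
# model name such as "gpt-4-vision-preview".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name
    messages=[
        {
            "role": "user",
            # Multimodal prompts interleave text and image parts in one message.
            "content": [
                {"type": "text",
                 "text": "Describe the objects in this image and their spatial layout."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.jpg"}},  # placeholder URL
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```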
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
😎 An up-to-date, curated list of awesome LMM hallucination papers, methods & resources.
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
[LMM + AIGC] What do we expect from LMMs as AIGI evaluators and how do they perform?
LLaVA inference with multiple images at once for cross-image analysis.
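A minimal sketch of what multi-image LLaVA inference can look like with the Hugging Face `transformers` library; the checkpoint ID, prompt template, and image file names are assumptions for illustration rather than this repository's actual interface.

```python
# Sketch: cross-image LLaVA inference via Hugging Face transformers.
# Assumptions: the "llava-hf/llava-1.5-7b-hf" checkpoint and one <image>
# token per image in the prompt, which the processor expands.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Two images passed in one call; placeholder file names.
images = [Image.open("left.jpg"), Image.open("right.jpg")]
prompt = "USER: <image>\n<image>\nWhat differs between these two images? ASSISTANT:"

inputs = processor(text=prompt, images=images, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```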
Tools and Statistical Procedures in Plant Science
A Mathematica paclet for analyzing and deriving Runge–Kutta, linear multistep, and general linear methods
Linear mixed model genome scans for many traits