Stars
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Anonymous Github is a proxy server to support anonymous browsing of GitHub repositories for open-science code and data.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
✨✨Latest Advances on Multimodal Large Language Models
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
Papers and Datasets on Instruction Tuning and Following. ✨✨✨
Reading list on instruction tuning, a trend that started with Natural-Instructions (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022).
Awesome-LLM: a curated list of Large Language Models
A collection of resources and papers on Diffusion Models
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
Official Repository of ChatCaptioner
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
[ICLR 2022 poster] Official PyTorch implementation of "Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework"
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
VOLO: Vision Outlooker for Visual Recognition
Implementation of Perceiver, General Perception with Iterative Attention, in PyTorch
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition