Stars
Machine Learning Engineering Open Book
Video-R1: Towards Super Reasoning Ability in Video Understanding MLLMs
Train transformer language models with reinforcement learning.
Fast and memory-efficient exact attention
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Official inference repo for FLUX.1 models
Repository for the main Dockerfile with the OpenWorm software stack and project-wide issues
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Experience macOS just like before
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
This repository contains implementations and illustrative code to accompany DeepMind publications
The open source (GPL v2) turn-by-turn navigation software for many operating systems
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
ECCV18 Workshops - Enhanced SRGAN. Champion of the PIRM Challenge on Perceptual Super-Resolution. The training code is in BasicSR.
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
High-resolution models for human tasks.
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
We write your reusable computer vision tools. 💜