view article Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 1 day ago • 201
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published 10 days ago • 219
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published 12 days ago • 46
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published 6 days ago • 42
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 16 days ago • 320
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published 9 days ago • 78
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception Paper • 2602.11858 • Published 9 days ago • 58
Ming-V2 Collection Ming is the multi-modal series of any-to-any models developed by Ant Ling team. • 14 items • Updated 7 days ago • 34
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Paper • 2510.24821 • Published Oct 28, 2025 • 41
Ming-Omni: A Unified Multimodal Model for Perception and Generation Paper • 2506.09344 • Published Jun 11, 2025 • 31
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21, 2025 • 502
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names Paper • 2408.00298 • Published Aug 1, 2024 • 11
The Manga Whisperer: Automatically Generating Transcriptions for Comics Paper • 2401.10224 • Published Jan 18, 2024 • 3
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published 23 days ago • 19