quantization

• BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
• OneBit: Towards Extremely Low-bit Large Language Models (arXiv:2402.11295)
• A Survey on Transformer Compression (arXiv:2402.05964)
• Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers (arXiv:2402.08958)
• BitDelta: Your Fine-Tune May Only Be Worth One Bit (arXiv:2402.10193)
• GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319)
• EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs (arXiv:2403.02775)
• 4-bit Shampoo for Memory-Efficient Network Training (arXiv:2405.18144)
• PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs (arXiv:2410.05265)
• BitNet a4.8: 4-bit Activations for 1-bit LLMs (arXiv:2411.04965)
• "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization (arXiv:2411.02355)
• NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks (arXiv:2410.20650)
• BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments (arXiv:2410.23918)