Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
Abstract
Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients, so they cannot be applied to quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of quantized parameters can still fail due to vanishing or inaccurate gradient estimates. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it uses stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning of quantized models possible and opening the door to scaling LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies.
Community
We study the question of whether it is possible to perform full-parameter fine-tuning on Large Language Models (LLMs) directly within the quantized space, effectively bypassing the need for high-precision weights and standard backpropagation.
Usually, Post-Training Quantization (PTQ) renders a model static; you can't easily apply standard fine-tuning or Reinforcement Learning (RL) because the parameter space becomes discrete and non-differentiable. Even standard Evolution Strategies (ES)—which are backprop-free—often struggle here due to vanishing or inaccurate gradients.
We propose a novel solution called Quantized Evolution Strategies (QES). It enables direct optimization of quantized parameters through two key designs (sketched in code after this list):
- Accumulated Error Feedback: This preserves high-precision gradient signals that would otherwise be lost.
- Stateless Seed Replay: This keeps memory usage down to low-precision inference levels.
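To make these two designs concrete, here is a minimal, self-contained sketch of one ES update step performed directly on int8 weights. It is a toy illustration under simplifying assumptions (a fixed-scale symmetric int8 quantizer, a synthetic fitness function, and a per-parameter float residual buffer), not the paper's implementation; names such as `es_step`, `quantize`, and `fitness`, as well as all hyperparameters, are placeholder choices.

```python
# Toy sketch: one Evolution Strategies (ES) step on int8 weights with
# (1) stateless seed replay and (2) accumulated error feedback.
# This is an illustration, not the QES implementation.
import numpy as np

SCALE = 0.05   # fixed quantization step (toy symmetric int8 quantizer)
SIGMA = 0.1    # ES perturbation scale
LR = 0.1       # ES learning rate
POP = 16       # population size (number of perturbation seeds per step)

def quantize(w_float):
    """Round to the nearest point of the int8 grid."""
    return np.clip(np.round(w_float / SCALE), -127, 127).astype(np.int8)

def dequantize(w_q):
    return w_q.astype(np.float32) * SCALE

def fitness(w_q, target):
    """Toy reward: negative squared distance of dequantized weights to a target."""
    return -float(np.sum((dequantize(w_q) - target) ** 2))

def es_step(w_q, residual, target, step):
    # --- Evaluate perturbed quantized models, keeping only the seeds. ---
    seeds = [step * POP + i for i in range(POP)]
    rewards = []
    for seed in seeds:
        eps = np.random.default_rng(seed).standard_normal(w_q.shape).astype(np.float32)
        w_pert = quantize(dequantize(w_q) + SIGMA * eps)  # perturbation stays in the quantized space
        rewards.append(fitness(w_pert, target))
    rewards = np.asarray(rewards, dtype=np.float32)
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # --- Stateless seed replay: regenerate eps from seeds instead of storing it. ---
    grad = np.zeros_like(w_q, dtype=np.float32)
    for seed, r in zip(seeds, rewards):
        eps = np.random.default_rng(seed).standard_normal(w_q.shape).astype(np.float32)
        grad += r * eps
    grad /= POP * SIGMA

    # --- Accumulated error feedback: add the carried-over rounding error before
    #     re-quantizing, then store whatever this update loses to rounding. ---
    w_float = dequantize(w_q) + LR * grad + residual
    w_q_new = quantize(w_float)
    residual = w_float - dequantize(w_q_new)
    return w_q_new, residual

# Usage: drive a small int8 weight vector toward a random float target.
rng = np.random.default_rng(0)
target = rng.standard_normal(8).astype(np.float32)
w_q = quantize(np.zeros(8, dtype=np.float32))
residual = np.zeros(8, dtype=np.float32)
print("initial fitness:", fitness(w_q, target))
for step in range(300):
    w_q, residual = es_step(w_q, residual, target, step)
print("final fitness:", fitness(w_q, target))
```

The sketch mirrors the two designs: perturbations are never stored, only their seeds, so peak memory stays close to the quantized model itself; and the rounding error of each update is accumulated and fed back into the next step, so small gradient signals are not silently lost to quantization. In a real LLM the residual buffer would need a compact representation; here it is a plain float array for clarity.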
QES significantly outperforms state-of-the-art zeroth-order methods on arithmetic reasoning tasks, and more experiments are on the way. This could be a major step toward scaling LLMs entirely in the quantized space!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ECO: Quantized Training without Full-Precision Master Weights (2026)
- Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation (2026)
- HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs (2026)
- What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study (2026)
- NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models (2026)
- Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection (2026)
- D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs (2026)