Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
Abstract
Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients, so they cannot be applied to quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of quantized parameters can still fail due to vanishing or inaccurate gradient estimates. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it uses stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning of quantized models possible and opening the door to scaling LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies.
Community
We study the question of whether it is possible to perform full-parameter fine-tuning on Large Language Models (LLMs) directly within the quantized space, effectively bypassing the need for high-precision weights and standard backpropagation.
Usually, Post-Training Quantization (PTQ) renders a model static; you can't easily apply standard fine-tuning or Reinforcement Learning (RL) because the parameter space becomes discrete and non-differentiable. Even standard Evolution Strategies (ES)—which are backprop-free—often struggle here due to vanishing or inaccurate gradients.
We propose a novel solution called Quantized Evolution Strategies (QES). It enables direct optimization of quantized parameters through two key designs (sketched in code after this list):
- Accumulated Error Feedback: This preserves high-precision gradient signals that would otherwise be lost.
- Stateless Seed Replay: This keeps memory usage down to low-precision inference levels.
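To make these two designs concrete, here is a minimal, self-contained sketch of one ES update step performed directly on int8 weights. It is a toy illustration under simplifying assumptions (a fixed-scale symmetric int8 quantizer, a synthetic fitness function, and a per-parameter float residual buffer), not the paper's implementation; names such as `es_step`, `quantize`, and `fitness`, as well as all hyperparameters, are placeholder choices.

```python
# Toy sketch: one Evolution Strategies (ES) step on int8 weights with
# (1) stateless seed replay and (2) accumulated error feedback.
# This is an illustration, not the QES implementation.
import numpy as np

SCALE = 0.05   # fixed quantization step (toy symmetric int8 quantizer)
SIGMA = 0.1    # ES perturbation scale
LR = 0.1       # ES learning rate
POP = 16       # population size (number of perturbation seeds per step)

def quantize(w_float):
    """Round to the nearest point of the int8 grid."""
    return np.clip(np.round(w_float / SCALE), -127, 127).astype(np.int8)

def dequantize(w_q):
    return w_q.astype(np.float32) * SCALE

def fitness(w_q, target):
    """Toy reward: negative squared distance of dequantized weights to a target."""
    return -float(np.sum((dequantize(w_q) - target) ** 2))

def es_step(w_q, residual, target, step):
    # --- Evaluate perturbed quantized models, keeping only the seeds. ---
    seeds = [step * POP + i for i in range(POP)]
    rewards = []
    for seed in seeds:
        eps = np.random.default_rng(seed).standard_normal(w_q.shape).astype(np.float32)
        w_pert = quantize(dequantize(w_q) + SIGMA * eps)  # perturbation stays in the quantized space
        rewards.append(fitness(w_pert, target))
    rewards = np.asarray(rewards, dtype=np.float32)
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # --- Stateless seed replay: regenerate eps from seeds instead of storing it. ---
    grad = np.zeros_like(w_q, dtype=np.float32)
    for seed, r in zip(seeds, rewards):
        eps = np.random.default_rng(seed).standard_normal(w_q.shape).astype(np.float32)
        grad += r * eps
    grad /= POP * SIGMA

    # --- Accumulated error feedback: add the carried-over rounding error before
    #     re-quantizing, then store whatever this update loses to rounding. ---
    w_float = dequantize(w_q) + LR * grad + residual
    w_q_new = quantize(w_float)
    residual = w_float - dequantize(w_q_new)
    return w_q_new, residual

# Usage: drive a small int8 weight vector toward a random float target.
rng = np.random.default_rng(0)
target = rng.standard_normal(8).astype(np.float32)
w_q = quantize(np.zeros(8, dtype=np.float32))
residual = np.zeros(8, dtype=np.float32)
print("initial fitness:", fitness(w_q, target))
for step in range(300):
    w_q, residual = es_step(w_q, residual, target, step)
print("final fitness:", fitness(w_q, target))
```

The sketch mirrors the two designs: perturbations are never stored, only their seeds, so peak memory stays close to the quantized model itself; and the rounding error of each update is accumulated and fed back into the next step, so small gradient signals are not silently lost to quantization. In a real LLM the residual buffer would need a compact representation; here it is a plain float array for clarity.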
QES significantly outperforms state-of-the-art zeroth-order methods on arithmetic reasoning tasks, and more experiments are on the way. This could be a major step toward scaling LLMs entirely in the quantized space!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ECO: Quantized Training without Full-Precision Master Weights (2026)
- Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation (2026)
- HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs (2026)
- What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study (2026)
- NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models (2026)
- Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection (2026)
- D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs (2026)