
Written by
Published on
TLDR: We are releasing makora-ai/Qwen3.6-35B-A3B-MixFP4, a quantized checkpoint of Qwen3.6-35B-A3B that uses the experimental MixFP4 format. It requires no calibration data to produce and scores higher on accuracy benchmarks than NVFP4 quantized checkpoints from Nvidia and Unsloth. The model is available for free on Hugging Face: https://huggingface.co/makora-ai/Qwen3.6-35B-A3B-MixFP4
Introducing MixFP4, an accuracy-improving extension to NVFP4
MixFP4 (Zou et al., 2026) is an adaptive extension to NVFP4 that enables block-wise selection between NVFP4 (E2M1 floating point) and INT4 (4-bit signed integer) representations to better match local tensor statistics.
Modern LLM tensors contain blocks with dramatically different value distributions. Blocks with large outliers benefit from exponent-heavy NVFP4 representations, while flatter blocks are better represented by an INT4 codebook. Rather than forcing every block to use the same numerical format, MixFP4 adaptively selects between the two formats for each block without introducing additional parameters.
What MixFP4 changes
Similar to NVFP4, MixFP4 quantizes values together in 16-element blocks. Each block is then accompanied with a 8-bit value composed of an unsigned FP7 E4M3 scaling factor with its sign bit instead used to indicate the data type of the block (0 for INT4 and 1 for NVFP4), giving the loader enough information to recover the selected representation without adding a separate metadata tensor for the block type.
No calibration dataset is required. The block format is selected from quantization error on the weights themselves, so the conversion does not need a representative prompt dataset, a calibration pass, or application-specific tuning before it can run.
Implementation
Our model is available through the HuggingFace transformers library for easy usage (see our example later). This implementation is designed primarily for others to verify our checkpoints quality, not to deliver fast inference. Maintaining performance requires implementing custom MixFP4 kernels. At Makora, we have done this using MakoraGenerate, our automated kernel generation software, and are serving the fast implementation on MakoraInference. With the custom MixFP4 kernels, we get an accuracy improvement with minimal loss in model speeds.
Results
The MixFP4 checkpoint is about one third the size of the BF16 target model while matching it on MMLU-Pro instruct-mode evaluation. Among the Qwen3.6 quantized checkpoints we compared against, MixFP4 has the highest MMLU-Pro score, lowest KL divergence, and lowest WikiText-2 perplexity, as seen in the table below.
Model | Checkpoint size (lower is better) | MMLU-Pro, instruct (higher is better) | KL divergence (lower is better) | WikiText-2 perplexity (lower is better) |
|---|---|---|---|---|
| 66.99 GiB | 62.43% | N/A | 6.4574 |
| 21.29 GiB | 62.62% | 0.026935 | 6.5022 |
| 21.85 GiB | 61.80% | 0.038476 | 6.6129 |
| 23.01 GiB | 60.80% | 0.061846 | 6.6984 |
We measured MMLU-Pro on all 12,032 problems in instruct mode with reasoning disabled for all checkpoints. For KL divergence, we used 100 conversations of 256 tokens from Aeala/ShareGPT_Vicuna_unfiltered to compare each model’s token probabilities to the base model. For WikiText-2 perplexity, we used the full Salesforce/wikitext dataset’s wikitext-2-raw-v1 test split with Qwen tokenization, non-empty rows joined by blank-line separators, sequence length 2048, and stride 2048.
Get started
Our checkpoint can be used in transformers with it’s Hugging Face model ID: makora-ai/Qwen3.6-35B-A3B-MixFP4.
Closing
Our checkpoint is available on Hugging Face now: https://huggingface.co/makora-ai/Qwen3.6-35B-A3B-MixFP4
Try it, measure it’s accuracy, or try it at full speed through MakoraInference!
References
MixFP4 by Zou et al., ICML 2026.: https://arxiv.org/abs/2605.31035
Model: https://huggingface.co/makora-ai/Qwen3.6-35B-A3B-MixFP4
Base model: https://huggingface.co/Qwen/Qwen3.6-35B-A3B
Latest
From the blog
The latest industry news, interviews, technologies, and resources.




