Post-transformer inference: 224× compression of Llama-70B with improved accuracy
zenodo.org/records/17873275
#ycombinator
This paper introduces the first verified method to eliminate transformers from inference while preserving, and in many cases improving, downstream accuracy. We show that a frozen 70-billion-parameter Llama-3 …
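The excerpt cuts off before any method details, so the only thing that can be unpacked here is the headline number. A back-of-envelope sketch of what 224× compression would imply for checkpoint size, assuming (my assumption, not a stated baseline from the paper) the ratio is measured against the raw fp16 weights of the 70B model:

    # Illustrative arithmetic only; the paper's actual baseline is not
    # given in this excerpt.
    PARAMS = 70e9          # parameters in Llama-70B
    BYTES_PER_PARAM = 2    # fp16 baseline (assumption)
    RATIO = 224            # compression factor from the title

    baseline_gb = PARAMS * BYTES_PER_PARAM / 1e9
    compressed_gb = baseline_gb / RATIO
    print(f"baseline:   {baseline_gb:.0f} GB")    # ~140 GB
    print(f"compressed: {compressed_gb:.2f} GB")  # ~0.62 GB

Under that fp16 assumption, 224× would take a ~140 GB checkpoint down to roughly 0.6 GB; against a different baseline (e.g. an already-quantized model) the absolute sizes would differ.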

