Post-transformer inference: 224× compression of Llama-70B with improved accuracy
zenodo.org/records/17873275
#ycombinator