Breaking Down the Hype: Unpacking the Findings of the Recent ACL Conference Paper
> Firstly, the gzip + KNN approach is lightweight and economical, especially when compared to cumbersome deep learning models.
As you mention later, it is not really economical, since it requires a full k-NN computation over the training set at each inference. In practice this consumes a lot of CPU and memory. I'm trying to reproduce the results, and even AG_NEWS alone seems to take around 24 hours.
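For context, here is a minimal sketch of where that per-query cost comes from, using the Normalized Compression Distance from the paper (the training texts and labels are made up, and I use 1-NN for brevity where the paper uses k=2):

```python
import gzip

def clen(s: str) -> int:
    # Length of the gzip-compressed text, a proxy for its information content.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized Compression Distance, the metric used by the paper.
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_classify(query: str, train: list[tuple[str, str]]) -> str:
    # 1-NN over NCD: each inference re-compresses the query against
    # EVERY training example, which is where the CPU cost comes from.
    return min(train, key=lambda pair: ncd(query, pair[0]))[1]

train = [
    ("the team won the championship game", "sports"),
    ("the striker scored in the final minute", "sports"),
    ("the central bank raised interest rates", "business"),
    ("stocks fell after the earnings report", "business"),
]
print(knn_classify("the striker scored in the championship game", train))
```

With n training examples, every single prediction costs n compressions of `query + example`, so the cost scales with the full dataset size at inference time.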
> Unlike state-of-the-art models that necessitate training, this model is non-parametric — meaning it does not contain parameters that need to be learned, which significantly reduces costs.
Yes, but this misses the fact that some compression algorithms do build a kind of *parametric model* when they construct a compression dictionary. The dictionary is a form of learning, and it can be reused at inference.
I tried to build a demo of such an approach here:
It uses zstd to build compression dictionaries.
Overall, training + inference is multiple orders of magnitude faster than the gzip approach, and accuracy seems to be similar.
I'd be curious to have your feedback on this.
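To make the speedup concrete, here is a stdlib-only sketch of the same idea (names are mine; zlib's `zdict` preset dictionary stands in for a trained zstd dictionary, which pre-seeds the LZ77 window rather than being trained from samples the way zstd's is):

```python
import zlib

def build_class_dicts(train: list[tuple[str, str]]) -> dict[str, bytes]:
    # "Training": concatenate each class's texts into a preset dictionary.
    # This is the reusable learned state that the gzip approach lacks.
    dicts: dict[str, bytes] = {}
    for text, label in train:
        dicts[label] = dicts.get(label, b"") + text.encode("utf-8") + b" "
    # zlib preset dictionaries only use the last 32 KB.
    return {label: d[-32768:] for label, d in dicts.items()}

def compressed_size(text: str, zdict: bytes) -> int:
    # Compress the text with a class dictionary pre-loaded into the window.
    c = zlib.compressobj(level=9, zdict=zdict)
    return len(c.compress(text.encode("utf-8")) + c.flush())

def dict_classify(query: str, dicts: dict[str, bytes]) -> str:
    # Inference cost is one compression per class, independent of the
    # number of training examples -- hence the orders-of-magnitude speedup.
    return min(dicts, key=lambda label: compressed_size(query, dicts[label]))
```

The query compresses best against the dictionary built from texts it resembles, so inference is O(number of classes) compressions instead of O(training set size).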
I wrote a follow-up article that dissects various compression algorithms to analyze what exactly happens during compression that leads to this unexpected text-classification performance: https://codeconfessions.substack.com/p/lz77-is-all-you-need