11 Comments
Jul 23 · Liked by Abhinav Upadhyay

Hey,

> Firstly, the gzip + KNN approach is lightweight and economical, especially when compared to cumbersome deep learning models.

as you mention later, it is not really economical, given that it requires a full KNN computation over the whole training set at each inference. In effect, this takes a lot of CPU and a lot of memory. I'm trying to reproduce the results, and it seems that even reproducing AG_NEWS takes around 24 hours.
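
To make the cost concrete, each prediction looks roughly like this (a minimal sketch of the gzip + NCD + KNN scheme in Python; the function names are mine):

```python
import gzip
from collections import Counter

def clen(text: str) -> int:
    """Size of the gzip-compressed text, in bytes."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance:
    (C(ab) - min(C(a), C(b))) / max(C(a), C(b))."""
    ca, cb = clen(a), clen(b)
    return (clen(a + " " + b) - min(ca, cb)) / max(ca, cb)

def predict(test_text: str, train_texts, train_labels, k: int = 2):
    """Majority vote among the k nearest training texts.

    Every single prediction compresses the test text together with the
    ENTIRE training set -- that is where the CPU time and memory go.
    """
    dists = sorted(
        (ncd(test_text, t), y) for t, y in zip(train_texts, train_labels)
    )
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]
```

With AG_NEWS (roughly 120k training samples and 7.6k test samples) that is on the order of a billion pairwise compressions, which lines up with the runtime I'm seeing.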

> Unlike state-of-the-art models that necessitate training, this model is non-parametric — meaning it does not contain parameters that need to be learned, which significantly reduces costs.

yes, but this misses the idea that some compression algorithms do build a kind of parametric model when they build a compression dictionary. The dictionary is a form of learning and could be re-used at inference.

I tried to build a demo of such approach here:

https://github.com/cyrilou242/ftcc

it uses zstd to build compression dictionaries.
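
The core idea fits in a few lines, sketched below in Python using the zstandard bindings (this is my simplification, not the actual FTCC code; the dictionary size and compression level here are arbitrary):

```python
from collections import defaultdict
import zstandard as zstd

def train_class_dicts(texts, labels, dict_size=16 * 1024):
    """Train one zstd compression dictionary per class.

    This is the "training" step: each dictionary captures the
    characteristic substrings of its class and is reused at inference.
    Needs a reasonable number of samples per class to succeed.
    """
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text.encode("utf-8"))
    return {
        label: zstd.train_dictionary(dict_size, samples)
        for label, samples in by_class.items()
    }

def classify(text, class_dicts, level=3):
    """Pick the class whose dictionary compresses the text best.

    Inference costs one compression per class instead of one NCD
    computation per training sample, hence the speedup.
    """
    data = text.encode("utf-8")
    sizes = {
        label: len(zstd.ZstdCompressor(level=level,
                                       dict_data=zdict).compress(data))
        for label, zdict in class_dicts.items()
    }
    return min(sizes, key=sizes.get)
```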

Overall, training + inference is multiple orders of magnitude faster than the gzip approach, and classification performance seems to be similar.

I'd be curious to have your feedback on this.

Author reply:

I wrote a follow-up article on this that dissects various compression algorithms to analyze what exactly happens during compression that leads to this unexpected text-classification performance: https://codeconfessions.substack.com/p/lz77-is-all-you-need
