
Very cool that this beats Word2Vec on SQuAD! I wonder if the current state of the art models are using the standard GloVe word vecs and might see improvement from this. Tbh I don't know too much about how those have been implemented though, haha.

I'm curious, how many values did you try for the quantization functions? Without thinking too much about it, that seems like one of the hyperparams that could have a pretty big impact on performance.



You're right, the quantization function and its values definitely have an impact on performance.

For 1 bit I think I tried something like -1/+1, -.5/+.5, -.25/+.25, -.333/+.333, and -10/+10 (and I think a few more). It seemed -.333/+.333 worked the best, while -10/+10 did the worst on the Google analogy task (getting like 0% right). All of this was tuned on 100MB of Wikipedia data.
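
For readers following along, here is a minimal sketch of what a 1-bit quantization function with a tunable value might look like. The thread does not say how the threshold is chosen, so the split at zero, the function names, and the default of 1/3 are assumptions for illustration, not the actual implementation.

```python
def quantize_1bit(x, scale=1 / 3):
    """Map a real value to +scale or -scale.

    Assumption: the decision boundary sits at 0 (sign-based split).
    """
    return scale if x >= 0 else -scale


def quantize_vector(vec, scale=1 / 3):
    """Quantize every component of a vector with the same shared scale."""
    return [quantize_1bit(v, scale) for v in vec]


# Trying a few of the candidate values mentioned above on a toy vector.
if __name__ == "__main__":
    toy = [0.2, -0.7, 0.0]
    for s in (1.0, 0.5, 0.25, 1 / 3, 10.0):
        print(s, quantize_vector(toy, scale=s))
```

The scale is the only hyperparameter here, which is why sweeping it is cheap compared to retraining with a different bit width.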


Have you considered doing gradient descent on the quantization steps? It looks to me like the model should be differentiable with respect to those values, so I'm not sure why you'd have to fix them to a constant.


Hm what do you mean? I'm not quite seeing how to differentiate with respect to the quantization steps.


Say you have a function f(q(x)) where q quantizes x into one of s_1, ..., s_n. Then if q(x) = s_i for a certain x, df/ds_i = df/dq and df/ds_j = 0 for all j != i.

That breaks down for values of x precisely at the boundary between steps, so I should have qualified "differentiable" with "almost everywhere".
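
A rough sketch of the rule above, under stated assumptions: q assigns each x to the nearest step, and f is a toy squared error against made-up targets. None of these names or choices come from the original code; they only illustrate that df/ds_i collects df/dq from the inputs mapped to s_i and is zero for the other steps.

```python
def nearest_step_index(x, steps):
    """Index of the step closest to x (assumed assignment rule)."""
    return min(range(len(steps)), key=lambda i: abs(x - steps[i]))


def loss(xs, targets, steps):
    """Toy f(q(x)): squared error between quantized inputs and targets."""
    return sum(
        (steps[nearest_step_index(x, steps)] - t) ** 2
        for x, t in zip(xs, targets)
    )


def step_gradients(xs, targets, steps):
    """df/ds_i: sum of df/dq over inputs assigned to step i, zero otherwise."""
    grads = [0.0] * len(steps)
    for x, t in zip(xs, targets):
        i = nearest_step_index(x, steps)
        grads[i] += 2.0 * (steps[i] - t)  # df/dq for this input
    return grads


def numeric_gradients(xs, targets, steps, eps=1e-6):
    """Central finite differences, valid away from the step boundaries."""
    grads = []
    for i in range(len(steps)):
        up = list(steps)
        down = list(steps)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(xs, targets, up) - loss(xs, targets, down)) / (2 * eps))
    return grads
```

Comparing step_gradients against numeric_gradients on inputs that are not sitting on a boundary is a quick way to check the "almost everywhere" claim.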

It also occurs to me that this might interact strangely with the approximation dq/dx = 1, but since the quantization steps are globally shared, I think it should be stable anyway.
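
To make the interaction concrete, here is a hedged sketch of one update that applies the dq/dx = 1 approximation to the inputs while also moving the shared steps. The nearest-step assignment, the squared-error objective, the function name, and the learning rate are all illustrative assumptions, not taken from the project.

```python
def ste_update(xs, targets, steps, lr=0.1):
    """One gradient step on the inputs and on the shared quantization steps.

    Inputs are updated with the approximation dq/dx = 1, so df/dx is taken
    to be df/dq. Steps are updated with the gradient that holds wherever x
    is not exactly on a boundary.
    """
    new_xs = list(xs)
    step_grads = [0.0] * len(steps)
    for k, (x, t) in enumerate(zip(xs, targets)):
        # Assumed assignment rule: pick the closest step.
        i = min(range(len(steps)), key=lambda j: abs(x - steps[j]))
        dq = 2.0 * (steps[i] - t)   # df/dq for squared error
        new_xs[k] = x - lr * dq     # pass the gradient straight through q
        step_grads[i] += dq         # accumulate into the shared step
    new_steps = [s - lr * g for s, g in zip(steps, step_grads)]
    return new_xs, new_steps
```

Because every input assigned to the same step contributes to a single shared gradient, each step moves on the aggregate signal rather than on any one input, which is the intuition behind expecting it to stay stable.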

If the evaluation suite for your code doesn't require too much manual interaction, I might try and see for myself.


That's definitely an interesting idea -- it seems this would allow for boundaries that "change" along with the data (instead of having static boundaries as it is). Would be interested to know how that turns out!



