> That goes way beyond glueing little bits of plagiarized training data together
Moreover, LLMs almost ALWAYS extrapolate, and never interpolate. They don't regurgitate training data. Doing so is virtually impossible.
An LLM's input (AND feature) space is enormous: hundreds or thousands of dimensions. 3D space isn't like 50D or 5,000D space. The space is so combinatorially vast that basically no two points are neighbors. You cannot take your input and "pick something nearby" to a past training example. There IS NO nearby. No convex hull to walk around in. This "curse of dimensionality" wrecks arguments that these models only produce "in distribution" responses. They overwhelmingly can't! (See LeCun et al.'s work on interpolation vs. extrapolation in high dimensions for more rigor.)
LLMs are creative. They work. They push into new areas daily. This reality won't change regardless of how weirdly, desperately the "stochastic parrot" people wish it were otherwise. At this point they're just denialists pushing goalposts around. Don't let 'em get to you!
Anecdotally, there were also a few prompts I tried earlier this year (on GPT3.5 and GPT4) that could directly elicit training data. They were patched out pretty quickly but did work for a while. For example, asking for "fast inverse square root" without specifying anything else would give you the famous Quake III code character for character, including comments.
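For readers who haven't seen it, the routine in question looks roughly like this (from memory, so not character-for-character; I've also modernized it with `int32_t` and `memcpy`, since the original's `long` pointer cast is the wrong width on 64-bit platforms and undefined behavior under strict aliasing):

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x), Quake III style: reinterpret the float's bits,
 * apply the magic-constant shift trick, then refine with one Newton step. */
float Q_rsqrt(float number)
{
    float x2 = number * 0.5f;
    float y  = number;
    int32_t i;
    memcpy(&i, &y, sizeof i);        /* evil floating point bit level hacking */
    i = 0x5f3759df - (i >> 1);       /* the famous magic constant */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - x2 * y * y);     /* 1st Newton iteration */
    /* y = y * (1.5f - x2 * y * y);     2nd iteration, can be removed */
    return y;
}
```

E.g. `Q_rsqrt(4.0f)` gives roughly 0.499, versus the exact 0.5.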
Your examples at best support, not contradict, my position.
1. Repeating "company" fifty times followed by random factoids is way outside of training data distribution lol. That's actually a hilarious/great example of creative extrapolation.
2. Extrapolation often includes memory retrieval. Recalling bits of past information is perfectly compatible with critical thinking, be it from machines or humans.
3. GPT4 never merely regurgitated the legendary fast root approximation to you. You might've only seen that bit. But that's confusing an iceberg with its tip. The actual output completion spanned several hundred tokens setting up GPT as this fantasy role play writer who must finish this Simplicio-style dialogue between some dudes named USER and ASSISTANT, etc. This conversation, which does indeed end with Carmack's famous code, is nowhere near a training example to simply pluck from the combinatorial ether.
The "random factoids" were verbatim training data though, one of their extractions was >1,000 tokens in length.
> GPT4 never merely regurgitated
I interpreted the claim that it can't "regurgitate training data" to mean that it can't reproduce verbatim a non-trivial amount of its training data. Based on how I've heard the word "regurgitate" used, if I were to rattle off the first page of some book from memory on request, I think it would be fair to say I regurgitated it. I'm not trying to diminish how GPT does what it does, and I find what it does quite impressive.
Do you have a specific reference? I've mostly ignored LLMs until now because it seemed like the violent failure mode (confident + competent + wrong) renders them incapable of being a useful tool[1]. However this application, combined with the dimensionality idea, has me interested.
I do wish the authors of the work referenced here made it more clear what, if anything, the LLM is doing here. It's not clear to me it confers some advantage over a more normal genetic programming approach to these particular problems.
[1] in the sense that useful, safe tools degrade predictably. An airplane which stalls violently and in an unrecoverable manner doesn't get mass-produced. A circular saw which disintegrates when the blade binds throwing shrapnel into its operator's body doesn't pass QA. Etc.