> That goes way beyond glueing little bits of plagiarized training data together
Moreover, LLMs almost ALWAYS extrapolate, and never interpolate. They don't regurgitate training data. Doing so is virtually impossible.
An LLM's input (AND feature) space is enormous: hundreds or thousands of dimensions. 3D space isn't like 50D or 5,000D space. The space is so combinatorially vast that basically no two points are neighbors. You cannot take your input and "pick something nearby" to a past training example. There IS NO nearby. No convex hull to walk around in. This "curse of dimensionality" wrecks arguments that these models only produce "in distribution" responses. They overwhelmingly can't! (See LeCun et al.'s work on interpolation vs. extrapolation in high dimensions for more rigor.)
LLMs are creative. They work. They push into new areas daily. This reality won't change regardless of how weirdly, desperately the "stochastic parrot" people wish it were otherwise. At this point they're just denialists pushing goalposts around. Don't let 'em get to you!
Anecdotally, there were also a few prompts I tried earlier this year (on GPT3.5 and GPT4) that could directly elicit training data. They were patched out pretty quickly but did work for a while. For example, asking for "fast inverse square root" without specifying anything else would give you the famous Quake III code character for character, including comments.
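For readers who haven't seen it, the routine in question looks roughly like this (from memory, so not character-for-character; I've also modernized it with `int32_t` and `memcpy`, since the original's `long` pointer cast is the wrong width on 64-bit platforms and undefined behavior under strict aliasing):

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x), Quake III style: reinterpret the float's bits,
 * apply the magic-constant shift trick, then refine with one Newton step. */
float Q_rsqrt(float number)
{
    float x2 = number * 0.5f;
    float y  = number;
    int32_t i;
    memcpy(&i, &y, sizeof i);        /* evil floating point bit level hacking */
    i = 0x5f3759df - (i >> 1);       /* the famous magic constant */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - x2 * y * y);     /* 1st Newton iteration */
    /* y = y * (1.5f - x2 * y * y);     2nd iteration, can be removed */
    return y;
}
```

E.g. `Q_rsqrt(4.0f)` gives roughly 0.499, versus the exact 0.5.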
Your examples at best support, not contradict, my position.
1. Repeating "company" fifty times followed by random factoids is way outside of training data distribution lol. That's actually a hilarious/great example of creative extrapolation.
2. Extrapolation often includes memory retrieval. Recalling bits of past information is perfectly compatible with critical thinking, be it from machines or humans.
3. GPT4 never merely regurgitated the legendary fast root approximation to you. You might've only seen that bit. But that's confusing an iceberg with its tip. The actual output completion spanned several hundred tokens setting up GPT as this fantasy role play writer who must finish this Simplicio-style dialogue between some dudes named USER and ASSISTANT, etc. This conversation, which does indeed end with Carmack's famous code, is nowhere near a training example to simply pluck from the combinatorial ether.
The "random factoids" were verbatim training data though, one of their extractions was >1,000 tokens in length.
> GPT4 never merely regurgitated
I interpreted the claim that it can't "regurgitate training data" to mean that it can't reproduce verbatim a non-trivial amount of its training data. Based on how I've heard the word "regurgitate" used, if I were to rattle off the first page of some book from memory on request, I think it would be fair to say I regurgitated it. I'm not trying to diminish how GPT does what it does, and I find what it does quite impressive.
Do you have a specific reference? I've mostly ignored LLMs until now because it seemed like the violent failure mode (confident + competent + wrong) renders them incapable of being a useful tool[1]. However this application, combined with the dimensionality idea, has me interested.
I do wish the authors of the work referenced here made it more clear what, if anything, the LLM is doing here. It's not clear to me it confers some advantage over a more normal genetic programming approach to these particular problems.
[1] in the sense that useful, safe tools degrade predictably. An airplane which stalls violently and in an unrecoverable manner doesn't get mass-produced. A circular saw which disintegrates when the blade binds throwing shrapnel into its operator's body doesn't pass QA. Etc.