I read a great essay by Stephen Wolfram: “What Is ChatGPT Doing … and Why Does It Work?” I found it interesting enough to write up some notes:
At a high level, ChatGPT is a probabilistic system: it predicts which word or string of words is most likely to follow a given word or string of words. “The basic task for ChatGPT is to figure out how to continue a piece of text that it’s been given.”
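To make that concrete for myself, here’s a tiny toy sketch of what “continue a piece of text” means - this is my own illustration, not code from the essay, and the word table and probabilities are made up:

```python
import random

# Toy next-word model: for each context word, a probability distribution
# over possible next words. (Illustrative numbers only, not real data.)
next_word_probs = {
    "the": {"cat": 0.4, "dog": 0.35, "best": 0.25},
    "cat": {"sat": 0.5, "ran": 0.3, "is": 0.2},
    "dog": {"barked": 0.6, "ran": 0.4},
}

def continue_text(start, length=4):
    """Extend the text by repeatedly sampling a likely next word."""
    words = [start]
    for _ in range(length):
        dist = next_word_probs.get(words[-1])
        if dist is None:
            break  # no continuation known for this word
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(continue_text("the"))  # e.g. "the cat sat"
```

The real model conditions on the whole preceding text rather than just the last word, and its vocabulary runs to tens of thousands of tokens, but the loop is the same idea: predict, append, repeat.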
ChatGPT uses a neural net architecture. Wolfram writes “it can be easier to solve more complicated problems with neural nets than simple ones. And the rough reason for this seems to be that when one has a lot of ‘weight variables’ one has a high-dimensional space with ‘lots of different directions’ that can lead one to the minimum—whereas with fewer variables it’s easier to end up getting stuck in a local minimum (‘mountain lake’) from which there’s no ‘direction to get out’.” This statement has profound implications for which problems neural nets can solve well vs. those they will solve suboptimally - worth exploring from a commercial and application lens.
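To see the “mountain lake” intuition in miniature, here’s a toy one-dimensional example of my own (not from the essay, with a loss function I made up): plain gradient descent on a loss with two basins ends up in whichever basin it starts in, because in one dimension there is no other direction to escape through.

```python
# Hypothetical loss with two basins: a shallower minimum near w = -1 and a
# deeper one near w = 2 (the -0.5*w term tilts the landscape).
def loss(w):
    return (w + 1) ** 2 * (w - 2) ** 2 - 0.5 * w

def grad(w, eps=1e-5):
    """Numerical gradient of the loss."""
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

def descend(w, lr=0.01, steps=1000):
    """Plain gradient descent from a starting weight w."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

print(descend(-2.0))  # gets stuck in the shallower basin near w ≈ -1
print(descend(3.0))   # reaches the deeper basin near w ≈ 2
```

With many weight variables instead of one, there are usually other directions through which descent can keep making progress, which is Wolfram’s point about high-dimensional spaces.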
A similar neural net architecture works well across different types of tasks. Wolfram writes that “this is a reflection of the fact that the tasks we’re typically trying to get neural nets to do are ‘human-like’ ones - and neural nets can capture quite general ‘human-like’ processes.”
One thing that really struck me from the article is that there’s a lot about how neural nets work that we still don’t understand - much as there’s a lot about the brain’s functioning that we still don’t understand. We have a general idea of why things work the way they do, and we’ve figured out that some methods of constructing and using neural nets work better than others, but the specifics remain somewhat of a mystery.
“Training a neural net is hard - and takes a lot of computational effort… which is why neural net training is typically limited by the availability of GPUs.” This comes back to my discussion yesterday that a clear early winner of the current AI explosion is NVIDIA.
Wolfram reiterates his idea of “computational irreducibility”: there are processes that cannot be reduced to fewer computations. You have to go through all the steps.
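Wolfram’s classic example of an irreducible process is the Rule 30 cellular automaton. The sketch below is mine, not code from the essay, but it shows the idea: as far as anyone knows, the only way to learn what the pattern looks like after n steps is to actually compute all n steps.

```python
# Rule 30: each cell's next value depends on itself and its two neighbors.
RULE_30 = {
    (1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}

def step(cells):
    """Apply one step of Rule 30 to a row of cells (edges treated as 0)."""
    padded = [0] + cells + [0]
    return [RULE_30[tuple(padded[i:i + 3])] for i in range(len(cells))]

# Start with a single "on" cell and run 10 steps - there is no known shortcut
# that jumps straight to the final row without computing the ones in between.
row = [0] * 10 + [1] + [0] * 10
for _ in range(10):
    row = step(row)
    print("".join("█" if c else "·" for c in row))
```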
“Computationally irreducible processes…are still fundamentally hard for computers - even if computers can readily compute their individual steps…what we should conclude [from ChatGPT] is that tasks - like writing essays - that we humans could do, but we didn’t think computers could do, are in some sense computationally easier than we thought.”
This finding is an exciting one - that human language is a more straightforward and computable system than we had imagined. Or as Wolfram writes, “language is at a fundamental level somehow simpler than it seems.” That has fascinating implications for everything from linguistics and translation to brain science.
Wolfram’s essay serves as a great and only moderately technical introduction to the workings of ChatGPT. In future posts, I’ll continue to dive into this topic from both a theoretical and a commercial perspective.