How does AI work at even the most basic level?
I just asked ChatGPT4 a very simple question: Who were the first five presidents of the United States? Here is its answer:
*****************************************************************
The first five presidents of the United States were:
1. **George Washington** (1789–1797)
2. **John Adams** (1797–1801)
3. **Thomas Jefferson** (1801–1809)
4. **James Madison** (1809–1817)
5. **James Monroe** (1817–1825)
They’re associated with the founding era of the U.S., and several of them were key figures in drafting foundational documents like the Declaration of Independence and the Constitution.
************************************************************************
How did it manage to translate what I said into a database query? How did it know where in the database to look? Why did it list the years that each president served? Why did it list in chronological order and number each answer? What prompted it to give the additional information at the end? It all seems like magic to me.
8 Answers
It does not have a database, and at least in the earlier versions, it did not search the web.
It is a probabilistic next-letter predictor based on a weighted neural net that just happens to spit out legible results.
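Very roughly, the "next-token predictor" idea looks something like this. The probability table below is made up for illustration; a real model scores every token in a vocabulary of tens of thousands using billions of learned weights.

```python
import random

# Toy illustration of next-token prediction. The probabilities are invented;
# a real LLM computes a score for every token in its vocabulary at each step.
next_token_probs = {
    "The first five presidents of the United States were: 1. George":
        {"Washington": 0.97, "W.": 0.02, "Bush": 0.01},
}

def sample_next(context):
    """Pick the next token at random, weighted by its probability."""
    probs = next_token_probs[context]
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

context = "The first five presidents of the United States were: 1. George"
print(sample_next(context))  # almost always "Washington"
```

Each generated token gets appended to the context and the process repeats, which is how a list of five presidents gets built up one piece at a time.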
Are you saying that by choosing the next letters from my question it is able to give grammatically and factually correct answers? Suppose I gave a follow-up question and referred to “the third person on the list”. How could it possibly get “Jefferson” from that?
That is where the training comes in.
It is trained on millions of text documents harvested from all over the internet. Training basically involves giving the neural net an input/prompt telling it what you want, and then adjusting the weights of the individual neurons, which decide whether they fire or not, until the output looks like the example text that you train it with. You do that billions of times with all the training texts you have amassed, until even new prompts result in legible text.
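A rough sketch of that training loop, shrunk down to a toy model with one weight per word pair. Real training uses gradient descent over billions of parameters and vastly more text; this only shows the shape of "adjust the weights until the output matches the example text."

```python
import math

# Example text the toy model should learn to continue.
corpus = "george washington was the first president of the united states".split()
vocab = sorted(set(corpus))

# One weight per (current word, candidate next word) pair, all starting at zero.
weights = {(w, v): 0.0 for w in vocab for v in vocab}

def predict_probs(word):
    """Turn the weights for a word into a probability for each possible next word."""
    scores = {v: math.exp(weights[(word, v)]) for v in vocab}
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}

learning_rate = 0.5
for step in range(200):                       # "do that billions of times"
    for current, target in zip(corpus, corpus[1:]):
        probs = predict_probs(current)
        for v in vocab:                       # nudge weights toward the example text
            gradient = (1.0 if v == target else 0.0) - probs[v]
            weights[(current, v)] += learning_rate * gradient

probs = predict_probs("first")
print(max(probs, key=probs.get))              # prints "president"
```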
LLMs use multiple layers of patterns. The only thing I’d question about what @ragingloli wrote is that it goes letter-by-letter. I think it mainly operates at least at a word-by-word level, and also at the sentence, paragraph, and task levels.
How they actually do as well as they do is not well understood even by AI researchers, but part of it is that they are trained using truly vast amounts of sample text. And they can answer common types of questions like your example largely because:
1) Those are the types of questions that people ask and answer fairly frequently, and “presidents… United States” is going to be highly statistically correlated with articles containing lists of standard information about them like that. (So as to why it listed their term dates in chronological order: that’s probably a relatively common pattern in the training data.)
2) These LLM AIs are also specifically trained on certain types of tasks. They’re given examples of types of questions, and appropriate responses, and these form another layer of pattern recognition used to steer the responses toward the expected sorts of answers. (A sketch of what such examples might look like follows this list.)
3) So when you include most of THE INTERNET and entire libraries full of text (including dictionaries and encyclopedias, as well as informational web sites, which all have this sort of info) in a training set, and spend unfathomable amounts of computing power condensing all that into a statistical model, LLMs end up being able to generate a pseudo-random regurgitation of related information, much like what has been written before, somewhere, by other people.
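For point 2, the instruction-tuning data is basically a pile of question/answer pairs. The exact datasets and formats used by any particular model aren’t fully public, so the examples below are invented, but the general shape is something like this:

```python
# Invented examples of instruction-tuning data: each pair shows a question and
# the kind of response the model should learn to produce for it.
fine_tuning_examples = [
    {
        "prompt": "Who were the first five presidents of the United States?",
        "response": "1. George Washington (1789-1797)\n"
                    "2. John Adams (1797-1801)\n"
                    "3. Thomas Jefferson (1801-1809)\n"
                    "4. James Madison (1809-1817)\n"
                    "5. James Monroe (1817-1825)",
    },
    {
        "prompt": "List the planets in order from the Sun.",
        "response": "Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune",
    },
]

# During fine-tuning, each pair is joined into one sequence and the model is
# trained, with the same next-token objective, to continue the prompt with the
# expected response.
for example in fine_tuning_examples:
    training_text = example["prompt"] + "\n" + example["response"]
    print(training_text[:60], "...")
```

That second layer of training is a big part of why the answer comes back as a tidy numbered list with dates rather than an unstructured wall of text.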
It is word by word. LLMs are strangely bad at generating and solving their own anagrams.
They start coming up with nonsense answers, generating “anagrams” that don’t make a word or are a different length than specified, or coming up with answers that are a letter or two off.
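One plausible reason: the model doesn’t really see letters. A toy word-level tokenizer makes the point; real LLMs use subword tokenizers (e.g. byte-pair encoding) and the vocabulary below is invented, but the effect is similar.

```python
# Toy word-level "tokenizer": each whole word becomes a single integer ID.
vocab = {"the": 1, "word": 2, "is": 3, "listen": 101, "silent": 102}

def tokenize(text):
    return [vocab[word] for word in text.lower().split()]

print(tokenize("the word is listen"))  # [1, 2, 3, 101]
print(tokenize("the word is silent"))  # [1, 2, 3, 102]

# "listen" and "silent" are anagrams, but as tokens they are just the unrelated
# IDs 101 and 102 -- the shared letters are not directly visible to the model,
# which is one plausible reason letter-rearranging tasks trip it up.
```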
It seems like magic, but if you consider that all questions come in the form of chains of words, and all answers are likewise finite lists of words, then you can see how, with massive computing power, you can link questions and answers probabilistically.
It makes you wonder to what extent we live in a “real” world and to what extent, in our daily lives, we are just manipulating or being manipulated by words.