We not only have to stop ignoring the problem, we need to be absolutely clear about what the problem is.
LLMs don’t hallucinate wrong answers. They hallucinate all answers. Some of those answers will happen to be right.
If this sounds like nitpicking or quibbling over verbiage, it’s not. This is really, really important to understand. LLMs exist within a hallucinatory false reality. They do not have any comprehension of the truth or untruth of what they are saying, and this means that when they say things that are true, they do not understand why those things are true.
That is the part that’s crucial to understand. A really simple test of this problem is to ask ChatGPT to back up an answer with sources. It fundamentally cannot do it, because it has no ability to actually comprehend and correlate factual information in that way. This means, for example, that AI is incapable of assessing the potential veracity of the information it gives you. A human can say “That’s a little outside of my area of expertise,” but an LLM cannot. It can only be coded with hard blocks in response to certain keywords to cut it off from answering and insert a stock response.
This distinction, that AI is always hallucinating, is important because of stuff like this:
> But notice how Reid said there was a balance? That’s because a lot of AI researchers don’t actually think hallucinations can be solved. A study out of the National University of Singapore suggested that hallucinations are an inevitable outcome of all large language models. **Just as no person is 100 percent right all the time, neither are these computers.**
That is some fucking toxic shit right there. Treating the fallibility of LLMs as analogous to the fallibility of humans is a huge, huge false equivalence. Humans can be wrong, but we’re wrong in ways that allow us the capacity to grow and learn. Even when we are wrong about things, we can often learn from how we are wrong. There’s a structure to how humans learn and process information that allows us to interrogate our failures and adjust for them.
When an LLM is wrong, we just have to force it to keep rolling the dice until it’s right. It cannot explain its reasoning. It cannot provide proof of work. I work in a field where I often have to direct the efforts of people who know more about specific subjects than I do, and part of how you do that is you get people to explain their reasoning, and you go back and forth testing propositions and arguments with them. You say “I want this, what are the specific challenges involved in doing it?” They tell you it’s really hard, you ask them why. They break things down for you, and together you find solutions. With an LLM, if you ask it why something works the way it does, it will commit to the bit and proceed to hallucinate false facts and false premises to support its false answer, because it’s not operating in the same reality you are, nor does it have any conception of reality in the first place.
Some researchers compared how ChatGPT 3 and 4 answered the same questions. One of the questions was about stacking items in a stable way. ChatGPT 3, in line with what you are saying about “without understanding”, just listed the items and said to place them one on top of the other. No way it would have worked.
ChatGPT 4, however, said that you should put the book down first, put the eggs in a 3 x 3 grid on top of the book, hold them in place with a laptop so they don’t roll around, then put the bottle on top of the laptop standing up, and then balance the nail on top of it… even noting that you have to put the flat end of the nail down. This sounds a lot like understanding to me and not just rolling the dice hoping to be correct.
Yes, AI confidently gets stuff wrong. But let’s all note that there is a whole subreddit dedicated to people being confidently wrong. One doesn’t need to go any further than Lemmy to see people confidently claiming to know the truth about shit they should know is outside of their actual knowledge. We’re all guilty of this, including refusing to learn when we are wrong. Additionally, the argument that they can’t learn doesn’t make sense because models have definitely become better.
Now I’m not saying AI is conscious, I really don’t know, but all of your shortcomings you’ve listed humans are guilty of too. So to use them as examples of why it’s always just a hallucination, or that our thoughts are not, doesn’t seem to hold much water to me.
> the argument that they can’t learn doesn’t make sense because models have definitely become better.
They have to be either trained with new data or their internal structure has to be improved. It’s an offline process, meaning they don’t learn through chat sessions we have with them (if you open a new session it will have forgotten what you told it in a previous session), and they can’t learn through any kind of self-directed research process like a human can.
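To make the session-versus-training distinction concrete, here is a toy sketch in Python. It is not how any real LLM is implemented: `FrozenModel`, `ChatSession`, and `retrain` are made-up names, and a set of strings stands in for learned weights. It only illustrates the claim above, that what you tell a model in one chat lives in that chat's context, and that the model itself only changes through a separate offline step.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenModel:
    """Hypothetical stand-in for trained weights: fixed once training is done."""
    knowledge: frozenset


@dataclass
class ChatSession:
    """One conversation. The context lives here, not in the model."""
    model: FrozenModel
    context: list = field(default_factory=list)

    def tell(self, fact: str) -> None:
        # Only this session's context changes; the model is untouched.
        self.context.append(fact)

    def knows(self, fact: str) -> bool:
        return fact in self.model.knowledge or fact in self.context


def retrain(old: FrozenModel, new_data: set) -> FrozenModel:
    """The offline step: build a new model from old knowledge plus new data."""
    return FrozenModel(old.knowledge | frozenset(new_data))


model_v1 = FrozenModel(frozenset({"eggs are fragile"}))

first = ChatSession(model_v1)
first.tell("my cat is named Miso")
assert first.knows("my cat is named Miso")       # visible within this session

second = ChatSession(model_v1)
assert not second.knows("my cat is named Miso")  # a new session starts blank

model_v2 = retrain(model_v1, {"my cat is named Miso"})
assert ChatSession(model_v2).knows("my cat is named Miso")
```

The design point is that `tell` never touches `model`, so nothing said in a chat can leak into the next one; only `retrain`, which runs outside any session, produces a model that "knows" something new.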
> all of your shortcomings you’ve listed humans are guilty of too.
LLMs are sophisticated word generators. They don’t think or understand in any way, full stop. This is really important to understand about them.
I think where you are going wrong here is assuming that our internal perception is not also a hallucination by your definition. It absolutely is. But our minds are embodied, thus we are able to check these hallucinations against some outside stimulus. Your gripe that current LLMs are unable to do that is really a criticism of the current implementations of AI, which are trained on some data, frozen, then restricted from further learning by design. Imagine if your mind was removed from all stimulus and then tested. That is what current LLMs are, and I doubt we could expect a human mind to behave much better in such a scenario. Just look at what happens to people cut off from social stimulus; their mental capacities degrade rapidly and that is just one type of stimulus.
Another problem with your analysis is that you expect the AI to do something that humans cannot do: cite sources without an external reference. Go ahead right now and from memory cite some source for something you know. Do not Google search, just remember where you got that knowledge. Now who is the one that cannot cite sources? The way we cite sources generally requires access to the source at that moment. Current LLMs do not have that by design. Once again, this is a gripe with implementation of a very new technology.
The main problem I have with so many of these “AI isn’t really able to…” arguments is that no one is offering a rigorous definition of knowledge, understanding, introspection, etc in a way that can be measured and tested. Further, we just assume that humans are able to do all these things without any tests to see if we can. Don’t even get me started on the free will vs illusory free will debate that remains unsettled after centuries. But the crux of many of these arguments is the assumption that humans can do it and are somehow uniquely able to do it. We had these same debates about levels of intelligence in animals long ago, and we found that there really isn’t any intelligent capability that is uniquely human.
You are just wrong