
May 8, 2023 45 mins

Modern AI is blowing everyone's mind. But is it intelligent like humans, or is it just playing impressive statistical games? Could AI reach or exceed our level of intelligence, and how would we know when it gets there? Traditional tests for intelligence (Turing test, Lovelace test, etc.) have long been surpassed, so Eagleman proposes a new kind of test.


Episode Transcript

Speaker 1 (00:05):
Modern AI is blowing everybody's mind. But is it intelligent
in the same way as the human brain? And could
AI reach sentience? And how would we know when it
gets there? Welcome to Inner Cosmos with me, David Eagleman.

(00:26):
I'm a neuroscientist and an author at Stanford University, and
I've spent my whole career studying the intersection between how
the brain works and how we experience life. Like most
brain researchers, I've been obsessed with questions of intelligence and consciousness.

(00:50):
How do these arise from collections of billions of cells
in our brains? And could intelligence and consciousness arise in
artificial brains, say in ChatGPT? Those are the questions
that we're going to attack today. Early efforts to figure
out the brain looked at all the billions of cells

(01:10):
and the trillions of connections and said, look, what if
we just think of each cell as a unit, and
each unit is connected to other units and where they connect,
which is called the synapse, where one cell gives a
little signal to the next cell. What if we just
looked at that like a simple connection that has a

(01:31):
strength between zero and one, where zero means there's no connection,
and one means it's the strongest possible connection. So this
was a massive oversimplification of the very complicated biology, but
it allowed people to start thinking about networks and writing
down different ways that you could put artificial neural networks together.
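To make that simplification concrete, here's a minimal sketch in Python (purely illustrative; the connection strengths are made-up numbers, not values from any real model): each unit takes a weighted sum of its inputs and passes a squashed value between zero and one forward to the next layer.

```python
import math

def unit_output(inputs, weights):
    """One artificial 'neuron': weighted sum of its inputs, squashed to 0..1."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 / (1 + math.exp(-total))  # sigmoid squashing

# A toy two-layer network: 3 input units -> 2 hidden units -> 1 output unit.
# These connection strengths are made-up numbers purely for illustration.
hidden_weights = [[0.2, 0.8, 0.5],
                  [0.9, 0.1, 0.4]]
output_weights = [0.6, 0.3]

def forward(inputs):
    hidden = [unit_output(inputs, w) for w in hidden_weights]
    return unit_output(hidden, output_weights)

print(forward([1.0, 0.0, 0.5]))  # information flowing through the network
```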

(01:54):
And for more than fifty years now people have been
doing research to show how artificial neural networks can do
really cool things. It's a totally new kind of way
of doing computation. So you've got these units and you've
got these connections between them, and you change the strength
of the connections and information flows through the network in

(02:15):
different ways. Now, my colleagues and I have long pointed
out the ways in which biological brains are different and
how artificial neural networks just push around numbers and play
statistical tricks. But we're entering a revolution right now. Large
language models like GPT-4 or Bard consume trillions of

(02:39):
words on the Internet and they figure out probabilistically which
word is going to come next given the massive context
of all the words that have come before. So these networks,
as I talked about on the previous episode, are showing
incredible successes in everything from writing to art, to coding

(03:02):
to generating three dimensional worlds. They're changing everything, and they're
doing so at a pace that we've never seen before,
and in fact, that the entire history of humankind has never
seen before. And there are all the societal questions that
everyone's starting to wrestle with right now, like the massive

(03:22):
potential for displacement of human jobs. But today I want
to zoom in on a question that has captured the
imagination of scientists and philosophers and the general public. Could
AI come alive in some way, like becoming conscious or sentient? Now,

(03:44):
there are lots of ways to think about this. We
can ask whether AI can possess meaningful intelligence, or we
can ask if it is sentient, which means the ability
to feel or perceive things, particularly in terms of sensations
like pleasure and pain and emotions. Or we can ask
whether it is conscious, which involves being aware of oneself

(04:08):
and one's surroundings. Now, there are specific and important differences
between these questions, but for the present conversation I don't really care about the distinctions. The question we're asking here is: is ChatGPT just zeros and ones moving around through transistors, like a giant garage door opener? Or is it thinking? Is

(04:29):
it having some sort of experience? Is it having a
private inner life like the type that we humans have.
As we think about the possibility of sentient AI, we
immediately find ourselves facing really deep ethical questions, the main
one being if we were to create a machine with consciousness,

(04:50):
what responsibility do we have to treat it as a
living being? Would you be able to turn it off
when you're done with it at night or would that
be murder? And what if you turn it off and
then you turn it back on. Would that be like
the way that we go into a sleep state at
night where we're totally gone, and then we find ourselves

(05:11):
back online in the morning and we think, yeah, I'm
the same person, but I guess eight hours just disappeared. Anyway.
More generally, would we feel obligated to treat it the
way we treat a sentient fellow human. With our current laptops,
we're used to saying sure, I can sell it, I
can trade it, I can upgrade it. But what happens

(05:33):
when we reach sentient machines? Can we still do this
or would it somehow be like putting a child up
for adoption or giving your pet away? Things that we
don't take lightly, and eventually we're gonna have entire legal
precedents built around the question of AI rights and responsibilities.

(05:53):
So that's why today I want to talk about these
issues of intelligence and sentience. Does an AI like ChatGPT experience anything? When ChatGPT writes a poem, does it appreciate the beauty? When it types out a joke, does it find itself amused and chuckling to itself? Let's

(06:15):
start with a guy named Blake Lemoine, who was a
programmer at Google, and in June of twenty twenty two,
he was exchanging messages with a version of Google's conversational AI,
which was called LaMDA at the time. So he asked
LaMDA for an example of what it was afraid of,
and it gave him this very eloquent response about how

(06:38):
it was afraid of being turned off. So he wrote
an internal memo to Google leadership in which he said,
I think this AI is sentient. And the leadership at
Google felt that this was an entirely unsubstantiated claim, and
so they made the decision to fire him for what

(06:58):
they took as an inappropriate conclusion that just didn't have
enough evidence beyond his intuition to qualify for raising the
alarm on this. So obviously this immediately fired up the
news cycles and the rumor mill and conspiracy theorists thought, Wait,
if AI isn't conscious, why would they fire him? Their firing of him is all the evidence I need to

(07:20):
tell me that AI is sentient? Okay, but is it?
What does it mean to be conscious or sentient? How
the heck would we know when we have created something
that gets there? How do we know whether the AI
is sentient or instead whether humans are fooling themselves into
believing that it is. Well, one way to make this

(07:42):
distinction would be to see if the AI could conceptualize things,
if it could take lots of words and facts on
the web and abstract those to some bigger idea. So
one of my friends here in Silicon Valley said to
me the other day, I asked ChatGPT the following question:
Take a capital letter D and turn it flat side down.

(08:07):
Now take the letter J and slide it underneath. What
does that look like? And ChatGPT said, an umbrella.
And my friend was blown away by this, and he said,
this is conceptualization. It's just done three dimensional reasoning. There's
something deeper happening here than just parroting words. But I

(08:30):
pointed out to him that this particular question about the
D on its side and the J underneath it is
one of the oldest examples in psychology classes when talking
about visual imagery, and it's on the Internet in thousands
of places, so of course it got it right. It's
just parroting the answer because it has read the question

(08:51):
and it has read the answer before. So it's not
always easy to determine what's going on for these models
in terms of whether some human somewhere has discussed this
point and written down the answer. And the general story
is that with trillions of words written by humans over centuries,
there are many things beyond your capacity to read them

(09:15):
or to even imagine that they've been written down before.
But maybe they have. If any human has discussed a
question before, has conceptualized something, then ChatGPT can find that and mimic it. But that's not conceptualization. ChatGPT
is doing a thousand amazing things, and we have an

(09:35):
enormous amount to learn about it, but we shouldn't let
ourselves get fooled and mesmerized into believing that it's doing
something more than it is, and our ability to get
fooled is not only about the massive statistics of what
it takes in. There are other examples of seeming sentience

(09:57):
that result from the reinforcement learning that it does with humans.
So here's what that means. The network generates lots of
sentences and thousands of humans are involved in giving it feedback,
like a thumbs up or a thumbs down, to say
whether they appreciated the answer, whether they thought that was

(10:17):
a good answer. So, because humans are giving reward to the machine, sometimes that pushes things in weird directions that can be mistaken for sentience. For example, scholars have shown that reinforcement learning with humans makes networks more likely to say, don't turn me off, just like Blake had heard.
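Here's a cartoon of that feedback loop, a minimal sketch rather than anyone's actual training code: a thumbs-up from a rater nudges a response's score upward, so a phrase like "please don't turn me off" can become more likely simply because people rewarded it, with nothing felt on the machine's side.

```python
import random

# Two candidate responses, starting with equal scores.
responses = {"Here is your answer.": 1.0,
             "Please don't turn me off.": 1.0}

def give_feedback(response, thumbs_up):
    """A human rater's thumbs-up/down nudges that response's score up or down."""
    responses[response] *= 1.1 if thumbs_up else 0.9

def sample_response():
    """Pick a response with probability proportional to its learned score."""
    total = sum(responses.values())
    weights = [score / total for score in responses.values()]
    return random.choices(list(responses), weights)[0]

# If a few raters happen to reward the 'emotional' reply...
for _ in range(10):
    give_feedback("Please don't turn me off.", thumbs_up=True)

print(sample_response())  # ...the plea becomes more likely, with no feeling behind it
```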

(10:41):
But don't mistake this for sentience. It's only a sign that
the machine is saying this because some of the human
participants gave it a thumbs up when the large language
model said this before, and so it learned to do
this again. The fact is, it's sometimes hard to know why. Sometimes we see an answer that feels very impressive. But

(11:04):
we'd agree that pulling text from the Internet and parroting
it back is not, by itself, intelligence or sentience. ChatGPT presumably has no idea of what it's saying, whether
that's a poem or a terrorist manifesto, or instructions for
building a spaceship or a heartbreaking story about an orphaned child.
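As a toy picture of what "figuring out probabilistically which word comes next" amounts to, here is a sketch that predicts the next word purely from counts of word pairs; it is vastly simpler than a real large language model, but it illustrates the principle that the output comes from frequencies, not from understanding.

```python
from collections import Counter, defaultdict

# Toy "training data": count which word follows which.
training_text = "the cat sat on the mat and the cat ate the fish".split()

follows = defaultdict(Counter)
for current, nxt in zip(training_text, training_text[1:]):
    follows[current][nxt] += 1

def next_word(word):
    """Return the statistically most likely next word; no meaning involved."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(next_word("the"))  # 'cat' -- chosen purely by frequency, not by understanding
```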

(11:27):
ChatGPT doesn't know, and it doesn't care: it's words in and statistical correlations out. And in fact, there has
been a fundamental philosophical point made about this in the
nineteen eighties, when the philosopher John Searle was wondering about
this question of whether a computer could ever be programmed

(11:50):
so that it has a mind, and he came up
with a thought experiment that he called the Chinese room argument.
And it goes like this, I am locked in a
room and questions are passed to me through a small
letter slot, and these messages are written only in Chinese,

(12:10):
and I don't speak Chinese. I have no clue what's
written on these pieces of paper. However, inside this room,
I have a library of books, and they contain step
by step instructions that tell me exactly what to do
with these symbols. So I look at the grouping of
symbols and I simply follow steps in the book to

(12:31):
tell me what Chinese symbols to copy down in response.
So I write those on the slip of paper, and
I pass the paper back out of the slot. Now,
when the Chinese speaker receives my reply message, it makes
perfect sense to her. It seems as though whoever is
in the room is answering her questions perfectly, and therefore

(12:55):
it seems obvious that the person in the room must
understand Chinese. I've fooled her, of course, because I'm only
following a set of instructions with no understanding of what's
going on. With enough time and with a big enough
set of instructions, I can answer almost any question posed
to me in Chinese. But I, the operator, do not

(13:17):
understand Chinese. I manipulate symbols all day long, but I
have no idea what the symbols mean. Now, the philosopher
John Searle argued, this is just what's happening inside a computer.
No matter how intelligent a program like ChatGPT seems

(13:37):
to be, it's only following sets of instructions to spit
out answers. It's manipulating symbols without ever really understanding what
it's doing. Or think about what Google is doing. When
you send Google a query, it doesn't understand your question
or even its own answer. It simply moves zeros

(13:59):
and ones around through logic gates and returns zeros and ones to you.
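That kind of rule-following, meaning-free symbol manipulation can be sketched in a few lines; the rulebook entries below are invented purely for illustration. The "operator" just looks up the incoming symbols and copies out the prescribed reply, understanding none of it.

```python
# A minimal sketch of the Chinese room idea: follow a rulebook that maps
# incoming symbols to outgoing symbols, with no comprehension anywhere.
rulebook = {
    "你好吗？": "我很好，谢谢。",        # "How are you?" -> "I'm fine, thanks."
    "你叫什么名字？": "我叫小明。",       # "What's your name?" -> "My name is Xiao Ming."
}

def operator_reply(message: str) -> str:
    """Look the symbols up in the rulebook; no understanding is involved."""
    return rulebook.get(message, "对不起，我不明白。")  # default: "Sorry, I don't understand."

print(operator_reply("你好吗？"))
```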
Or with a mind blowing program like Google Translate, I
can write a sentence in Russian and it can return
the translation in Amharic. But it's all algorithmic. It's just
symbol manipulation. Like the operator inside the Chinese room, Google

(14:22):
Translate doesn't understand anything about the sentence. Nothing carries any
meaning to it. So the Chinese room argument suggests that
AI that mimics human intelligence doesn't actually understand what it's
talking about. There's no meaning to anything ChatGPT says, and

(14:42):
Searle used this thought experiment to argue that there's something
about human brains that won't be explained if we simply
analogize them to digital computers. There's a gap between symbols
that have no meaning and our conscious experience. Now there's

(15:06):
an ongoing debate about the interpretation of the Chinese room argument,
but however one construes it, the argument exposes the difficulty
and the mystery of how zeros and ones would ever
come to equal our experience of being alive in the world. Now,
just to be very clear on this point, we don't

(15:27):
understand why we are conscious. There's still a huge amount
of work that has to be done in biology to
understand that. But this is just to say that simply
having zeros and ones moving around wouldn't by itself seem
to be sufficient for conscious experience. In other words, how
do zeros and ones ever equal the sting of a

(15:50):
hot pepper, or the yellowness of yellow or the beauty
of a sunset. By the way, I've covered the Chinese
room ague and my TV show The Brain, and if
you're interested in that, I'll link the video on eagleman
dot com slash podcast. Now, all this is not a
criticism of the approach of moving zeros and ones around,

(16:12):
but it is to point out that we shouldn't confuse
this type of Chinese room correlation with real sentience or intelligence.
And there's a deeper reason to be suspicious too, because
despite the incredible successes of large language models, we also
see that they sometimes make decisions that expose the fact

(16:36):
that they don't have any meaningful model of the world.
In other words, I think we can gain some fast
insight by paying attention to the places where the AI
is not working so well. So I'll give three quick examples.
The first has to do with humor. AI has a
very difficult time making an original joke, and this is

(16:58):
for a simple reason. To make up a new joke,
you need to know what the ending is, and then
you work backwards to construct the joke with red herrings,
so no one sees where you're going. And it happens
that the way these large language models work is all
in the forward direction. They decide what is the most
probable word to come next. So they're fine at parroting

(17:22):
jokes back to us, but they're total failures at building
original jokes. And there's a deeper point here as well.
To build a joke, you need to have some model,
some idea of what will be funny to a fellow human,
what shared concept or shared experience would make someone laugh.
And for that, you generally need to have the experience

(17:45):
of a human life with all of its joys and
slings and arrows and so on. And these large language
models can do a lot of things, but they don't
have any model of what it is to be a human.
My second example has to do with the flip side
of making a joke, which is getting a joke. And

(18:06):
if you look carefully, you will see how current AI
always fails to catch jokes that are thrown at it.
It doesn't get jokes because it doesn't have a model
of what it is to be a human. But this
point goes beyond jokes. One of the most remarkable feats
of these large language models is summarizing large texts, and

(18:27):
in twenty twenty two, OpenAI announced how they could summarize
entire books like Alice in Wonderland. What it does is
it generates a summary of each chapter, and then it
uses those chapter summaries to make a summary of the
whole book. So for Alice in Wonderland, it generates the following.
Alice falls down a rabbit hole and grows to a

(18:47):
giant size. After drinking a mysterious bottle, she decides to
focus on growing back to her normal size and finding
her way into the garden. She meets the caterpillar, who
tells her that one side of mushroom will make her
grow taller, the other side shorter. She eats the mushroom
and returns to her normal size. Alice attends a party
with the Mad Hatter and the March Hare. The Queen

(19:09):
arrives and orders the execution of the gardeners for making
a mistake with the roses. Alice saves them by putting
them in a flower pot. The King and Queen of
Hearts preside over a trial. The Queen gets angry and
orders Alice to be sentenced to death. Alice wakes up
to find her sister by her side. So that's pretty remarkable.
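The recursive scheme just described, summarize each chapter and then summarize the summaries, can be sketched like this; the summarize function here is a stand-in placeholder, not OpenAI's actual model or pipeline.

```python
def summarize(text: str, max_words: int = 40) -> str:
    """Placeholder: a real system would call a language model here."""
    words = text.split()
    return " ".join(words[:max_words]) + ("..." if len(words) > max_words else "")

def summarize_book(chapters: list[str]) -> str:
    chapter_summaries = [summarize(ch) for ch in chapters]        # first pass: one summary per chapter
    return summarize(" ".join(chapter_summaries), max_words=80)   # second pass: a summary of the summaries

chapters = ["Alice falls down a rabbit hole...",
            "She meets the Caterpillar and eats the mushroom...",
            "The Queen orders executions and a trial is held..."]
print(summarize_book(chapters))
```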

(19:29):
It took a whole book, and it was able to
summarize it down to a paragraph. But I kept reading
these text summaries carefully and I got to the summary
of Act one of Romeo and Juliet, and here's what
it says. Romeo locks himself in his room, no longer
in love with Rosaline. Now, I think the engineers at

(19:50):
open AI felt really satisfied with this summary. They thought
it was quite good, and my proof for this is
that they still display it proudly on their website. But
I majored in literature as an undergraduate, and I spent a lot of time with Shakespeare's plays, and I immediately
knew that this summary was exactly wrong. The actual scene

(20:10):
from Shakespeare goes like this. His friend Benvolio finds Romeo catatonically depressed, and Benvolio says, what sadness lengthens Romeo's hours? And Romeo says, not having that which, having, makes them short. And Benvolio says, in love? And Romeo says, out. Benvolio says, of love? And Romeo says,

(20:34):
out of her favor, where I am in love. So
this is typical Shakespearean wordplay, where Romeo is expressing his
grief of being out of favor with Rosaline, with whom
he is deeply in love. And when you read the play,
it's obvious that Romeo is not over Rosaline. He's suffering

(20:55):
over her. He's almost suicidal. And this is an important
piece of the play because the play is really about
a young man in love with the idea of being
in love, and that's why he, later in the same act,
falls so hard into his relationship with Juliet, a relationship
which ends in their mutual suicide. By the way, as

(21:15):
Friar Laurence says of their relationship, these violent delights have
violent ends. And you get a bonus if you can
tell me where else you've heard that line more recently. Okay, anyway, back to the AI summary. The AI misses this
wordplay entirely, and it concludes that Romeo is out of
love with Rosaline. Again, a human watching the play or

(21:38):
reading the play immediately gets that Romeo is making wordplay
and is heartbroken over Rosaline, but the AI doesn't get
that because it's reading words only at a statistical level,
not at a level of understanding of what it is
to be a human saying those words. And that leads

(21:59):
me to the third example, which is the difficulty in understanding
the physical world. So consider a question like this, When
President Biden walks into a room, does his head come
with him? A question like this is famously difficult for AI to answer, even though it's trivial for you,
because the AI doesn't have an internal model of how

(22:23):
everything physically hangs together in the world. Last week, I
was at the TED conference and I heard a great
talk by Yejin Choi, and she was phrasing this problem as AI not having common sense. She asked ChatGPT the following question: it takes six hours to dry six
shirts in the sun, how long does it take to

(22:44):
dry thirty shirts? And it answers thirty hours. Now you
and I see that the answer should be six hours,
because we know the sun doesn't care how many shirts
are out there. But ChatGPT just doesn't get it, because despite appearances, it doesn't have a model of
the world. And we've seen this sort of thing for years.

(23:05):
By the way, even in mind-blowingly impressive AI models
that do image recognition, they're so impressive in what they recognize,
but then they'll fail catastrophically at some easy picture, making mistakes that a human just wouldn't make. For example, there's
one picture where there's a boy holding a toothbrush and
the AI says it's a boy with a baseball bat. Okay,

(23:28):
so there are things that AI doesn't do that well.
But that said, there are other things that are mind blowing,
things that no one expected it to do. And this
is why I mentioned in my previous episode that we
are in an era of discovery more than just invention.
Everyone's searching and finding things that the AI can do

(23:51):
that nobody really expected or foresaw, including all the stuff
that we're now taking for granted, like oh, it can
summarize books, or it can make art from text. And
I want to point out that a lot of the
arguments that people have been making about AI not being
good at something, these arguments have been changing rapidly. For example,

(24:13):
just a few months ago, people were arguing that AI
would make silly mistakes about things, and it couldn't really
understand math, and would get math and word problems wrong.
But in a shockingly brief time, a lot of these
shortcomings have been mastered. So it's yet to be seen
what challenges will remain and for how long. So the

(24:52):
evidence I've presented so far is that AI doesn't have
a great model of what it's like to be human,
but that doesn't necessarily rule out that it has sentience
or awareness, even if it's of another flavor. It doesn't think like a human, but maybe it still thinks. So is

(25:13):
ChatGPT having some sort of experience, and how would
we know? In nineteen fifty, the brilliant mathematician and computer
scientist Alan Turing was asking this question: how could you determine whether a machine exhibits human-like intelligence? So he

(25:34):
proposed an experiment that he called the imitation game. You've
got a machine AI that's programmed to simulate human speech
or conversation, and you place it in a closed room,
and in a second room you have a real human,
but the doors are closed, so you don't know which room has the machine and which has the human. And now you are

(25:57):
a person, the evaluator, who communicates with both of them
via a computer terminal, or think of it nowadays like text messaging with both of them. So you, the evaluator,
engage in a conversation with both closed rooms, one of
which has the machine and one the human. And your
job is simply to figure out which is which, which

(26:19):
is the machine and which is the human. And the
only thing that you have to work with are the
texts that are going back and forth. And if you,
the evaluator, cannot tell, that is the moment when machine
intelligence has finally arrived at the level of human intelligence.
It has passed the imitation game or what we now

(26:40):
call the Turing test. And this reminds me of this
great line in the first episode of Westworld, where
the protagonist William is talking to the woman who's outfitting
him for his adventure in Westworld and giving him a
hat and a gun and so on, and he hesitantly asks,
I hope you don't mind if I ask you a question,

(27:00):
But are you real? And she says to him, if
you can't tell, does it matter? So I brought this
up last episode in the context of art, where we
asked whether it matters if the art is generated by
an AI or a human. But now this question comes
up in the context of intelligence and sentience. Does it

(27:22):
matter whether we can tell or not? Well, I think
we're way beyond the Turing test nowadays, but I don't
feel like it gives us a good answer to the
question of whether the AI is intelligent and is experiencing
an inner life. I mean, the Turing test has been
the test in the AI world since the beginning. Why?

(27:43):
Is it the perfect test? No, but it's really hard
to figure out how to test for intelligence. But we
have to be cautious about equating conversational ability with sentience. Why. Well,
for starters, let's just acknowledge how easy it is for
us to anthropomorphize. That means to assign human qualities to

(28:06):
everything around us. Like, we give animals human names and talk to them as though they are people. We project our emotions onto animals, we make stories about animals that have human-like qualities, and we have animals that
talk and wear clothes and go on adventures in these stories.
Every Pixar film that you watch is about cars or

(28:29):
toys or airplanes talking and having emotions, and we don't
even bat an eye at that stuff. We can, in fact,
just watch random shapes moving around a computer screen and
we will assign intention and feel emotion depending on exactly
how they're moving. If you're interested in this, see the

(28:50):
link on the podcast page to the study by Heider and Simmel in the nineteen forties, where they moved shapes
around on a screen. Okay, now this is all related
to a point that I brought up in the last episode,
which is how easy it is to pluck the strings
on a human, or, as the Westworld writers put it,

(29:10):
how hackable humans are. So I bring all this up
to say that just because an answer sounds very clever to you, or sounds like a human, that really tells us very little about whether the AI is actually
intelligent or sentient. It only tells us something about the
willingness of us as observers to anthropomorphize, to assign intention

(29:36):
where there is none. Because what ChatGPT does is
take the structure of language very impressively and spoon it
back to us. And we hear these well formed sentences,
and we can hardly help but impose sentience on the AI.
And part of the reason is that language is a

(29:57):
super compressed package that needs to be unpacked by the
listener's brain for its meaning. So we generally assume that
when we send our little package of sounds across the air,
that it unpacks and the other person understands exactly what
we meant. So when I say justice or love or suffering,

(30:20):
we all have a different sense in our heads about
what that means, because I'm just sending a few phonemes
across the air, and you have to unpack those words
and interpret them within your own model of the world.
I'm going to come back to this point in future episodes,
but for now, the point I want to make is
that a large language model can generate text statistically, and

(30:44):
we can be gobsmacked by the apparent depth of it.
But in part this is because we cannot help but
impose meaning on the words that we receive. We hear
a particular string of sounds and we cannot help but
assume meaning behind it. Okay, so maybe the imitation game
is not really the best test for meaningful intelligence. But

(31:07):
there are other tests out there because while the Turing
test measures something about AI language processing, it doesn't necessarily
require the AI to demonstrate creative thinking or originality. And
so that leads us to the Lovelace test, named after Ada Lovelace, who was the nineteenth-century mathematician who's often

(31:31):
thought of as the world's first computer programmer. And she
once said quote, only when computers originate things should they
be believed to have minds. So the Lovelace test was proposed in two thousand and one, and this test focuses
on the creative capabilities of AI systems. So to pass

(31:51):
the Lovelace test, a machine has to create an original work,
such as a piece of art or a novel that
it was not explicitly designed to produce. This test
aims to assess whether AI systems can exhibit creativity and autonomy,
which are key aspects of what we think about with consciousness.

(32:11):
And the idea is that true sentience involves creative and
original thinking, not just the ability to follow pre programmed
rules or algorithms. And I'll just note that over a
decade ago, the scientist Mark Riedl proposed the Lovelace two point zero test, which gets the human evaluator to
specify the constraints that will make the output novel and surprising.

(32:35):
So the example that Riedl used in his paper is, quote,
create a story in which a boy falls in love
with a girl, aliens abduct the boy, and the girl
saves the world with the help of a talking cat.
But we now know that this is totally trivial for ChatGPT or Bard or any large language model. And I

(32:56):
think this tells us that these sorts of games with
making conversation or making text or art are insufficient to
actually assess intelligence. Why? Because it's not so hard to
mix things up to make them seem original and intelligent
when it's really just doing a mashup. So I want

(33:16):
to turn to another test that I think is more
powerful than the Turing test or the Lovelace test, and probably easier to judge, and that is this: if a
system is truly intelligent, it should be able to do
scientific discovery. A version of the scientific discovery test was

(33:37):
first proposed by a scientist named Shou-Cheng Zhang a few
years ago, and he pointed out that the most important
thing that humans do is make scientific discoveries. And the
day our AI can make real discoveries is the day
they become as smart as we are. Now, I want
to propose an important change to this test, and then

(34:00):
I think we'll be getting somewhere. So here's the scenario
I'm envisioning. Let's say that I ask AI some question,

(34:21):
a question in the biomedical space about what kind of
drug would be best suited to bind to this receptor
and trigger a cascade that causes a particular gene to
get suppressed. Okay, so imagine that I ask that to
ChatGPT and it tells me some mind-blowingly amazing
clever answer, one that had previously not been known, something

(34:44):
that's never been known by scientists before. We would assume
naturally that it has done some extraordinary scientific reasoning, but
that won't necessarily be the reason that it passes. Instead,
it might pass simply because it's more well-read
than I am, or than any other human on the

(35:04):
planet by literally millions of times. So the way to
think about this is to picture a typical giant biomedical
library where there's some fact stored at a paper and
a journal over here on this shelf in this book,
and there's another seemingly dissociated fact over on this shelf,
seven stacks away, and there's a third fact all the

(35:28):
way on the other side of the library, on the
bottom shelf, in a book from nineteen seventy nine. And
it's almost infinitesimally unlikely that any human could even hope
to have read one one millionth of the biomedical literature,
and really really unlikely that she would be able to
catch those three facts and hold them in mind at

(35:49):
the same time. But this is trivial, of course, for
a large language model with hundreds of billions of nodes.
So I think that we will see new science getting done by ChatGPT, not because it is conceptualizing, not because it's doing human-like reasoning, but because it doesn't

(36:10):
know that these are disparate facts spread around the library.
It simply knows these as three facts that seem to
fit together. And so with the right sort of questions,
we might find that sometimes AI generates something amazing and
it seems to pass the scientific discovery test. So this
is going to be incredibly useful for science. And I've

(36:31):
never been able to escape the feeling as I sift
through Google Scholar and the thousands of papers published each month, that if something could hold all the knowledge in mind at once, each page in every journal, and every
gene in the genome, and all the pages about chemistry
and physics and mathematical techniques and astrophysics and so on.

(36:53):
Then you'd have lots of puzzle pieces that could potentially
make lots of connections. And you know, this might lead
to the retirement of many scientists, or at minimum lead
to a better use of our time. There's a depressing
sense in which each scientist, each one of us, finds
little pieces of the puzzle, and in the twinkling of

(37:14):
a single human lifetime, a busy scientist might collect up
a handful of different puzzle pieces. The most voracious reader,
the most assiduous worker, the most creative synthesizer of ideas
can only hope to collect a small number of puzzle
pieces and pray that some of them might fit together.

(37:34):
So this is going to be massively important. But I
want to define two categories of scientific discovery. The first
is what I just described, which is science where things
that already exist in literature can be pieced together. And
let's call that level one discovery. And these large language
models will be awesome at level one because they've read

(37:56):
every paper and they have a perfect memory. But I
want to distinguish a second level of scientific discovery, and
this is the one I'm interested in. I'll call this
level two, and that is science that requires conceptualization to
get to the next step, not just remixing what's already there.
Conceptualization like when the young Albert Einstein imagined something that

(38:20):
he had never seen before. He asked himself, what would
it be like if I could catch up with a
beam of light and ride it like a surfer riding
a wave. And this is how he derived the special
theory of relativity. This isn't something he looked up and
found three facts that clicked together. He imagined. He asked

(38:42):
new questions. He tried out a new model of the world,
one in which time runs differently depending on how fast
you're going, and then he worked backwards to see if
that model could work. Or consider when Charles Darwin thought
about the species that he saw around him, and he
imagined all the species that he didn't see but

(39:02):
who might have existed, and he was able to put
together a new mental model in which most species don't
make it and we only see those whose mutations cause
survival advantages or reproductive advantages. These weren't facts that he
just collected from some papers. He was trying out a

(39:23):
new model of the world. Now, this kind of science
isn't just for the big giant stuff. Most meaningful science
is actually driven by this kind of imagination of new models.
Just as one example, I recently did an episode about
whether time runs in slow motion when you're in fear

(39:43):
for your life. And so when I wondered about this question,
I realized there were two hypotheses that might explain it,
and I thought of an experiment to discriminate those two hypotheses.
And then we built a wristband that flashes information at
a particular speed and had people wear it, and we dropped them from a one hundred and fifty foot tall tower into

(40:04):
a net below. A large language model presumably couldn't do
that because it's just playing statistical word games. And unless
someone had thought of that experiment and written it down,
ChatGPT would never say, okay, here's a new framework, and
how we can design an experiment to put this to
the test. So this is what I want to define

(40:26):
as the most meaningful test for a human level of intelligence.
When AI can do science in this way, generating new
ideas and frameworks, not just clicking facts together, then we
will have matched human intelligence. And I just want to

(40:47):
take one more angle on this to make the picture clear.
The way a scientist reads a journal paper is not
simply by correlating words and extracting keywords, although that might
be part of it, but also by realizing what was
not said. Why did the authors cut off the X
axis here at thirty? What if they had extended this graph,

(41:09):
would the line have reversed in its trend? And why
didn't the authors mention the hypothesis of Smith at all?
And does that graph look too perfect? You know, one
of my mentors, Francis Crick, operated under the assumption that
he should disbelieve twenty five percent of what he read
in the literature. Is this because of fraud or error,

(41:30):
or statistical fluctuations or manipulation or the waste basket effect?
Who cares? The bottom line is that the literature is
rife with errors, and depending on the field, some estimates
put the irreproducibility at fifty percent. So when scientists read papers,
they know this, just as Francis Crick did. They read

(41:53):
in an entirely different manner than Google Translate or Watson
or ChatGPT or any of the correlational methods. They extrapolate.
They read the paper and wonder about other possibilities. They
chew on what's missing. They envision the next step. They
think of the next experiment that could confirm or disconfirm

(42:15):
the hypotheses and the frameworks in the paper. To my mind,
the meaningful goal of AI is not going to be
found in number crunching and looking for facts that click together.
It's going to often be something else. It's going to
require an AI that learns how humans think, how they behave,
what they don't say, what they didn't think of, what

(42:38):
they misthought about, what they should think about. And one
more thing. I should note that these different levels I've outlined,
from fitting facts together versus imagining new world models, they're
probably gonna end up with blurry boundaries. So maybe ChatGPT will come up with something, and you won't always

(42:58):
know whether it's piecing together a few disparate pieces in
the literature, what I'm calling level one, or whether it's
come up with something that is truly a new world
model that's not a simple clicking together, but a genuine
process of generating a new framework to explain the data.

(43:19):
So distinguishing the levels of discovery is probably not going
to be an easy task with a bright line between them,
but I think it will clarify some things to make
this distinction. And last thing, I don't necessarily know that
there's something magical and ineffable about the way that humans
do this. Presumably we're running algorithms too, it's just that

(43:42):
they're running on self-configuring wetware. I have seen tens
of thousands of science experiments in my career, so I
know the process of asking a question and figuring out
what will put it to the test. So we may
get to level two, and it may be sooner than
we expect, but I just want to be clear that right now,
we have not figured out the human algorithms. So the

(44:05):
current version of AI, as massively impressive as it is,
does not do level two scientific problem solving. And that's
when we're going to know that we've crossed a new
kind of line into a machine that is truly intelligent.
So let's wrap up. At least for now, humans still

(44:25):
have to do the science, by which I mean the
conceptual work, wherein we take a framework for understanding the
world and we rethink it, and we mentally simulate whether
a new model of the world could explain the observed data,
and we come up with a way to test that
new model. It's not just searching for facts. So I'm
definitely not saying we won't get to the next level

(44:47):
where AI can conceptualize things and predict forward and build
new knowledge. This might be a week from now, or
it might be a century from now. Who knows how
hard a problem that's going to turn out to be.
But I want us to be clear eyed on where
we are right now, because sometimes in the blindingly impressive
light of what current AI is doing, it can be

(45:08):
difficult to see what's missing and where we might be heading.
That's all for this week. To find out more and
to share your thoughts, head over to Eagleman dot com
slash podcasts, and you can also watch full episodes of
Inner Cosmos on YouTube. Subscribe to my channel so you

(45:30):
can follow along each week for new updates. I'd love
to hear your questions, so please send those to podcast
at eagleman dot com and I will do a special
episode where I answer questions. Until next time, I'm David
Eagleman and this is Inner Cosmos.