Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:02):
Welcome to TechStuff, a production from iHeartRadio. Hey there,
and welcome to TechStuff. I'm your host, Jonathan Strickland.
I'm an executive producer with iHeart Podcasts. And how the
tech are you? Well, I just got back from celebrating
my birthday. Thanks to all of you who were
(00:25):
wishing me a happy birthday. And here in the United States,
we're about to have our national holiday celebrating the fourth
of July. I realize Fourth of July happens everywhere, not
just in the US, but we celebrate it here in
the US, and as such, there's very limited time to
get everything done, and I really wasn't able to pull
(00:45):
an episode together in time, and I apologize for that,
but I thought I would bring an older episode to
y'all so that we can still have an episode to
listen to today. And typically I would have one of my
fireworks episodes play on this day, because fireworks have a
very close association with the Fourth of July here in
(01:06):
the United States. But I've done that for several years
in a row, and I thought it might be nice
to have a break from fireworks. Instead, I thought I
would focus on something that continues to be a very
important topic in tech, and that is artificial intelligence. And
AI is incredibly impressive, but there are also lots of
(01:26):
challenges with AI, and those range from the technological
side to the social side, right, and how we implement AI.
One thing I thought of that we don't really get to
talk about very much is the concept of forgetting with AI.
We have a lot of generative AI out there that
(01:46):
is drawing upon huge resources of information, but AI can
also quote unquote forget. So this episode originally published on
July thirty first of twenty twenty three. It is called
Machine Learning and Catastrophic Forgetting. And I think it's a
useful thing to reflect upon as we see more and
(02:07):
more headlines about tech companies and their increasingly astronomical
investment in artificial intelligence. I hope you enjoy it. So over
this past weekend, I was listening to the podcast The
Skeptics' Guide to the Universe, which I have no connection to.
(02:28):
I just listened to it, and it included a section
on AI that referenced something I don't think I had
heard of before, which really says more about my
oversight than anything else. Maybe I did hear about it
but then I forgot about it, you know, catastrophically. So
the thing they talked about was catastrophic forgetting in artificial intelligence,
(02:52):
specifically in machine learning systems built on artificial neural networks. Now,
before we talk about catastrophic forgetting, which as I mentioned,
is related to neural networks and machine learning, we really
need to do a quick reminder, not a quick reminder.
We need to do a full reminder on how all
this works. And that's going to require us to do
a whole lot of remembering. Not a catastrophic amount, but
(03:15):
a lot. So the history of artificial intelligence as a
discipline is one of intense and important debates in fields
like computer science. Now, I have often talked about how
AI can be seen as the convergence of several other
disciplines into its own field. And there's more than one
way to approach the challenge of artificial intelligence. And in
(03:40):
the history of AI, we actually saw that play out,
and some would argue the way it played out means
that we're actually just now playing catch up. So different
schools of thought pushed these different approaches forward as this
should be the prevailing methodology we use to develop artificial intelligence.
(04:02):
This is important because the development of AI does not
exist in a vacuum, right? It exists in our real world.
Research requires funding, and when you've got different sides arguing
that their approach to artificial intelligence is superior and that
the alternatives are not just inferior, but potentially limited to
(04:25):
the point of being useless, well you've got a metaphorical
wrestling match going on. The winner takes home the big
prize of getting funding for their research, and the loser
has to scrabble for whatever they can find, and often
they will see their work languish as a result. By
the way, this is why I often bring stuff up
(04:46):
in this podcast that is outside the realm of tech.
I've received a lot of messages over the years from
folks saying that I should leave out stuff like money
or politics. Politics is the big one. But to me,
that doesn't make sense because tech exists within our world,
a world that is largely shaped by money and politics.
(05:09):
I don't think we can separate the tech from all
of that, because I believe that if you were to
somehow magically remove those influences, if somehow money and politics
never played a part in the development of technology, our
tech would look very different from what it does today.
Not necessarily better or worse, but different. I mean, think
(05:29):
about Thomas Edison. He was very much driven by financial success,
like his work in tech was really mostly about making
lots of money. And without the making lots of money part,
you don't really have his drive to really bring together
the brightest minds of his generation and set them to
(05:50):
work on creating incredible technology. So I think we have
to take all these things into consideration. Anyway, that's a
total rabbit trail, and I apologize. Let's get back to
our story. It really begins around nineteen forty three, when
a pair of researchers at the University of Chicago first
proposed the concept of the basic unit of a neural network.
(06:13):
Those researchers were Warren McCulloch and Walter Pitts. And in fact,
they demonstrated their idea by showing a simple electrical circuit,
the very basis for what would become a neural network.
So their proposal was a system that would use those
simple circuits to mimic the neurons that we have in
(06:33):
our noggins. So our brain consists of a bunch of
these neurons, and you might wonder how much is a bunch. Well,
we're talking about, on average, around one hundred billion neurons
in the human brain. These neurons interconnect with each other.
It's not just one to one, right? You've got
these interconnections between all these different neurons, not with every
(06:55):
neuron connected to every other neuron, but lots of interconnections.
And if we're looking at just the connections, you would
count more than one hundred trillion of them in the
typical human brain. And these connections in our brains make
up neural circuits. Those circuits light up, and that represents
us doing lots of different stuff, from experiencing the world
(07:16):
around us, so perception, to thinking about a past memory.
You know that typically is like recreating the same pathway
over and over, and sometimes we don't recreate it exactly correctly,
and our memory ends up not being a perfect representation
of the thing that we actually experienced. This is why
things like eyewitness testimony are not always very reliable, because
(07:39):
our memories aren't infallible. They can trick us even as we
have all those pathways light up. When we learn
a new skill, we start forming new pathways, and then
as we practice this skill, we start to reinforce those pathways.
So McCulloch and Pitts proposed that we create machines capable
of doing essentially a similar thing that our brains do,
(08:03):
so kind of a neuromimicry, not exactly one to one
the way our brains work, but inspired by the way
our brains work. Now, we would be limited by what
the technology of the day would be able to do,
because there's no feasible way we could create a massive
(08:24):
electrical system with one hundred billion individual simple circuits with
more than one hundred trillion connections between them. That would
be beyond our capability. It would be beyond our resources.
We could, however, create systems that used interconnected circuits to
process information and to teach such a system to do
(08:45):
specific tasks. Now, in nineteen forty nine, Donald Hebb wrote
a book about biological neurons, and he titled this book
The Organization of Behavior and suggested neural pathways get stronger
with additional use, kind of like, you know, if you
exercise your muscles, you build strength over time. Well, so
(09:06):
it is with neural pathways. And if you don't
use those muscles, well, then your muscles get weaker. Well,
same with neural pathways. If you end up learning a skill,
but then over a great amount of time you no
longer practice that skill, you're going to lose some of
your ability, maybe not all of it, but at least
(09:27):
some of it. And you have to, you know... like,
I think about wrestlers who come back from retirement,
professional wrestlers. They call it ring rust. You've got to
knock off the ring rust and get back into step
and kind of get back into your groove. And it
takes a little time, typically. Sometimes, you know, you can
get back into the game faster than others, but you
(09:48):
get the idea. And also, Hebb ended up proposing the
concept that cells that fire together wire together, meaning that
neurons that fire at the same time end up strengthening
faster than other neurons do. So when you get into
that system, you can actually reinforce those pathways. And for
(10:14):
AI this would be really important. And it wasn't very
long after Donald Hebb had published this work that researchers
in the field of AI tried to apply that concept,
that philosophy, to computer science. By the mid nineteen fifties,
the burgeoning computer science lab and AI lab at MIT
was building out neural networks based on Hebb's ideas. Meanwhile,
(10:38):
another computer scientist named Frank Rosenblatt was looking at primitive
neural systems and he started with flies like house flies.
He wanted to explore systems that were involved when a
fly would quickly move away after detecting a possible threat,
like instantly, or at least appear to us to instantly
(11:01):
react to something. So, for example, a fly swatter coming
at it, like you might be moving the fly swatter
very quickly, and yet the fly is able to move
super fast with no perceivable delay. Right? We know that
we have a delay from when we perceive something to
when we can act on something. Like if you've ever
been in a fender bender in a car accident, you
(11:23):
know that there's a delay between when you see
the issue and when you can hit the brake, and that
can lead to accidents. Well, with flies, that delay seems
to be super super small. So Rosenblatt was really interested
in exploring the neurological reasons for that. How can that happen?
It has to be really simple, right? There has to
(11:43):
be a simple and more or less direct pathway that
exists to allow a fly to react to detecting a
potential threat like that, and if you could replicate that
with electronics, you could have a very simple but potentially
powerful artificial intelligence system. So he came up with this
(12:07):
system that would be based off that very simple direct
pathway that you would see in something like a fly,
and he called it the perceptron. So he went back
to the simple circuit design that was proposed by Pitts
and McCulloch, and he built out the Mark I Perceptron,
or Perceptron, I guess I should say. So let's talk
about a perceptron, like not big-P Perceptron, but a little-p
(12:29):
perceptron. This is probably what we would call a
neural node in a modern neural network. So the purpose
of the perceptron was to accept inputs and produce an
output based on some threshold. Like, if the inputs meet
a certain threshold, one output would be produced. If they
fail to do so, a different output would be produced.
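That threshold idea can be sketched in a few lines of Python. This is not Rosenblatt's actual hardware, just a toy illustration, and the weights and threshold values are made up:

```python
# Toy sketch of a little-p perceptron: weighted inputs vs. a threshold.
# The weights and threshold are made-up values for illustration only.

def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs meets the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# The first input is weighted more heavily than the second.
print(perceptron([1, 0], [0.7, 0.3], threshold=0.5))  # 1 (0.7 >= 0.5)
print(perceptron([0, 1], [0.7, 0.3], threshold=0.5))  # 0 (0.3 < 0.5)
```

Everything that follows in the episode, weights, training, multiple layers, builds on this one idea of a weighted sum compared against a threshold.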
(12:50):
The inputs, in turn would be assigned weights, which would
factor into the output the perceptron would generate. So when
we're talking weights, I mean weights as in like how
heavy something is or in this case, how much impact
that thing has, So we're talking about how much impact
(13:12):
one input has relative to other inputs. Let me use
a really mundane human example to kind of explain what
this means. Let's say that your friend asks you to
go see a movie with them, and it's going to
be playing tonight at nine pm. But you've had a
really busy day and you might not be able to
even eat dinner until around nine pm. And if you
(13:34):
go see this movie, it might mean having to skip
dinner or to try and eat something really fast and
unhealthy before you go to the movie. What's more, you
got a really big day tomorrow and you feel like
you really need to be well rested for it. However,
at the same time, you haven't seen this friend in ages,
and you really like this person and you've wanted to
(13:55):
hang with them for a really long time. Plus the
movie they're suggesting is one you've really wanted to see
and you haven't gone yet. Well, you would likely assign
at least unconsciously weights to each of these factors before
you make your decision. You know, if getting some dinner
without having to rush, and also to be really well
rested for tomorrow are really important to you, you'll probably
(14:18):
reluctantly decline the offer. But if you really crave some
time with your friend and you really want to see
that movie before all the spoilers come out on Facebook
or whatever, maybe you'll say yes. Your decision depends upon
the weights you assign those factors, those inputs, even if
you don't consciously think about it that way. Well, the
(14:38):
Perceptron system worked in a similar way. It produced outputs by
taking the inputs into consideration, including each input's weight. Moreover,
the more you submitted inputs, the more the system would
quote unquote learn how to weight each of those inputs,
all with the goal of bringing the actual output that
the processor, you know, generates closer to the one
(15:00):
you want it to generate. Okay, I just said a
lot there. We've got some more to get through. But
before we get to that, let's take a quick break,
all right. Before the break, we were talking about inputs
(15:21):
and weights and the idea of getting an output that
is close to what you want the system to do.
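That gap between what the system produces and what you want it to produce can be measured directly. Here's a toy sketch with invented label lists, where 1 means "target present" and 0 means "absent", counting the two kinds of mistakes:

```python
# Toy sketch: measuring the gap between desired and actual outputs.
# The label lists are invented; 1 means "target present", 0 means "absent".

desired = [1, 0, 1, 1, 0, 0, 1, 0]
actual  = [1, 1, 0, 1, 0, 0, 0, 0]

false_positives = sum(1 for d, a in zip(desired, actual) if a == 1 and d == 0)
false_negatives = sum(1 for d, a in zip(desired, actual) if a == 0 and d == 1)
errors = false_positives + false_negatives

print(false_positives, false_negatives, errors)  # 1 2 3
```

Training is about driving that error count, or some smoother version of it, down toward zero.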
That's not a guarantee, right? The system could generate an
output that's quote unquote wrong, you know, depending on whatever
task you've set this machine learning system to learn, and
(15:41):
that gets a bit conceptual. So let's talk about a
simple example that I love to use. If you've been
listening to TechStuff for a while, you've heard this before,
and that's talking about pictures of cats, because cats ruled
the Internet. I don't know if they still do. They
won't talk to me; they just knock things off shelves. Anyway,
if your goal is to teach a computer system
(16:01):
to differentiate photos that include a cat from photos that
do not include a cat, well, you would need to
train the system, and part of that includes feeding the
system a whole bunch of photographs. Some of those would
have cats in them, some would not, and chances are
(16:22):
the system would misidentify photos. Maybe a significant number of
those photos. You would probably have false positives where the
system thinks there's a cat there and there's not, and
false negatives where it doesn't think there's a cat there
but there is. At that point, your goal is to
try and teach the system to close the gap between
the actual results it produces and what you want it
(16:44):
to produce. In some systems, that means you might have
to go in manually to adjust the input weights to
increase the weight of one input versus another in an
effort to cut down on mistakes. So the perceptron was interesting,
but it was very limited in complexity. It was essentially
a single layer where you'd feed a bunch of inputs
(17:05):
in and you would get an output. So it was
suitable for a subset of computational challenges, but anything beyond
that was well beyond its own reach as a single
layer network. By the late nineteen fifties, other researchers had
created new neural networks that were multi layered. So a
node or neuron didn't just accept inputs, it would generate
(17:28):
outputs that then would become inputs for another layer down.
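That layered flow, where the outputs of one layer become the inputs of the next, can be sketched like this. The layer sizes and weight values are invented purely for illustration:

```python
# Sketch of a multi-layer network: each layer's outputs feed the next.
# The weights are made-up illustration values, not a trained network.

def layer(inputs, weights):
    """Each row of weights produces one output node as a weighted sum."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weights]

hidden_weights = [[0.2, 0.8], [0.5, 0.5]]  # 2 inputs -> 2 hidden nodes
output_weights = [[1.0, -1.0]]             # 2 hidden nodes -> 1 output node

hidden = layer([1.0, 0.5], hidden_weights)  # "top" layer feeds the hidden layer
result = layer(hidden, output_weights)      # hidden layer feeds the output layer
print(result)
```

Real networks also apply a bias and a nonlinear activation at each node; this sketch strips the idea down to just the layered plumbing.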
So instead of just having one layer of nodes, you
would have multiple layers of nodes. Typically you would have
inputs at the quote unquote top of the network, and
you would have outputs at the bottom, and the ones
in between would often be referred to as hidden layers,
and who knows how many there would be. So anyway,
you would feed data to the system, the initial nodes
you would feed data to the system, the initial nodes
would generate information as outputs that would become inputs for
the next layer down, which would then continue the process
and so on and so forth until you get to
the output. So now you had artificial neural networks that
(18:08):
could tackle more complex challenges, and you would have multiple
steps in the process. That didn't necessarily mean they were automatically
better than the perceptron; it was just that they were able
to tackle more complicated tasks. What followed is something that
will probably sound really familiar to you if you ever
(18:30):
follow technology or fads: the hype around machine learning and
artificial intelligence, and keep in mind this is like the
nineteen sixties, grew beyond the technology's actual capabilities at
that time. People started to project what this technology would
be able to do, and they did so thinking it
was going to be in a very short turnaround, like
(18:53):
we're right on the very precipice of a monstrous breakthrough
that will bring the science fiction future into the present.
So when it was realized that we weren't at that point (like,
that's not how progress typically works; it's usually much more
gradual and humble than that), well, then enthusiasm around AI
(19:16):
began to take a hit. And as I mentioned already,
a big part of AI research really comes down to funding,
and it gets really challenging to secure funding when public
opinion dims on a technology. We've seen this happen lots
of times, right? Like, 3D television was a fad
that was pushed. Now, granted, that one, you could argue
(19:37):
was more of an example of manufacturing companies that make
televisions trying to push a technology on consumers and the
consumers just weren't interested. You could argue that was the
case there. But virtual reality in the nineteen nineties definitely
followed this pathway. There was this excitement around virtual reality.
Then that excitement faded to almost nothing when people realized
(19:59):
that the actual state of the art of the technology
was far below where they expected it to be. And
suddenly people who were working in VR couldn't get funding
for their work and they kind of had to scrounge
around in order to keep the development going at all.
And then eventually we would see that come back around again.
(20:20):
You could argue that NFTs recently went through this too,
where the hype went well beyond what NFTs could actually do.
I've been really down on NFTs in general. I do
think that there are potential legitimate uses for NFTs, but
I think the early examples were frivolous and almost solely
(20:43):
centered around speculation, as in like financial speculation and as
a result, there was nothing for it to do other
than to create a bubble that would ultimately burst, which
is what happened. And maybe NFTs will recover from that
and become something that's more fundamentally useful in the Internet
in the future or in digital commerce in the future.
(21:06):
But it's going to have to get over the catastrophe
that happened when the rug was pulled out from underneath
NFTs. And that was all predictable and preventable. But
like I've said before, in a joke I've lifted from
Peter Cook: we've learned from our mistakes; we can repeat
them almost exactly. Anyway, this same sort of hype cycle
(21:31):
activity happened with neural networks and machine learning in the
nineteen sixties. Then enter Marvin Minsky and Seymour Papert of
MIT's AI lab. They were leading that lab at the time.
In nineteen sixty nine, they co-authored a book titled Perceptrons.
They were actually critical of that artificial neural network approach
(21:55):
to AI and machine learning. They were concerned that the
limitations of the technology meant that you would need an
unrealistically huge system of artificial neurons, perhaps then using that
system to compute an infinite number of variations of the
same process or task, if you wanted to train the
weights so that they were of the optimal value. So,
(22:18):
in other words, they thought, it's too impractical and it's
going to take too much compute time, and you're never
going to achieve the result you want. You're never going
to get to that most perfect system. And they believed
it just had fundamental inescapable flaws. They had different systems
in mind. Now Minsky and Papert tried to push their
(22:42):
systems forward, and I could do a full episode about
them too, and their ideas were not bad. They were different.
It was a different approach. But this also meant that
researchers who had been pushing the development of artificial
neural networks felt forced to move on to different projects
because financial support for anything connected to the concept of
(23:03):
neural networks effectively disappeared, right? Like, funding just dropped for that,
because here you had these experts in computer science saying, yeah,
this approach, while interesting, has already hit an insurmountable obstacle
and it's not going to go any further. It's gone
as far as it can go. And so a lot
(23:23):
of computer scientists blamed Minsky and Papert for essentially demolishing
funding for neural networks for more than a decade, and
in fact, this would become an era that, retrospectively, computer
scientists would reference as the AI Winter. Got all Game
of Thrones up in here. Now, in nineteen eighty two,
(23:45):
there was a hint of spring thawing out that AI
Winter. Researchers in Japan were starting to resurrect work on
neural network projects, and meanwhile, a scientist named John Hopfield
submitted a research paper to the National Academy of Sciences
that brought neural networks back into discussion here in the
(24:05):
United States. And because Japan was actively investing in developing
that technology, institutions in the United States began to open
up the purse strings a bit because there was a
concern that if there were something to this artificial neural
network concept, if in fact those obstacles weren't insurmountable, as
(24:25):
Minsky and Papert had suggested, the US could potentially fall
behind another country because it would fail to fund its development. So,
in a desire not to have Japan take the ball
and run with it, the United States began to invest
again in artificial neural network research and development. In the
mid nineteen eighties, computer scientists essentially rediscovered the usefulness of
(24:51):
a process called backpropagation. And I've already talked about
nodes and weights and stuff, but this is going to
require a little bit more explanation to understand what
backpropagation is all about. So let's kind of try
to visualize a neural network. So you've got your input nodes.
Just think of a bunch of circles. If you were
drawing it from top to bottom, this would be your
(25:12):
top layer. These are like the funnels where you're going
to feed data into the system. Now you've got a
whole bunch of these at the top and they can
accept the data that you're feeding in. They process that data,
and then based upon some operation, they will then send
an output to a node one layer down. So there's
(25:35):
lots of other nodes in the layers below, or maybe
not as many as you have initial layers. You might
actually have fewer, and the layers above will send to
you know, data to a specific node depending upon what
the outcome is. Whatever the output is, so these nodes
accept the input. These inputs have a bias and a
(25:58):
weight to them, and this is one of the hidden layers.
They will then create an output and send that on
to nodes another layer down. So this goes on until
you get to your output layer, where you get your
final result, and then you can determine whether or not
the final result matches what you were hoping for. So
(26:18):
did your system properly identify which photos do and don't
have cats in them? Now, as I mentioned earlier, you
typically get results that aren't perfect, but we want to
train the system to improve with every test. Back propagation
is one way to do this. So with backpropagation,
you actually start with the final output. You've already done
(26:40):
a test run, right, and you've got your output, and
maybe your test has five possible final outcomes, but only
one of those is the outcome you actually want. Okay,
we'll say it's outcome number one. We're saying I want
this system to more often than not come to the
conclusion that's outcome number one. But you run your test.
(27:02):
It's got, you know, one thousand little tasks in it, and
you run your test. You find out that it only
arrives at outcome number one five percent of the time,
which is actually worse than random chance. Right? It should
be twenty percent for random chance, but it's only getting
there five percent of the time. Something is going really
(27:22):
wrong with your system for it to mistakenly go to
one of the other options and very rarely go to
the correct one. So let's say you also noticed the
outcome number three. It goes to that one forty percent
of the time. So it's making this mistake forty percent
of the time and only getting it right five percent
of the time. So things are seriously out of whack.
(27:43):
You need to find which connections, which would involve the
biases and the weights that are within your system, that
are leading it to mistakenly arrive at the wrong outcome
so frequently. You want to reduce those factors, and simultaneously
you need to boost the ones that lead the system
(28:03):
to arrive at outcome number one, because that's the answer
you actually want the system to get to. All right,
I've been droning on for a bit. Let's take another
quick break. When we come back, I'll finish up explaining
this and then we'll move on to catastrophic forgetting. Okay,
(28:28):
so we were talking about how you are looking at
a system that is coming to the wrong conclusion ninety
five percent of the time. It is a broken system.
You have to then figure out what factors are causing
this to happen, and they are numerous, right? They extend
all the way up to the very top of your
(28:49):
neural network, the other end where the input comes in.
But you can't just change everything all at once. You've
got to figure this out systematically, and that's what backpropagation
is really all about: which links one layer up from
the output have the greatest impact on the outcome? Right,
changing everything would be tedious. It would be impractical. You
(29:10):
might even make things worse. Some of these neural networks
are confoundingly complicated, so it's not really a feasible solution.
So instead you look at the connections that are having
the biggest impact on your outcome. So you want things
where if you make a small change in either the
bias or the weight, or maybe both, you'll see a
(29:31):
larger end effect on the outcome. All the connections are
arguably important, but some are more important than others. Backpropagation
works backwards from the result toward the other end of
the network to tweak those connections. It boosts ones that
lead to the correct or desired response, and it reduces
the values of those that lead to incorrect or undesired responses.
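The core move just described, working backwards from the error and nudging each weight in whichever direction shrinks it, with the biggest-impact weights moving the most, can be sketched numerically. The two-weight toy network, input, target, and learning rate below are all invented for illustration; real backpropagation computes the same sensitivities efficiently with the chain rule rather than by bumping each weight and re-measuring:

```python
# Toy sketch of the backpropagation idea: measure how sensitive the
# error is to each weight, then nudge every weight to shrink the error.
# The network, data, and learning rate are made-up illustration values.

def predict(x, w):
    return x * w[0] * w[1]  # two "layers", each just a single weight

def error(w, x=2.0, target=1.0):
    return (predict(x, w) - target) ** 2

w = [0.1, 0.2]
eps, lr = 1e-6, 0.05
before = error(w)
for _ in range(200):
    # Numerical sensitivity: how much does the error move per tiny bump?
    grads = []
    for i in range(len(w)):
        bumped = list(w)
        bumped[i] += eps
        grads.append((error(bumped) - error(w)) / eps)
    # Step each weight against its gradient; bigger impact, bigger step.
    w = [wi - lr * g for wi, g in zip(w, grads)]
after = error(w)
print(before, "->", after)  # the error shrinks dramatically
```

The weights with the steepest gradients get the largest adjustments, which is exactly the "find the connections with the biggest impact" idea.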
(29:53):
If we were to think of this like the classic
example in chaos theory, this could potentially involve us studying
a hurricane as it hits land and tracing its history back
as it moved through the ocean, and we would eventually
arrive at the point where it was a tropical storm,
and then we would go further back and see the
factors that led to the creation of that storm. And
(30:15):
maybe if we tracked it all the way back, we
would even find that one of a billion factors that
made the storm was in fact, a butterfly was flapping
its wings on the other side of the world and
that contributed to it. Maybe we find out that butterfly
flap of its wings had an impact, but it was negligible,
and that if the butterfly hadn't flapped its wings, the
hurricane still would have happened. That would be an example
(30:36):
of, well, we don't bother adjusting the weight
of the impact of that butterfly flapping its wings
it doesn't matter for the end result. But what if
we were to discover that that butterfly's flap of its
wings is the only reason the hurricane happened, or
at least was the primary reason, that all the other
factors pale in comparison? Well, then we'd want to make
(30:59):
sure we boost the weight of that input, because clearly
that butterfly is fundamental for hurricanes. I think hurricanes are
really dangerous, and I would ask butterflies to kind of chill,
all right. I mean, I don't want butterflies to go away,
just you know, maybe stop flapping so much. Anyway, the
(31:20):
formula for backpropagation gets into some calculus that is well
beyond my knowledge and skill. So rather than attempt to
stumble my way through an explanation that I don't actually understand,
I think it's best to leave the concept at the
high level that I have described right now. So just
know that it gets way more granular than what I've
talked about. But essentially, you're looking at those factors that
(31:44):
led to the ultimate decision and saying which ones of
these had the greatest impact, and how can I tweak
them so that I can shape the outcome to one
I wanted. If we were thinking about that example I
gave about whether or not you go to the movies,
maybe in the present day you start thinking about past experiences
(32:06):
where you made a decision to go out when you
had a big day the following day, and how
that impacted you, perhaps negatively. Maybe you're like, man, I
should have gotten a promotion by now, and then you think, well,
I do go to the movies an awful lot. You
might say, I need to adjust some of the factors
that affect my decision making process and perhaps prioritize my career.
(32:28):
Or if you've decided that late stage capitalism is a terrible
evil and that you're going to try and live a
hedonistic lifestyle of a wandering soul, maybe you say, I'm
going to go and see my movie with my friend,
and yeah, that's just how it is, because that's the
most important thing to me. You only go around this
crazy world once. After all, I'm not telling you which
(32:50):
way to go. I'm still finding my own way. But yeah,
backpropagation would be how you would go back and say,
all right, well, because I don't like the outcome that happened,
I need to change the way these factors weigh in
on the decision-making process that goes through the whole system. Now,
the advancements in the science of neural networks proved that
(33:13):
the technology no longer operated under the constraints that concerned
Minsky and Papert in the late sixties, so once again
funding found its way to neural network research and development projects.
Now let's finally talk about forgetting and what makes it catastrophic.
So you could, in theory, develop an artificial neural network
(33:34):
and have a library of training data, and the only
thing you ever do with this network is you feed
that same set of training data to that same neural
network over and over in an effort to get performance
as close to perfect as you possibly can. Just you know,
it's kind of like if you have a car and
(33:55):
you're constantly tweaking it so it will perform better, and
maybe you chase one thing and it boosts performance in
one area, but it kind of negatively impacts performance in
another area, so then you got to tweak something else.
You could be doing that with an artificial neural network
forever and just be using the same set of training data.
And all you're trying to do is make a system
(34:16):
that could handle that training data better than any other
system in the world, and that would be interesting, but
it would be useless from a practical standpoint. You could say, like, hey,
you want to see my machine that can sort through
only this collection of photographs and pick out the ones
that have cats in them and the ones that don't.
Pretty darn effectively, but not perfectly. It's not really
(34:37):
an interesting value proposition, right? So more likely you are
eventually going to start feeding lots of different kinds of
data to this neural network. And you know, yeah, you train
the network on certain data sets, but your goal is
to feed new sets of data, data the system has
never encountered before and rely on the system's ability to
process this information correctly to get the result you want.
(35:01):
And we might even be talking about stuff that human
beings can't easily do, right, But see, the training data
is going to mean that the network will start to
create and reinforce certain pathways, and those pathways will over
time get stronger and stronger, just as we said at
the beginning of this episode. But new data is going
to necessitate new pathways. Sometimes when the system begins to
(35:25):
form these new pathways, it forgets the old pathways. So
it's possible for a neural network to actually get worse
at the task it had previously been trained to do
with the actual training material. In fact, in a true catastrophe,
the system might forget the objective and not recognize what
(35:45):
the desired outcome is meant to be, so the results
can appear random and meaningless. It's as if the system
has developed some form of amnesia. So this is prevalent,
most prevalent anyway, in systems that rely on unguided learning.
With guided learning, you have engineers who are carefully selecting
(36:06):
the data that gets fed into a system. An unguided
system would collect raw data from wherever and attempt to
deliver desired results, and those are the kinds of
neural networks that are more prone to catastrophic forgetting. But
as I said, machine learning systems tackle new data, maybe
(36:27):
even new tasks, and then you get the risk of
the system forgetting stuff. So I jokingly say, it's kind
of like when I learned something new, it has to
push out something old, like you know, my friend's phone
number or something. Suddenly I can no longer remember it
because I learned some new interesting fact, as if I
have met my capacity for being able to know things.
(36:49):
So learning anything new necessitates having to forget something I
used to know, like Gotye, because now Gotye
is just somebody that I used to know. But wait,
there's more. Just as a system can experience catastrophic forgetting,
it can also experience catastrophic remembering. This is when a
(37:10):
system mistakenly believes it is doing one process, a task
it had previously been trained to do, rather than the
one it's actually trying to do. So let's say we've
got an artificial neural network, and originally we taught it
to recognize the photos that have cats in them versus
the ones that don't. But now we have retrained the
(37:32):
same artificial neural network to try and recognize handwritten text.
Except when we feed handwritten text to the system, suddenly
the system believes it's trying to determine where the cats are.
This is something that can happen with machine learning systems too,
and you still get bad results out of it. So
this is a real problem. Now, these are not insurmountable problems.
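To make the forgetting problem concrete, here is a deliberately tiny caricature in code. A single weight can't serve two tasks at once, so training it on a second task simply overwrites what it learned for the first. Real networks are far bigger and the effect is subtler, but the sequential-overwrite dynamic is the same; every number here is made up for illustration.

```python
# A deliberately tiny caricature of catastrophic forgetting:
# one weight can't hold two tasks, so training on task B
# overwrites what was learned for task A.

def train(w, x, target, steps=100, lr=0.1):
    """Gradient descent on a one-weight linear model."""
    for _ in range(steps):
        w -= lr * (w * x - target) * x
    return w

def error(w, x, target):
    return abs(w * x - target)

w = 0.0
w = train(w, x=1.0, target=2.0)       # task A: map 1 -> 2
err_a_before = error(w, 1.0, 2.0)     # near zero: task A is learned

w = train(w, x=1.0, target=-2.0)      # task B: map 1 -> -2
err_a_after = error(w, 1.0, 2.0)      # large: task A has been forgotten

print(err_a_before < 0.01, err_a_after > 1.0)
```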
(37:55):
There are some solutions that are actually intuitive. For example,
any gamer out there knows that it's best to save
your game just before you head into a big boss battle,
just in case things don't go the way you planned well.
With artificial neural networks, it's maybe not a bad idea
to make a copy of a network before you retrain
(38:16):
it to do something new. Then you still have the
backup if things do go pear-shaped. There are other
approaches to decreasing the risk of catastrophic forgetting or catastrophic remembering.
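That save-before-the-boss-battle idea can be sketched as a simple snapshot. The dict of weights here is a hypothetical stand-in for a real model object; the key point is that the copy must be a deep one, so later retraining can't reach back and mutate it.

```python
import copy

# Snapshot a network's weights before retraining it on a new task.
# The dict-of-lists "network" is a made-up stand-in for a real model.

network = {"weights": [0.5, -1.2, 3.3]}

checkpoint = copy.deepcopy(network)   # full, independent copy

# ... retraining mutates the live network ...
network["weights"][0] = 99.0

print(checkpoint["weights"][0])       # the backup still holds 0.5
```

With a shallow copy, `checkpoint["weights"]` would point at the same list and the "backup" would change along with the live network, which defeats the purpose.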
An article in Applied Mathematics titled Overcoming Catastrophic Forgetting in
Neural Networks describes a system in which the researchers purposefully
slowed down the network's ability to change the weights involved
(38:41):
in important tasks from previous training cycles. So this makes
teaching the system to do new tasks a little more
challenging because it's protecting these weights. It's preventing the system's
ability to be completely plastic, which means the system has
(39:02):
to work around these constraints and still learn how to
do the new task, but in the process it means
it doesn't forget how to do the previous tasks. This
article is interesting because of the tasks the researchers actually used
for the purposes of training. Like, what were they teaching the
artificial neural network to do? Well, they were teaching it
how to play Atari twenty six hundred games. So they
(39:24):
would start with one game and train the system on
how to play the game. Then they would give the
system a new game with different game mechanics, and the
system would have to learn how to play this new game,
but they wanted to see if it could still remember
how to play the original game. That was kind of
the system they were working on. They were tweaking things
(39:46):
so that the machine learning artificial neural network as a
whole could learn how to play multiple Atari twenty six
hundred games without forgetting how to do the previous ones.
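The weight-protection idea can be sketched as a quadratic penalty for moving important weights, roughly in the spirit of the elastic weight consolidation approach that paper describes. The importance scores and weight values below are entirely made up for illustration; in the real method, importance comes from an estimate of how much each weight mattered to the old task.

```python
# Sketch of protecting old-task weights with a quadratic penalty.
# importance[i] says how much weight i mattered for the previous
# task; the numbers here are invented for illustration.

def ewc_penalty(theta, theta_old, importance, strength=1.0):
    """Cost for dragging important weights away from the values
    they held after the previous task."""
    return strength * sum(
        f * (t - t_old) ** 2
        for f, t, t_old in zip(importance, theta, theta_old)
    )

theta_old = [1.0, -2.0, 0.5]     # weights after learning task A
importance = [10.0, 0.0, 10.0]   # weights 0 and 2 matter for task A

# Training on task B would then minimize something like:
# total_loss = task_b_loss(theta) + ewc_penalty(theta, theta_old, importance)

print(ewc_penalty(theta_old, theta_old, importance))         # no movement, no cost
print(ewc_penalty([1.0, 5.0, 0.5], theta_old, importance))   # weight 1 is free to move
print(ewc_penalty([2.0, -2.0, 0.5], theta_old, importance))  # moving weight 0 is costly
```

Because weight 1 has zero importance, the new task can use it freely, while the protected weights resist change, which is exactly the "not completely plastic" behavior described above.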
This is a non trivial task. I mean, it takes
a lot of work to see exactly how to preserve
things so that you're not slowing down the learning process
too much, but you're also not inviting the possibility of
(40:08):
catastrophic forgetting. Now, that's just one example of how researchers
are looking to mitigate the problem of catastrophic forgetting and
catastrophic remembering. There are other methods as well, and maybe
I'll do another episode where I'll go into more detail
on some of those. They do get pretty complicated, and
in fact, really, even pretty early
(40:31):
on I hit my limit as far as I
can understand the actual mechanics of the system. So rather
than you know, try and punch above my weight, I
think it's best to kind of be a little more general,
but just to have that understanding to kind of get
a better appreciation of some of the challenges relating to
(40:53):
artificial intelligence in general and machine learning in particular. And again,
this machine learning issue, it's really a bigger
problem with more sophisticated systems that are meant to do
unsupervised and unguided learning, right, those are the ones that
are going to be more prone to these issues. If
we're talking about supervised and guided learning, where engineers are
(41:18):
being very careful with the data being fed to a system,
it's less likely to happen. But the whole promise, or
at least the you know, not the promise of the
technology itself, but the promise of the people who are
funding it, is that this technology is going to reach
a point where it's able to learn on its own
and be able to do things better than people can do,
(41:41):
to free us up to do, you know, stuff we
want to do instead of stuff we have to do.
That's like the science fiction dream version of AI. As
we all know, getting there is much more painful. It's
not like a simple process of Hey, we've made everything
easy to do now and you don't have to work
all day. You can enjoy your life and pursue your
(42:03):
dreams and develop your hobbies and your interests, and you
can have fulfillment and somehow money isn't important anymore. Like
that seems to be the Star Trek version of the
future that people want things to go in. But as
we have seen, the process of getting there is way
more painful. As you know, people face a reality of
potentially being out of work because of AI, or maybe
(42:25):
being paid way less to do work because the AI
is doing most of it. That's not
the Star Trek future. That's getting into, like, the Blade Runner future,
so we don't want that one. By the way, the
tears in the rain speech is fantastic, but you do
not want to live in the Blade Runner world. Trust me.
(42:47):
You might not want to live in the Star Trek
world either, because those outfits don't look that comfortable. Anyway,
that's my little discussion about AI, machine learning, and catastrophic
forgetting and catastrophic remembering. This is just one of
the challenges associated with AI and machine learning. I don't
mean to suggest it's the one and only, or even
(43:08):
that it's the most important one, but it is one
that I had not really heard of until I listened
to that Skeptics Guide to the Universe episode over the weekend,
and it was really interesting to dive into the material
and read up about it and to get a better
understanding of what it means and how it works. I
hope you liked that episode from last year, twenty twenty three,
(43:28):
machine Learning and Catastrophic Forgetting. I am working on other
episodes that relate to AI. I also want to do
an episode about companies that claim to be part of
the artificial intelligence space but in fact use little if
any AI technology, because that has become a thing. As
we all know, when there is the combination of huge
(43:51):
amounts of money and low amounts of understanding, you have
the perfect breeding ground for scams and con artists and
that kind of thing. So I do plan on doing
an episode about various startups that claim at some level
to be part of artificial intelligence, but when you really
(44:12):
start to examine them, have little to no connection to
that world. So be on the lookout for that. It's
going to take me some time to do some research
because there's lots of different sources to go through on
that one. But that's what I'm working on for probably
next week. I'm hoping next week. In the meantime, for
those of you here in the United States, I hope
you have a safe Fourth of July celebration. Make sure
(44:37):
that you spend time with friends and loved ones, and
you know, be very careful if you're going to be
around fireworks. Those things are very dangerous for everyone else
out there who's not celebrating a holiday on the Fourth of July,
I hope you have an excellent Fourth of July wherever
you are, and that whatever you enjoy doing, you get
to do a lot of it on the Fourth of July.
(45:00):
As long as it's you know, not hurting yourself or
other people. That's it for me. I will talk to
you again really soon. Tech Stuff is an iHeartRadio production.
For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts,
(45:21):
or wherever you listen to your favorite shows.