OpenAI's Video Generating AI Is Dead On Arrival

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:02):
Cool Zone Media. Hello and welcome to Better Offline. As usual,
I'm your host, Ed Zitron. A few months ago, OpenAI
showed off Sora, a product that can generate videos

(00:24):
based on a short text prompt, kind of like ChatGPT
does for text or DALL-E does for images. These videos,
which are usually no more than sixty seconds long, can
at times seem impressive until you notice a little detail
that breaks the entire facade, like in a video where
a cat wakes up its owner, but the owner's arm
appears to be part of the cushion and the cat's paw

(00:44):
explodes out of its arm like an amoeba. Reactions to
Sora's AI-generated videos, and indeed the existence of the
model itself, have ranged from kind of a breathless hype
to genuine fear that this will be used to replace
video producers, in that it can create reality adjacent videos
that for a few seconds kind of seem real, especially

(01:04):
in the case of some of OpenAI's handpicked
demo videos. Yet even in these handpicked Sora outputs, you'll
find these weird little things that immediately shatter the illusion,
like one where a woman's legs awkwardly shuffle, then somehow
switch sides as she walks around, or blobs of people
merging in the background of images. These are, on some

(01:25):
level, genuinely remarkable technological achievements, until you consider what
they are and what they might do, and that there
are problems in them that run through the entire fabric
of artificial intelligence. A little over a month after Sora
was announced, OpenAI would debut a series of short films,

(01:46):
including one called Airhead, where filmmakers Shy Kids told the
story of a man with a balloon for a head,
and because this is AI, said balloon changes sizes twenty three,
twenty four, twenty six, twenty seven, twenty nine, thirty two,
thirty four, thirty nine, forty one, forty two, forty three,
and forty five seconds into the piece, at which point
I stopped counting because it got boring and I really
don't want to be mean to Shy Kids, as this

(02:08):
really isn't their fault. The very nature of filmmaking is
that you take different shots of the same thing, something
that I anticipated Sora was incapable of doing, as each
shot is generated fresh, as Sora itself, much like all
generative AI, does not actually know anything. When one asks
for a man with a yellow balloon as his head,

(02:30):
Sora must then look at the parameters spawned during its
training process and create an output guessing what a man
looks like, what a balloon looks like, what a man's
features are on his body, what color yellow is, what
the man's doing, and so on and so forth. This
becomes extremely problematic when you're working in film or television,
where viewers are far more likely to see when something

(02:52):
just doesn't look right, a problem exacerbated by moving images,
high resolution footage, and big television screens which are now ubiquitous.
Yet the press, as usual, credulously accepted Sora's, quote, stunning
videos that were amazing and scary, suggesting to the public
that we were on the verge of some sort of
artificial intelligence takeover of the film industry, helping buoy Sam Altman,

(03:16):
their CEO, and his dumbass attempts to convince Hollywood that
Sora won't destroy the movie business. These stories only serve
to help Sam Altman, who desperately needs you to believe
that Hollywood is scared of Sora and even more scared
of generative AI, because the more you talk about fear
and lost jobs and the machines taking over, the less
you ask a very, very simple question: does any of

(03:40):
this shit actually work? The answer, it turns out, is
not very well. In a piece for FX Guide, Mike
Seymour sat down with Shy Kids, the people behind Airhead,
and revealed how Sora is in many ways a little
bit useless for making films. Sora takes ten to twenty
minutes to generate a single three to twenty second shot,

(04:02):
something that isn't really a problem until you realize that
until the shot is rendered, you really have absolutely no
idea what the hell it's going to spit out. Sora
has no mechanism to connect one shot to another, even
with hyper-descriptive prompts. It hallucinates extra features when you haven't
asked for them. And Shy Kids were shocked by how
surprised OpenAI's researchers were when they requested the ability

(04:23):
to use a prompt to request a particular angle in
a shot, a feature that was initially unavailable. It took...
this is what kind of drives me crazy here, and
you'll hear this in the interview with him later. These
people are OpenAI people, and they were making
this tool for making visual images, for making moving images.

(04:44):
They didn't think that people might want different shots. I'm
so glad these are the people who were in control
of the future. Anyway, to quote the piece, it took
hundreds of generations at ten to twenty seconds apiece
to make a minute and nineteen second long film. And
what's really fun about this is that the movie's fine.

(05:05):
I, it was kind of fine. I just, I have
nothing really to say about it. It's a minute and
twenty seconds long, but it, it kind of works. But also,
the balloon looks different in every other shot. This isn't
Shy Kids's fault. But also this isn't gonna get better.
And I will get into why as we go along.

(05:28):
These tiny little problems I've mentioned, though, they all lead
to one overwhelming issue that Sora isn't so much a
tool to make movies as it is a big, fat
slot machine that spits out footage that may or may
not be of any use at all. Almost all of
the footage in Airhead was graded, treated, stabilized, and upscaled,
and that ten to twenty minute lead time on generations

(05:50):
was for 480p resolution footage, meaning that
even useful footage needed significant post production work to look
good enough, and just to give you an idea for
the non technical members of the audience, and this is fair.
The video you see on YouTube is usually somewhere between
720p, 1080p, or 4K. The TV shows
you watch are usually 1080p, 4K, or upscaled 1080p.

(06:14):
These are all lots of numbers. What I'm saying is
the stuff that Sora spits out, that takes burning a
small zoo to spit out, is incredibly low resolution, on
top of not being specific. Look, to put it as
plainly as possible, every single time that Shy Kids wanted
to generate a shot, even a three second long shot,

(06:37):
they would give Sora a text prompt and then they
would wait at least ten minutes to find out if
it was right, and they'd have to accept footage that
was subpar or inaccurate. And there's a really good example
of this. If you watch Airhead, a lot of the
shots are in slow motion, and you may think, oh,
this is a cinematic choice, right, because you're kind of

(07:00):
just admiring this man with a balloon for a head
going about his business. No, no, no, no no. They
found that this was just what Sora wanted to give
them when they asked for it. This was, in and
of itself, a hallucination. In the same way that ChatGPT
will authoritatively tell you that something is true that
is not, Sora will spit out a man running in

(07:22):
slow motion despite you not asking for that. And it's
so weird. They had to, to quote them, do quite a
bit of adjusting to keep the whole thing from feeling
like a big slo-mo project, and it still kind
of does. And that's rough. That's really rough. But you know,

(07:43):
I'm a curious little critter, so I decided to sit
down with Shy Kids's Walter Woodman to talk about his
experience with Sora and have him delve a little deeper
into his experience with the product. And I'd say he
had a far more utopian experience and perspective on the
whole thing than I expected. Now, some of you might

(08:04):
critique Walter for being so positive about it, but I
actually caution you to just listen to what he's saying,
because Walter's perspective is interesting. He sees this as a tool,
he doesn't see it as a replacement, and I think
it's a valid perspective to come at Sora with. I
also think it's a perspective that kind of accepts a
conceit of OpenAI's marketing strategy, that these things will

(08:25):
get better. If they do, perhaps Walter is right, perhaps
this will be an essential tool in filmmaking, even though
he didn't say essential. Don't want to put words in
the man's mouth, but I don't think that's the case.
Let me talk to him. You decide for yourself, all right.

(08:54):
So how did the relationship between Shy Kids and
OpenAI actually begin?

Speaker 2 (09:00):
The relationship between Shy Kids and OpenAI began when
we made an installation for a film called Daliland,
which was premiering at Toronto International Film Festival, and we
were the only people that our friends at Pressman Film
knew in Toronto, and so we made an installation that
looked like Salvador Dali's like studio inside of the basement

(09:26):
of the Saint Regis, which is where he lived and
made work out of, And inside of that installation we
made a like you could make your own surrealist painting,
and the way that you could make that was using
DALL-E, the OpenAI program, and so the OpenAI

(09:48):
people came to visit and check out the like what
we were working on, and making sure that it was
like something that they wanted to be a part of.

Speaker 3 (09:58):
And so.

Speaker 2 (10:01):
They met our producer Sydney, who they loved. She's easy
to love.

Speaker 3 (10:06):
And they.

Speaker 2 (10:09):
We sent them our previous work and so from there
they asked us to join this artist group. And then
when Sora came out, we saw it at the same
time as everyone else and we yeah, we got tapped
on the shoulder and said, hey, would you like to
check this out and try this out? And we said,

(10:29):
of course, that's how it came to be.

Speaker 1 (10:33):
So how did you onboard? Were you just given access?
Did they give you instructions? Did they physically come to you?
What was that like?

Speaker 2 (10:40):
It was top secret. They
gave us a briefcase in a cloudy room.

Speaker 3 (10:48):
No, it was.

Speaker 2 (10:50):
Yeah, there was a very simple onboarding process where they
walked us through the technology as well as some of
its features, and yeah, it was pretty. It was pretty.
And then from there they gave us access to begin
using it and making things.

Speaker 1 (11:09):
And you were allowed to use it without their presence?
You had direct access.

Speaker 3 (11:14):
Yep, yep.

Speaker 1 (11:16):
So okay, did you get instructions on how to write
effective prompts or did you just kind of do trial
and error?

Speaker 3 (11:23):
No, nothing like that.

Speaker 2 (11:25):
I mean in the artist group itself, there's a lot
of really amazing and thoughtful creative people who kind of
show their work and show how they got to make
the things that they did. But no, not, there was
no real engineering of our prompts. They were very much

(11:49):
just play, kind of see what comes out of you.
You're creative people that we trust. Why don't

Speaker 3 (11:59):
you just see what works, throw spaghetti at the wall?

Speaker 1 (12:04):
That's cool. So in the piece at FX
Guide, in the interview, someone from Shy Kids said
that OpenAI's researchers were surprised when they were
asked about being able to say specific shots. What happened there?
Was it just that you tried to ask Sora to
do specific shots and it didn't work, or was it

(12:25):
just not a feature?

Speaker 2 (12:27):
I think that's maybe taken a little bit out of context.

Speaker 3 (12:30):
I think.

Speaker 2 (12:32):
More so it's just people come from distant, different disciplines.

Speaker 3 (12:37):
And when.

Speaker 2 (12:40):
I say a wide shot on a one hundred and
thirty millimeter lens, people from my area of expertise know
sort of immediately what I'm talking about.

Speaker 3 (12:52):
Whereas the researchers, they are.

Speaker 2 (12:56):
More invested in sort of other other things, and so
it's it's not so much that they didn't understand or
that sort of didn't understand. It's more so just there's
all these terms in films.

Speaker 3 (13:10):
Like a zolly or like a

Speaker 2 (13:12):
Hitchcock zoom or all of these different things that are
very understandable, but even when you go from set to set,
they mean something different. So I think it's about trying
to create a lingua franca between all of these sort
of different, very different people and very different ways of

(13:34):
using a tool. What I may call a zoom, you
may call a dolly shot, et cetera, et cetera.

Speaker 1 (13:40):
So that feels like a training data challenge.

Speaker 2 (13:44):
Yeah, I think it's about trying to figure out how
and yeah, exactly what to what to train on.

Speaker 1 (13:54):
Yeah, so tell me what was the interface like? Was
it a chat box? Did you have, like... Just
tell me what it actually looked like.

Speaker 2 (14:03):
Sure, there's limitations of what I can say about things
like that, but I think the way that I've described
it to people without giving too much away is I
think if you're familiar with using something like the Adobe Suite.
I think that there's some commonalities whether you're using After

(14:26):
Effects or Premiere or whatever, Illustrator, there's like commonalities and
if you can use one, you can sort of flu's
your way around the others. I would say it's very
similar like that with

Speaker 3 (14:42):
OpenAI's tools and models, that if you are

Speaker 2 (14:47):
used to things like ChatGPT and DALL-E and those
types of models, I think you will find it, find
an ease of use in using Sora.

Speaker 1 (15:01):
So within that article they mentioned that there was like
a three hundred to one shooting ratio, which correct me
if I'm wrong, means like three hundred seconds of material
for each second of usable material. How does that compare to
conventional filmmaking in your experience?

Speaker 2 (15:18):
It would be even more seconds than that. I would say,
just three hundred shots at probably ten to twenty seconds apiece.
So whatever the math is on that, I would say
that that's pretty common with shooting. You know, when you
are shooting a fiction film or like even a documentary

(15:40):
is even crazier for that. You shoot all day and
all day. We shot a documentary recently and
I actually had to go back and watch all the dailies.
We counted about ninety hours of footage that we had,
and from that ninety hours, you're making an hour and
a half movie, So you.

Speaker 3 (15:59):
Know, you are really trimming things down.

Speaker 2 (16:02):
And I think also it's like you are getting the
five seconds that work or the you know, the section
of that shot that works. And I would say that's
pretty common to filmmaking.

Speaker 1 (16:19):
How about narrative filmmaking, because I know documentary you have
a lot of stuff, But I'm just wondering what the
burden of selection is like compared to the amount of
shots you take in just a regular movie or regular
short film, even.

Speaker 3 (16:31):
Again, I would

Speaker 2 (16:33):
say, at least I can only speak for the way
that I shoot films. You know, if you had it's subjective.
It's subjective for sure. If you're David Fincher, you're shooting
eight hundred takes of like someone picking up a pencil,
or Stanley Kubrick, you know, is like famous for a
thousand takes. I would say that the burn rate was

(16:55):
very similar. I would say that the challenges with Sora
are like it's unbelievable at making these images that are
unbelievable and so interesting to look at, but

Speaker 3 (17:11):
at its current state, it

Speaker 2 (17:14):
can sometimes be difficult to do things that in traditional
shooting would be much easier, where you say, hey, can

Speaker 3 (17:21):
that guy go over here?

Speaker 2 (17:24):
Or can that person move from one side of the
screen to the other. Things like that are are more difficult.
But again this is baby steps. We are in like
the toddler phase, so I assume that those things will
get better.

Speaker 1 (17:39):
So you mentioned, well, Shy Kids mentioned in the interview,
that by default it tries to prevent you from creating
videos that violate copyright law, existing copyrights. Did you accidentally
bump into this regularly, or was this something that just
didn't really bother you?

Speaker 2 (17:57):
No, you couldn't generate things that So when I was
mentioning like a Hitchcock zoom, you couldn't mention Hitchcock, So
you had to find a different way to describe that
as opposed to like using public figures, anything that would
have a public figure or a title you would not
be allowed to generate. From my experience, there wasn't too

(18:21):
many logos or brands or anything like that in any
of the things that I generated.

Speaker 1 (18:29):
But something copyrighted. Did you generate anything that looked copyrighted?

Speaker 3 (18:33):
No? Not to my not to my eye.

Speaker 1 (18:36):
That's fine. So, well, I know you don't know how
much Sora will cost, and we don't
even know when it will launch. Can you talk about
how much you'd be willing to pay for it? What
do you think it's worth? And I realize that this
is a vague question.

Speaker 3 (18:53):
For sure.

Speaker 2 (18:55):
I think that there is this illusion that Sora will
be this solution to all problems, and I don't think
that that is the case. I think Sora is a
tool amongst many tools, and for certain things it will

(19:15):
be very valuable.

Speaker 3 (19:17):
And so.

Speaker 2 (19:19):
In terms of value, it's like, well, how much is
a glass of water? Well, yes, if a glass of
water is just, like, right now in my kitchen, I

Speaker 3 (19:27):
wouldn't like to pay that high for it.

Speaker 2 (19:29):
If a glass of water is for a person in
the desert who desperately needs that glass of water, you
can really name your price. And I would say that
for some projects, I think that the usage of Sora
would be absolutely invaluable, and.

Speaker 3 (19:44):
I would I would.

Speaker 2 (19:47):
I don't know how much exactly that would be, would
depend on the budget, would depend on the limits and
the scales, but I would say that there's other projects
where I think it would be like totally inappropriate or
like just not worth like what, well, just when I
think of Studio Ghibli films that are hand drawn, and

(20:09):
I think the reason that those films work is because
of the way that they're made, or I think that
when you think of Aardman animation, it's like I
feel that you could feel the fingerprints in that clay,
and so I don't think maybe for those types of
films that it would be appropriate, But I think for
other types of films like Airhead or others, I think

(20:31):
it would be extremely appropriate. I think it's up to
the artist's sort of discretion how much they think that
that tool is needed.

Speaker 1 (20:45):
Doesn't the inconsistency of shots make this deeply impractical?
because that's the thing I kept coming back to.

Speaker 2 (20:53):
Yeah, I mean, depends on what project you're working on.
And again, I think that this is like early days.
I think that these are kinks and bugs that are
going to be changed, and already from day one where
we started using it to where we are today, massive
improvements have happened, and actually improvements where they've listened to

(21:16):
things that we have suggested and things that we'd like
to see and tools we'd.

Speaker 3 (21:21):
Like to see.

Speaker 2 (21:22):
So I think that, for example, for Airhead, the inconsistency
of having a protagonist, having a protagonist that stays true
through all these different shots, that's the reason why we
put a balloon in front of their head, Because while

(21:43):
different bodies can sort of be accepted, a different face
and a different head is going to be a little
bit difficult. And so we turned the limitation into our
sort of main attribute. And I would say that again,
that works for that story. But I don't think that
all stories are going to find this valuable. And I

(22:06):
also don't think every single shot needs to come from Sora.

Speaker 3 (22:11):
I think that there's a world where it can be.

Speaker 2 (22:14):
An addition, or it can be the start of a
story where instead of just brainstorming and just having a script,
you make a sort of moving mood board or a
trailer or so. I think that there's like tons of
stages along the pipeline that it would be extremely valuable

(22:36):
and help elucidate concepts and bring them to life.

Speaker 1 (22:41):
So thematic question, so you avoided filming locations and all
of this, but you spend a lot of time writing
prompts and you're waiting for Sora to generate clips, then
upscaling and all that. Do you think you could
make Airhead, assuming you could get around the balloon head thing?
Do you think you could make it quicker in real life?

(23:02):
Or was Sora kind of essential to get it done
in the timeline you did, because it's like a week
and a half, two weeks, I think?

Speaker 2 (23:07):
Yeah, I don't know, that's an interesting question. I mean,
we definitely wouldn't be able to fly around the world
and yes, get the shots at the car race and
all of those things, so.

Speaker 3 (23:23):
I think it would probably be shorter.

Speaker 2 (23:26):
But I think in general, the conversations about like time
and money are like super reductive in a way in
that I think that without Sora, this wouldn't exist, And
I think that that is the more interesting conversation. As
a director, most directors I know have a folder of

(23:50):
unrealized ideas, and I think that my hope is that
Sora will allow us to dust off those folders and
breathe new life into concepts, and when people see
what those concepts could be, my hope is that it
gives a lot more people opportunities to have their ideas illuminated.

(24:13):
And whether that means to go and shoot it now
traditionally or some hybrid. I think that that, to me
is what's most exciting.

Speaker 1 (24:23):
So where do you see Sora going? I know you're
considering looking at it as kind of a complementary tool,
but do you think that that's its use case or
do you think it'll ever do end to end filmmaking.

Speaker 2 (24:35):
I think I think let a thousand flowers bloom, you know.
I think that there is people who are going to
just use it for small complementary things to maybe help
with in the same way we use stock footage.

Speaker 3 (24:50):
Now.

Speaker 2 (24:51):
I think some people are going to use it as
a way, say you are from a community that
has maybe a little bit of a less established film community,
and it's a way to have you compete with the
big boys in terms of special effects and usage. And again,

(25:13):
I don't just think it's as easy as bleep bloop
bloop, type in the prompt, here comes the thing, but
rather it allows you to just have a really powerful
collaborator that you can help make maybe larger concepts and
bigger ideas. And then yeah, I think that there's some
people end to end who are going to make things

(25:33):
that are completely generated or most of the shots in
it are generated or things like that. In general, the
thing that feels interesting to me is like helping to
deepen humanity, Whereas the more you sort of simplify the process,

(25:58):
I think that that is like, I don't know, it's
never a simple process. So anytime you hear about something
that is going to make it all easy and make
all your troubles go away, I'd be very wary of that.

Speaker 3 (26:11):
I think film is.

Speaker 2 (26:12):
Going to always be difficult and a challenge, and I
think the benefit of Sora will be to help lead
us into new paths and lead us into new directions.
If I were to tell you, hey, we made this
film called Lord of the Rings and it uses CGI

(26:33):
orcs and it makes massive orc fights. You know, if
I told you that in the nineteen thirties, you'd probably gasp.
Or if I told you that CGI is going to
be a predominant way in which we make films in
twenty twenty four, I think you would go, ah, that's
not real filmmaking.

Speaker 1 (26:50):
And I don't think I think you kind of saw
that in the nineties.

Speaker 2 (26:54):
Really yeah, I don't think history is too kind to
those people that go, this is not gonna work. This
is not art. This technology is not the way. I
just think it depends on the artist, and it
depends what they want to bring to it. I think
that's the key X factor here.

Speaker 1 (27:15):
One final question, with that all in mind, do you
think that Sora is going to hurt filmmakers? Do you
think it's going to replace people?

Speaker 2 (27:23):
I mean, I hope not. I mean that's my job,
so I would very hope not.

Speaker 1 (27:31):
No. I very much.

Speaker 2 (27:33):
Understand people's fears, and I think that you know, I'm
a student of history, so when I look back in
history and the camera obscura comes out, painters are talking
about how we aren't going to need painters anymore, because

(27:55):
now we can capture reality, why do you need a
painter to go and paint it? And it's a very
valid point, But painters didn't go away. And then there
was this whole new industry called photography, and then after photography,
there was this whole new industry called film. And then
after film, there was this whole new industry called home video.

(28:17):
And then after home video, there was this whole new
industry called cell phone video. And then there was this
whole new industry called TikToks and Vines, and I just
think that when people come in contact with things,
their immediate... As humans, our immediate reaction is fear, and

(28:39):
we're worried about things that are new because we do
not yet understand them. And I think that for us,
we like to face those things face on. And I
think that the other side of that coin is that
there's some kid right now in rural Bangladesh who has

(29:00):
this amazing, big idea and maybe doesn't have all the
resources that everyone else has, and with these types of technologies,
it may level the playing field for kids like that
to compete with the Avatars of the world, compete with
the Marvels of the world, And then I think we're
going to all be on this level playing field, and
what's going to matter is not just who has the

(29:23):
highest budgets and who has the most resources, but who
has the best stories. And for me, that's the exciting part.
We work with groups of collaborators that we love and respect,
and our hope is never let's work with them less.
Our hope is always let's enrich those relationships and hopefully

(29:47):
grow them and hopefully bring more people into our collective
and more people into our process. So that's our hope.
Maybe I'm utopic, maybe I'm wrong, but that's the that's
the choice, that that's the way we're choosing to look
at this.

Speaker 1 (30:17):
In Woodman's mind, Sora is a tool, an extension of
creatives' methods rather than a replacement of filmographers or actors,
what have you. And that very much lines up with
Sam Altman and OpenAI's sales pitch for Sora. His
utopian perspective, his words, not mine, is predicated on both
film studios acting with integrity, something they've proven to never do,

(30:40):
and OpenAI being able to make Sora a significantly
better tool, something that's going to require masses more training
data and compute than I think is actually in existence.
Paul Trillo, an LA based artist and filmmaker, speaking to
Business Insider in April, described Sora as a research project

(31:00):
in alpha, mentioning that it was a little confusing who
the market was for the service, and I think that
gels with another problem that Woodman raised, that what might
be a zoom out shot for you would be a
completely different term for someone else, which in turn would
require OpenAI to have both the right training data

(31:22):
of a zoom shot, and many, many, many of them,
to be clear, but they need to know the multitudes
of different terminologies that go into filmmaking. Now, if they
don't give a shit, maybe that's a completely different story.
In short, Sora faces both the intractable problems of AI
that I've mentioned in the previous episode, Peak AI, go and

(31:44):
listen to it, but also a few of its own,
namely that generating moving images isn't just about ingesting a
bunch of footage, but it's about understanding said footage well
enough to generate something else based on a multitude of
different perspectives, descriptions, and cultural contexts. I'm not sure that

(32:06):
OpenAI, or really most people, realize how complex even the
simplest movie is, how much work goes into making a film,
and I think that that's actually what excites people about this,
because making films can be inefficient, it can be extremely taxing,
it can be extremely expensive. But the problem here, I'll

(32:27):
get into the other ones as well, is that Sora
is being sold to film studios. That is who Sam
Altman is going to, and thus it's going to be
built for people who don't make movies. I'm actually really
happy to hear that Shy Kids and other artists are involved,
so it'll actually be tuned to be somewhat useful. But
I don't think people realize how gigantic the task is

(32:50):
that Sora is going after, and how I think it's
impossible it can go any further. But I digress. I
just don't believe that Sora actually works if you're making
a movie. While Pixar movies may take years to render,
they've got supercomputers and specialized hardware, and more importantly, the

(33:11):
ability to actually design and move characters in the three
D space. If you are putting something in Sora, what
are you designing? If you put a character in this,
again, you cannot have consistency between these things. That
is a problem across all generative AI. You cannot
do that unless, of course, using copyrighted footage, Mr. Altman.

(33:35):
But seriously, though, with no consistency across shots, what the
hell are you doing? While there are unexpected things that
might happen in a three D animated movie or a
CGI situation, you still have complete control over the thing
you are putting on there, the thing you are animating.
You can make subtle tweaks to them. That doesn't seem

(33:56):
to be the case with Sora. You can adjust what's
on the screen. But even though this is AI generated,
it doesn't have the benefits of regular generative stuff like CGI,
which stands, of course, for computer generated image, I believe,
and if I'm wrong, you're gonna yell at me in
the emails. But seriously, though, the practical use cases for Sora,

(34:18):
they're just kind of not there. Sora's attempts to replace filmmakers,
if that is OpenAI's goal, and I really believe
it is, they're dead on arrival because it's an impractical
and ineffective solution and the problems it's solving are really
only ones created by Hollywood executives. The AI hype bubble,

(34:39):
as I have noted repeatedly, is one entirely reliant on
us accepting the idea of what these companies will do,
rather than interrogating their ability to actually do it. Sora,
much like all generative AI, suffers from an imprecision and
an unreliability caused by hallucinations, an unavoidable result of your

(35:00):
using mathematics to generate things, and the massive power and
compute requirements are just prohibitively expensive. If this is going
to end up as a VFX tool, or a productivity tool,
or as a fill in tool. It's going to need
to be a lot cheaper than it is to run.
Generative AI is already unprofitable. To make Sora any kind

(35:24):
of useful, OpenAI will have to find a way
to dramatically increase the precision of the prompts, reduce hallucinations
to pretty much nothing, and vastly increase processing power across
the board. Sora hasn't even been launched save for, of course,
these handpicked companies that got to test it, meaning that
this ten to twenty minute wait between generations of moving

(35:45):
images, that's likely to increase once people use the product.
And that's before you consider how expensive it's going to
be to run the bloody thing. This is a significantly
more complex model than ChatGPT, which is already unprofitable.
Sam Altman can make money, but can he make profit?
I severely bloody doubt it. He hasn't before, and I

(36:07):
don't think he's going to in the future. He's still
begging Daddy Satya over at Microsoft to give him a
supercomputer so his things can fart out things more profitably.
It just drives me a little insane. And these things
I've talked about, they're intractable problems that OpenAI has
failed to solve. They've failed to make a more efficient

(36:29):
model for Microsoft last year, in twenty twenty three, their
Arrakis model, Jesus Christ. And while GPT five is meant
to be materially better, to quote mister Altman, it isn't
obvious what better means when GPT four performs worse at
some tasks than its predecessor. I do believe Sam Altman
is telling the truth when he says that the future
of AI requires an energy breakthrough. But the thing I

(36:51):
think he's leaving out is that it may take an
energy breakthrough and indeed more chips for generative AI to
approach any level of usefulness. And he's hoping that people
will buy the hype without asking too many annoying questions
like what does this stuff actually do? Or is this useful?
Or does this actually help me? Or will this be

(37:12):
around in ten years? To be clear, Sam Altman is
the single most well connected and well funded man in AI,
with a direct connection to Microsoft, a multi trillion dollar
tech company, and a Rolodex that includes effectively every major founder
of the last decade, and he still can't get past
any of these problems, partly because he is not technical

(37:34):
and thus can't really solve the problems himself, and partly
because the problems he's facing are burdened by the laws
of maths and physics. Generative AI hallucinates because it doesn't
have a consciousness or any ability to learn or know anything.
It's extremely expensive because even the simplest prompts require GPT
four to run highly complex mathematical equations on graphics processing

(37:59):
units that cost upwards of ten thousand dollars apiece. Even
if generative AI were cheaper or more efficient or required
less power, it would still be a process that generates
answers based on the extremely complex process of ingesting an
increasingly dwindling amount of training data. These problems are significantly
compounded when you consider the complexity, size, and massive legal

(38:22):
ramifications of training a model on videos. A problem that
nobody has seen fit to push Altman or Murati or anyone else
at OpenAI about, which is a pisstake. Really seems like
an obvious one, like, hey man, you need a bunch
of training data to train ChatGPT, which does words.
How are you getting all these videos again? Big credit

(38:43):
to Joanna Stern, who asked Mira Murati, CTO of OpenAI,
whether Sora was trained on YouTube videos, and then Mira Murati
of course made that incredible face. Go look up that video.
I'll link it in the notes. That's ultimately the
problem with the current bubble. So much of its success
requires us to tolerate and applaud these half-assed, half

(39:05):
finished tools that only sort of kind of do the
things they're meant to do, and we're meant to nod
and smile and clap and say great job, Sammy, like
we're talking to a bloody child rather than a startup
with thirteen billion dollars in funding with a CEO that
has the backing of goddamn Microsoft. And Sora is the
ugliest, messiest problem of them all. Its videos, while superficially impressive,

(39:29):
are still deeply, deeply flawed. They take way too long
to generate, a problem that's only going to get worse,
and they're just far too inconsistent, which is a problem
created by the nature of how generative AI works and
its approach to generating things using mathematics, and if it's
planning to be a VFX tool, if it's planning to

(39:49):
be a sidearm for filmographers, it's going to have to
be a lot cheaper than it's really practical to make it. Again,
nothing OpenAI makes is profitable. They may make over
a billion dollars of revenue, but everything is burning money.
It's just very frustrating. It's all very frustrating. Sora seems

(40:14):
kind of cool, but when you take away the cool
side and you just look at it for what it is,
it's just another con from Sam Altman. It's just another
unfinished product that is not able to fit the task.
It's just another thing that you look at and you say, oh,
if that was just a bit better, it'd be really good.
Except in this case it would be a lot better. Yeah,

(40:36):
all the press writes about it's incredible, it's amazing, and
you can separate the technological achievement of using maths to
generate a visual moving image that's genuinely cool. But you
gotta stop for a second and say, as cool as
this is, the people in the back of their shot,
they're molding into each other. It's like The Thing, it's disgusting. Hey,

(41:00):
that monkey's got like five arms. That's weird. I don't know.
I just feel like normal people don't get this much leniency.
You and I don't get people saying great job when
we do kind of a shitty job. And if we
brought something to someone that was insanely expensive, only really
did ten percent of the job you needed it to,

(41:22):
and also the things it created took forever and looked horrifying,
I don't think we'd get told great job. I think
we'd be told we'd wasted a lot of money and
that someone was quite mad at us. I'm tired of this.
I'm tired of these companies announcing these half completed products
and having the media dance around and act like they've

(41:43):
delivered something truly incredible. I'm tired of the public being
expected to do the mental and emotional labor for Sam
Altman and other AI companies, saying it's remarkable that they're
even able to do this, and assume and give them
credit for some inevitable future where all of these problems are gone,
despite little proof that such a thing is possible and

(42:03):
plenty of proof that it isn't. And as I've suggested,
I really don't think it is. I think Sora is
dead on arrival. I think it's too expensive, too imprecise,
and there is no fixing those problems. You can iterate
on them, you can improve them, but without some kind
of energy or chips breakthrough, they're not even going to

(42:24):
have the compute or really the money to build this
thing into anything even half functional. And I'm calling on
the press to push back on these companies. I'm calling
on them to refuse to declare this quasi functional software
as complete. I'm tired of seeing the media back these

(42:46):
companies and do marketing work for them when they're not done.
They don't deserve the credit, and I'm demanding that people
like Sam Altman actually change the world before anyone says
that they're doing.

Speaker 3 (43:00):
So.

Speaker 1 (43:08):
Thank you for listening to Better Offline. The editor and
composer of the Better Offline theme song is Mattosowski. You
can check out more of his music and audio projects
at Mattosowski dot com, M A T T O S
O W S K I dot com. You can email me
at ez at Better Offline dot com, or visit Better
Offline dot com to find more podcast links and of course,

(43:29):
my newsletter. I also really recommend you go to chat
dot Where's Your Ed dot at to visit the Discord, and
go to r slash Better Offline to check out our Reddit.
Thank you so much for listening. Better Offline is a
production of Cool Zone Media. For more from Cool Zone Media,
visit our website Cool Zone Media dot com, or check us
out on the iHeartRadio app, Apple Podcasts, or wherever you

(43:51):
get your podcasts.

Host

Ed Zitron
