Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:02):
Bloomberg Audio Studios, Podcasts, Radio, News. Hello and welcome to
another episode of the Odd Lots podcast. I'm Tracy Alloway.
Speaker 2 (00:23):
And I'm Joe Weisenthal.
Speaker 1 (00:24):
Joe, what's been your favorite ChatGPT or Claude prompt
so far?
Speaker 3 (00:31):
You know, it's funny because I have a lot of
fun with them, and also I use them for serious things.
So I'll like upload conference call transcripts and say, tell
me what this company said about labor market indicators or
something like that, and that'll be extremely useful for that.
Speaker 1 (00:47):
Wait, do you actually find that more efficient than just
doing a word search for like labor or working? I don't.
I hate uploading stuff because you can only do it
in like fragments.
Speaker 3 (00:56):
No, what? Tracy, oh, let me, I'll show you how to prompt. Okay. No,
I get a lot of professional use out of the
various AI tools, but I also, you know, have a
lot of fun with them. And there's even a song
and I'm not going to say which one that I wrote.
I didn't use the lyrics. No, I did not like
(01:16):
because it's very good. Wait what did you use?
Speaker 1 (01:18):
Did it give you an actual melody? What happened?
Speaker 4 (01:21):
No?
Speaker 3 (01:21):
So there was a song that I liked, okay, and
the song title sort of rested upon a pun okay,
and so I asked ChatGPT to come up with
another song that sort of like had a similar twist
based on the headline of that song. I needed basically
a song prompt idea.
Speaker 1 (01:43):
This opens up a whole can of worms. No, this
is actually the perfect segue into what we're going to
talk about today, because for you and me, using something
like ChatGPT, we don't really have the same
concerns that a proper company or large corporation would have. Like,
it doesn't really matter to us if the answer is wrong.
(02:04):
I mean, ideally you would like it to be correct,
but if I'm just asking some silly question, it doesn't
really matter what ChatGPT spits out at me. And
also copyright kind of doesn't matter, so we don't care
what it spits out in terms of who owns it,
and also we don't care what we're putting in in
terms of who owns that. That's right, But if you
are a company you are thinking about generative AI very differently.
Speaker 2 (02:28):
I just want to say one thing, which is.
Speaker 1 (02:29):
That's your defense? Okay, defend yourself.
Speaker 3 (02:32):
No, No, I'm not even trying to defend myself. If I upload, say,
you know, the McDonald's earnings transcript, and I say, what
does McDonald's say about the labor market, and there's some quote,
I always go back and check that that quote is
actually in there. So I do verify, you know,
I'm not just blindly relying on it. I do also
do my own work and everything. But yeah, it's very true.
Like so I can say I get a tremendous amount
(02:54):
of use from ChatGPT or Claude or whatever, and
it is very useful to me. But it makes mistakes sometimes,
and if you think about deploying AI in the sort
of enterprise world, then maybe like a one percent mistake
rate or a one percent hallucination rate, or whatever you want
to call them, is just completely unacceptable and a level
(03:15):
of risk that makes it almost unusable for professional purposes.
Speaker 1 (03:19):
Absolutely. And of course the other thing with AI is
there is still this ongoing, very heated debate about how
transformational it's actually going to be. So you and I
are using it as you know, a productivity hack in
some cases, or maybe to generate song lyrics or even
songs in some cases, but what is the true use
(03:41):
case for this particular technology. There's still a lot of
debate about that, and so I'm very pleased to say
we do, in fact have the perfect guest. We're going
to be speaking to someone who is implementing AI at
a very, very large financial institution. We're going to be
speaking with Marco Argenti, the chief information officer at Goldman Sachs. Marco,
(04:02):
thank you so much for coming on Odd Lots.
Speaker 4 (04:04):
Thank you for having me.
Speaker 1 (04:05):
Marco, tell us what a chief information officer does at
Goldman Sachs. Whenever I see CIO, I always think chief
investment officer, so it's very confusing. Yeah, so what does
the other CIO do?
Speaker 4 (04:17):
So last week I was in Italy visiting my mother.
She's eighty three, and she obviously doesn't know much about
technology or banking, and so she said, what do you
do at Goldman? And I said, you know, I just
tried to simplify. I say, I make sure that the printers
don't run out of... And interestingly, the CIO job has
been traditionally associated with the word IT.
Speaker 2 (04:41):
Okay, and IT.
Speaker 4 (04:42):
I tell you, talk to any technologist, they don't want
to be classified as IT.
Speaker 3 (04:47):
Right, because you associate it with, those are the
people who, like, see if the ethernet cable...
Speaker 4 (04:51):
Those are the ones who tell you that, you
know, I mean, I have a lot of respect
for IT, but generally you go to the IT department
when something doesn't work, okay, And so it's very back
office and something that attracted me to this job. I've
been here for five years and this is the first
time that I do like a CIO job. Before I
was doing more like, you know, creating technology, et cetera,
(05:12):
and services. I can talk about that, but it is the
fact that the role of a CIO has actually changed
quite a bit, and now it's about really asking the question,
you know, how do we implement technology in order to
achieve our strategic objectives and actually to be differentiated, And
it's really sitting at the strategic table of the firm.
Speaker 2 (05:33):
Okay.
Speaker 4 (05:34):
So today we live in a world where obviously a
lot of the things that we want to do, or
every company wants to do, are really kind of determined
by how good you are at technology. And so I
think the role of the CIO has changed quite a bit.
And now, you know, I would define it as in general,
defining the technology strategy of a firm and also making
sure that you have the right culture in the engineering
(05:56):
team in order to execute on that.
Speaker 3 (05:58):
What's the day-to-day look like? Like, what's the
typical day? You get into the office and then what
Speaker 2 (06:03):
Do you do?
Speaker 4 (06:04):
Well? I mean I get into the office, and I generally,
like everybody else, you know, I talk to people every
day all day, and so I talk to people. You know,
we have a bunch of meetings one after the other. And
I have teams coming to me with either regularly scheduled
meetings or meetings that have been requested to discuss a
certain topic. And you know, we just go through is
there a whiteboard? Well right now in the age of Zoom,
(06:27):
I guess still. You know, we have a globally distributed
team and so a lot of our people are not
in the same office, and so we use virtual whiteboards
like everybody else. But I would say, you know, one
of the things that I tried to do when joining Goldman,
which was part of sort of the cultural agenda, that
was emphasizing the importance of narratives and the written word versus,
(06:48):
you know, PowerPoint and talking. Okay, so, which is kind
of what I learned at Amazon over the years. Okay,
all right, I was at AWS, and one of
the things you learned there as soon as you join Amazon,
in any part of Amazon, like the first few meetings
are kind of shocking because nobody talks. Everybody starts reading.
You start reading for like sometimes thirty minutes or forty
(07:10):
five minutes, and if you're the author of the document,
you're just sitting there basically, and you just try to
look at people's faces and understand what they think about
your document. And sometimes, you know, if you're with Jeff
Bezos or others, you know, at that time it can
be pretty pretty terrifying. And so this kind of shift
from a culture of people talk, people comment on a PowerPoint,
(07:34):
and the discussion sometimes get you know, driven by who
has the stronger personality versus, you know, who has the
greatest ideas. One of the things that I try to
change is that a lot of the meetings that we
do today actually start the same way by reading a document.
So I now read a lot of documents like I
used to at Amazon. You know, I would say maybe
(07:54):
thirty, forty percent of the meetings are starting that way,
and I think people love it because it breaks the
barrier of language for someone like me, for whom English is
obviously not my first language. Sometimes some of
the people are more shy than others, et cetera. So
people see that as a mechanism for inclusion. So back
to your question, let's say thirty forty percent of my
(08:14):
meetings actually now start by us reading a document together
and then commenting on that and making decisions.
Speaker 3 (08:20):
Can I just say, Tracy, I've always thought more meetings
should start with just reading. Because you go to,
you hear, like, a quarterly call or a Fed event,
and someone just reads out a prepared text. It's like,
just let everyone read it and just jump straight in. Like,
let everyone do the reading first.
Speaker 2 (08:34):
You don't need someone.
Speaker 3 (08:35):
Standing up there talking about what's on a written piece
of paper somewhere.
Speaker 1 (08:39):
Anyway, I agree that we could reduce the time of meetings. Yes, okay,
So speaking of meetings and the decision making process, then
talk to us about how Goldman Sachs decided to approach
generative AI. What was the decision-making process like? The
development process? And you know, we'll get to what
(08:59):
you're developing, but like, how did you initially approach it?
Speaker 4 (09:03):
So I think our initial approach was really to realize
that there were so many more things that we didn't
know compared to the things that we knew, because it's
a really new thing, and even for companies like us
that have been working on machine learning and traditional AI
for literally decades, this felt like a very different thing.
Speaker 1 (09:24):
What sort of timeframe are we talking about? Like, was
there a sort of like big realization that this is
something that we need to focus on.
Speaker 4 (09:31):
Yes, because I was lucky enough that I got into
the very very early version of GPT, even before it
was called ChatGPT. So the very first version was
essentially completing a sentence. It wasn't even allowing you to
do interactive chat. You would just paste a text and
(09:52):
it would just complete that text. And so I started
to do that with a bunch of stuff, and then
I was seeing that the quality with which it would continue
was pretty much indistinguishable from the part that you actually
put in. And so we started to obviously talk
between ourselves but also among other people in the industry,
and we all realized very soon that this would be
(10:12):
something very different, but also be something that could have
a pretty profound impact on what we do. Because at
the end of the day, we are a purely digital business.
We don't bend metal, we don't you know, like use
high temperatures. We don't really have physics. So it's all
about how we service our clients. It's all about how
smart we are. It's all about how we can process
(10:33):
incredible amount of information. It's all about, you know, how
we analyze data in a very sometimes opinionated way. We
form our own views on the market, we form our
views of investments, et cetera. And so given that this
AI showed very early signs of being able to synthesize
and summarize very complex sets of information but also identify patterns,
(10:58):
we thought that could be something that we definitely need
to pay attention to. So given that, one of the
things that we decided to do very early on was
to put a structure, and I can say more
about that, put a structure around this so that we
could experiment but in a sort of safe and controlled way.
Speaker 1 (11:18):
Right, So you decided to develop your own Goldman Sachs
AI model versus, you know, using ChatGPT or
Claude or getting something off the shelf.
Speaker 4 (11:27):
Actually, initially we kind of thought about that, but then
very quickly we decided that our time was spent much
better using existing models, which, by the way, were
iterating really really quickly, but then put them in a
condition so that they would be safe to use and
also they would actually give us the most reliable information,
(11:48):
because taken as they are, you can't just drop a
model in an environment like Goldman, because then, like you know,
to your earlier point, a one percent inaccuracy, a
zero point one percent inaccuracy, is completely unacceptable.
There are a lot of potential issues related to you know,
what data has it been used to train? And you know,
(12:09):
there is a lot of uncertainty with regards to you know,
like what are the boundaries between what you can safely
use and what you can't. And so what we decided
to do was instead to build a platform around the model.
So think of that almost as if you had a
nuclear reactor. You know that now you have invented fission
or fusion, and there is a lot of power that
(12:30):
can be generated from that, but then you need to
contain it and direct it in a certain way. And
so we built this GS AI platform, which essentially takes a
variety of models that we select, puts them in the
condition of being completely segregated and completely secluded and completely
safe from an information security standpoint. It abstracts some of
(12:51):
the ways to use the models, so that our developers
can use the models interchangeably, and then creates a set
of standardized ways to, for example, improve the accuracy using
retrieval-augmented generation, access external or internal data sources, apply
entitlements so that someone who is on the private side, you know,
(13:13):
gets to see different information than someone who is on
the public side. And then on top of that, builds
a developer environment so that people will very easily be
able to embed that AI in their own applications. So
imagine this, we got a great engine and we decided
to build a great car around that.
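To make the "car around the engine" idea concrete, here is a minimal sketch of what a platform layer like the one Argenti describes might look like: interchangeable model backends behind one interface, with an entitlement check applied to every document before anything reaches the model. The class names, the toy backend, and the visibility scheme are illustrative assumptions, not Goldman's actual GS AI platform code.

```python
# Minimal sketch of a "platform around the model": interchangeable backends,
# an entitlement check on every document, and one entry point for developers.
# All names here are hypothetical; this is not the actual GS AI platform.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    text: str
    visibility: str  # e.g. "public" or "private"


def is_entitled(user_side: str, doc: Document) -> bool:
    """Private-side users see everything; public-side users see only public docs."""
    return user_side == "private" or doc.visibility == "public"


class AIPlatform:
    """Routes a prompt to a pluggable model backend after filtering documents."""

    def __init__(self, backends: Dict[str, Callable[[str], str]]):
        self.backends = backends  # model name -> callable that completes a prompt

    def answer(self, model: str, question: str, docs: List[Document], user_side: str) -> str:
        allowed = [d for d in docs if is_entitled(user_side, d)]
        context = "\n".join(f"[{d.doc_id}] {d.text}" for d in allowed)
        prompt = f"Use only the context below and cite doc ids.\n{context}\n\nQ: {question}\nA:"
        return self.backends[model](prompt)


# Toy backend standing in for a real hosted or open-source model.
def echo_model(prompt: str) -> str:
    return f"(model output based on {prompt.count('[')} permitted documents)"


platform = AIPlatform(backends={"toy-model": echo_model})
docs = [Document("10-K-2023", "Public filing text...", "public"),
        Document("deal-memo-77", "Private-side memo...", "private")]
print(platform.answer("toy-model", "What did the filing say?", docs, user_side="public"))
```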
Speaker 2 (13:45):
What are you putting in the model?
Speaker 3 (13:47):
Because I have to imagine at a bank like Goldman,
you know, you have a lot of data, but you
must have just an extraordinary amount of unstructured data. There's
conversations that bankers have with clients. There's other sorts of meetings,
the meetings you have, and there's words that are said
during that meeting that could be synthesized in some way.
In these early iterations, you know, I upload a conference
(14:11):
call transcript and I ask a question. What do you
upload? What is the unstructured data that you have,
or the questions? Or, yeah, what are you, what
are you putting into it from your reams of knowledge
that you must have internally?
Speaker 4 (14:23):
So one of the first things that we did was
use the platform and the models to extract information from
publicly available documents. That's kind of the safest way: public
filings, all the Ks or the Qs, and, you know,
obviously earnings, and put our bankers in a condition
to be able to ask very, very sophisticated, multi-dimensional
(14:45):
questions around what was reported, cross-reference it with previous reports,
cross-reference it with any announcements, any earnings call transcripts, all
things that are out there but just are difficult to
bring together. And so that has evolved into a
tool that physically we use and we're rolling it out
right now as an assistant to our bankers so that
(15:08):
they can, you know, service their clients or answer client
questions, or even their own questions, in a time that
is a fraction of what it used to take, even
generate documents that then can be, you know, shared with
clients and so on and so forth. And obviously we
always have as a rule, like when you drive a
car that has some autonomous capability, that you always keep
(15:30):
the hands on the wheel. Our rule is that there
always needs to be a human in the loop. Okay,
And so the way that works is actually interesting because
we found out that you can't just shove something into
a model and then pretend that the model is going
to give you the answer right away. Why? Well, because models,
by themselves, you know, they essentially apply a stochastic or
(15:51):
a statistical way to understand what is the next word
that they need to say. So, no matter how good
the material that you put in is, there's always going
to be some level of variability. It is almost like
the intersection between the documents that you insert and what
I call the shadow of all the
knowledge, of all the things that the model has seen before.
(16:12):
And so we really perfected this. You know, there are
two techniques that are widely used to improve the accuracy
of the answers. One is working on the way those
models represent knowledge, which is called embeddings technically, and the
concept of embeddings, by the way, everybody talks about embeddings,
but then very few people actually understand it. It took me
(16:34):
a while to understand that well. An embedding is simply
a way for the model to parameterize and create a
description of what it's seeing. So if I see a phone,
for example, in front of me, the embeddings of a
phone could be: is it a piece of electronics? Yes, one,
it's definitely a piece of electronics. Is it edible? Zero, you
(16:54):
can't really eat it, you know. And then you have
all these parameters. It's almost like twenty questions. I give
you all these questions and then you finally understand that
it's a phone, and that's what the embeddings are, almost
like the twenty questions of reality, except instead of twenty
it's like twenty thousand. And then you have RAG,
which is the retrieval augmented generation, which is actually interesting
because you tell the model that instead of using its
(17:18):
own internal knowledge in order to give you an answer,
which sometimes, as I said, is like a representation of reality,
but it's often not accurate, you point it to the
right sections of the document that actually are more likely
to answer your question. Okay, and that's the key. It
needs to point to the right sections and then you
get the citations back. So that took a lot of effort.
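A tiny sketch of the two techniques just described, under toy assumptions: each chunk gets a hand-written "twenty questions" style embedding vector, the query is embedded the same way, and retrieval simply ranks chunks by cosine similarity so the model can be pointed at the right section and cite it. Real systems use learned embeddings with thousands of dimensions; the chunk text, vectors, and names here are made up for illustration and are not Goldman's implementation.

```python
# Toy illustration of embeddings and retrieval-augmented generation (RAG).
# Real systems use learned embeddings with thousands of dimensions; here each
# "question" in the vector is hand-written to mirror the twenty-questions analogy.
import numpy as np

# Embedding dimensions: [is electronics?, is edible?, mentions revenue?, mentions labor?]
chunks = {
    "filing-p12": ("Revenue grew 8% year over year.",        np.array([0, 0, 1, 0], float)),
    "filing-p47": ("Hiring slowed amid labor cost pressure.", np.array([0, 0, 0, 1], float)),
    "glossary":   ("A phone is a piece of electronics.",      np.array([1, 0, 0, 0], float)),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, k: int = 1):
    """Rank chunks by similarity to the query embedding and return the top k."""
    ranked = sorted(chunks.items(), key=lambda kv: cosine(query_vec, kv[1][1]), reverse=True)
    return ranked[:k]

# "What did they say about the labor market?" embedded with the same toy scheme.
query = np.array([0, 0, 0, 1], float)
for doc_id, (text, _) in retrieve(query):
    # The retrieved section plus its id is what gets handed to the model, so the
    # answer can point back to a citation instead of relying on whatever the
    # model half-remembers from pretraining.
    print(f"context for the model: [{doc_id}] {text}")
```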
(17:39):
But we're using that in many many cases because then
we expanded the use case from purely, like, banker assistant,
in a way, to more like, okay, document management. You know,
we process millions of documents. Think of credit confirmations,
settlement confirmations. Every document has a task called entity extraction.
(17:59):
So you need to extract stuff from the document and
then digitize it and then model it in a certain way.
And so the use of generative AI there does a
great job at extracting information. And this is an interesting
concept because you don't have to actually specify a fixed pattern.
(18:20):
You can just give a lot of examples, and
then the AI will figure out the pattern from that. One
of my favorite examples is the following. Let's say that
my phone number is five five three two one three
oh five oh, and someone writes in the document, instead
of a zero, an O. Okay. You can test
yourself even with GPT, if you give a number with
(18:42):
an O instead of zero, and you ask GPT, what's
likely wrong with this entity? GPT is gonna tell you, well,
it looks like a phone number that has an O,
which generally is not in phone numbers. Most likely this
is the correct phone number. Now, nobody has written software
to do a pattern match in there. And imagine if
(19:04):
in the traditional way of doing entity extraction,
there were developers that were writing rules. They were saying, okay, numbers,
it needs to be ten digits and blah blah blah.
The AI figures.
Speaker 2 (19:15):
Out their own rules.
Speaker 4 (19:17):
That are the most likely. So this is the key thing.
It has common sense. And that common sense when you're
dealing with millions of documents that contain a whole bunch of
ways that you might have written those things, and
imagine the complexity of all the rules that you need
to write. And every bank has the same problem. This
simplifies things tremendously because it's able to figure out what's
(19:42):
most likely by itself. And so that thing evolved into
a tremendous time saving for everybody in the bank that
has to deal with document workflows. And so that
was a very interesting finding that we made early on.
And so again, to summarize, models are the raw material
of intelligence. You know you need to somehow direct them,
(20:05):
you need to guide them, you need to instruct them,
you need to put them in an environment that actually
gets the most out of that, and that's what we've
been focusing on.
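The phone-number example can be made concrete with a sketch that contrasts the two approaches: hand-written rules versus a hypothetical few-shot prompt in which the examples, not the rules, carry the pattern. The helper, the prompt wording, and the numbers are illustrative, not anything Goldman actually runs.

```python
# Two ways to catch the "letter O instead of zero" problem in a phone number.
# The rule-based version is the traditional entity-extraction style described
# above; the prompt is a hypothetical few-shot instruction you might send to a
# model, which generalizes without anyone enumerating the patterns by hand.
import re

def normalize_phone_rules(raw: str) -> str:
    """Hand-written rules: strip punctuation, map O to 0, insist on ten digits."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw).upper().replace("O", "0")
    if not re.fullmatch(r"\d{10}", cleaned):
        raise ValueError(f"does not look like a ten-digit phone number: {raw!r}")
    return cleaned

few_shot_prompt = """Extract the phone number and fix obvious typos.
Example: "call 212-9O3-11OO" -> 2129031100
Example: "tel. 4I5 555 0199" -> 4155550199
Now: "reach me at 555-321-3O5O" ->"""

print(normalize_phone_rules("555-321-3O5O"))  # rule-based path
# send few_shot_prompt to whichever model the platform exposes; the examples,
# not a hand-coded pattern, tell it what "most likely correct" looks like.
```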
Speaker 1 (20:13):
So going back to the analogy that you used previously,
this idea of a nuclear reactor and sort of building
the containment casing or the protective casing around it. I
imagine one of the complications of being Goldman Sachs and
working with AI is that you're a regulated financial entity.
How does that added complexity affect your use of AI.
(20:35):
Are there additional data considerations or additional infosec considerations.
Speaker 4 (20:40):
I think that's a great question, because obviously we live
in a regulated world, and in fact, I have to
tell you that in this case, regulation actually helps us
think through all the possible unknown now, something that, as
I said, is something that is still largely something that
nobody really completely understands. And so what we did was
to put but governance around the usage of the models
(21:03):
and also governance with regards to the use cases that
we can implement on the models. Every bank has a
function called model risk, which, in the traditional sense, a
model is any decision or any algorithm that is running
automatically to do for example, pricing or you know, there
is a lot of that tradition in every bank risk calculation, etc.
(21:24):
So that's the traditional model risk. We use that very
well established pattern. That is also you know, that has
its own second and third line like controls and supervision
also to validate what we do on the AI side.
So there is a governance part which we really set
up very early on. We have an AI committee that
looks at the business case should we do this? And
(21:47):
then we have an AI control and risk committee that
looks at, okay, how are we going to do that?
And then the two of them need to actually come
together before we can release a use case. And then
of course we did a lot of work with regards
to, let's say, accuracy, lineage, and, in a way,
the way you connect the output to where the
(22:08):
data comes from and who can actually see it, what
we call entitlements, and we did that in lockstep with
the regulators. So I think, you know,
in a way, I think we put a sort of
what we like to call responsible AI first since the
very beginning, and it really helped us, the fact that,
you know, we embedded all those controls into a single platform.
(22:28):
This is how our people use AI inside Goldman.
Speaker 1 (22:31):
This is something I'm really interested in just from a
technical perspective, But can you talk a little bit more
about that interoperability aspect. So you have a pool of
data that is Goldman's that you presumably don't really
want to share with outside entities. So how do you
plug that into an AI model if you're working with,
you know, ChatGPT or Claude or something like that?
Speaker 4 (22:53):
So there are two ways that we do that. We
use the sort of large proprietary models in a
way that, we work with Microsoft, we work with Google.
We have very strong partnerships, so that essentially there are
controls that guarantee that nobody has access to the data
that we put into the model, that the data leaves
(23:13):
no side effects, so it's not saved anywhere, it
only stays in memory. The model is completely stateless, meaning
that the state of the model doesn't change after the
data comes through, so there is no training, there is
nothing done on that data. And also that operator access,
meaning who can actually access the memory or those machines
is restricted and controlled and needs to be agreed with us.
(23:36):
So imagine securing, putting a vault around those models.
But even then, for what's really, really sort of secret sauce, proprietary, etc.,
we like to also use a different approach, to use open
source models that we can run in our own environment. Okay,
and we like a lot of open source models. I
have to say that one we particularly like is Llama, and
actually Llama 3, and Llama 3.1 especially.
actually Lama tree and Lama tree point one especially as.
Speaker 2 (24:01):
The one developed by Facebook.
Speaker 4 (24:03):
Oh yeah, so they recently announced Llama 3.1,
which has a version that is four hundred and five
billion parameters. So it's pretty large and it seems
to be performing. You know, the gap with those big
foundational models is now very, very narrow. So for that,
we run it in our own sort of a private cloud,
(24:23):
call it that way, with GPUs that we own, and
that we train it with data that stays in that environment.
So imagine that. You know, our approach is, okay, there
is a sort of a rating of sensitivity of this data.
All data needs to be protected, therefore we use those
safeties all throughout regardless. But then for the super, super,
super secret stuff, you know, we like to do it
(24:45):
in our own environment.
Speaker 3 (24:46):
Since you're talking about building your own environment, and this
is something we've talked a lot about on the podcast.
Hardware constraints, energy constraints, things like that, how does that
manifest in your world some of these physical, real world
constraints to building out the compute platform at Goldman Sachs?
Speaker 4 (25:06):
Well, initially we thought maybe we could host those GPUs in
our own data centers, and then immediately you run into
considerations such as, first of all, they develop a
lot of heat. Secondly, they consume a lot of power.
Three, there is a decent chance that they might fail
because, you know, of all those considerations if they're not
properly addressed. And then, D, they need very special, for example,
(25:29):
interconnect and high speed bandwidth between them. And so the
decision, what we ended up doing, is actually to have
them hosted at some of the hyperscalers that we use,
but use them in our own virtual private clouds. So
those racks are basically only ours. And if you're asking
me the more general question, which is, hey, where is
the world going with regards of that? Okay, so right
(25:52):
now there are two really rapidly competing forces. One is
pushing towards more and more consumption and one is pushing
for more and more optimization. Okay, and I can talk
about that for a couple of minutes. On the consumption side,
I mean, really, of the two dimensions for scaling a model,
one of the most important is obviously the
(26:13):
size of the prompt, or the context. Okay, and there
is pretty good evidence that the larger the context, which
is really like the memory of those models, the
more you can get out in terms of the ability
to reason on your data. That has already gone up
from thousands to tens of thousands to now millions. And
there is a prediction, you know, you heard some very
(26:33):
prominent people saying that there could be the trillion prompt
and the power scales quadratically with the prompt, so that
points to a consumption of energy and GPU power which
is going to continue to raise exponentially. At the same time,
we've seen great results with optimization techniques such as quantitization,
reducing from sixteen bits to eight bit to four bit precision,
(26:56):
having even smaller models using what's called window that which
means that you know that you can only pay more
attention to some of the parts of the context intell
of all of it, and so you need a smaller one.
And so I'm seeing those two kind of going into
two opposite directions. It's going to be very interesting to
see how that evolves. I would say for the short term.
(27:17):
I see that definitely that trend is going to continue
to go up. And one of the things that fascinates
me the most is that from one version to another,
the most striking difference is the ability to reason and
the ability to actually come up with logical step by
step instructions or step by step chains of thought of
(27:40):
what the output is going to be. So we decided, okay,
first of all, we need to get access to the
most powerful GPUs. Secondly, we need to host them in
an environment that actually allows for the most optimal functioning
in terms of bandwidth, in terms of power consumption, etc.
And then at the same time, we've been focusing a
lot on optimizing the algorithms so that, you know,
we could really get the most out
(28:03):
of that.
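A minimal sketch of the quantization idea mentioned here: store a layer's weights as int8 plus a single scale factor instead of sixteen-bit floats, halving the memory at the cost of a small rounding error. Production stacks use more careful schemes (per-channel scales, four-bit formats, calibration); the sizes below are illustrative only.

```python
# Minimal sketch of weight quantization: store float weights as int8 plus one
# scale factor, then dequantize at use time. Real stacks (per-channel scales,
# 4-bit formats, calibration) are more involved; this just shows the idea.
import numpy as np

rng = np.random.default_rng(0)
w_fp16 = rng.normal(size=(4096, 4096)).astype(np.float16)  # one layer's weights

scale = np.abs(w_fp16).max() / 127.0          # symmetric quantization
w_int8 = np.clip(np.round(w_fp16 / scale), -127, 127).astype(np.int8)
w_back = w_int8.astype(np.float16) * scale    # dequantized copy for checking

print("fp16 bytes:", w_fp16.nbytes)           # 16 bits per weight
print("int8 bytes:", w_int8.nbytes)           # 8 bits per weight, half the memory
print("mean abs rounding error:", float(np.abs(w_fp16 - w_back).mean()))
```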
Speaker 1 (28:04):
Just to press you on this point, what are the
conversations actually like with cloud providers at the moment when
you're trying to get more compute or more space, more racks, whatever.
Is it maybe different for you because you were at AWS.
Maybe you can just call someone up there and be like,
we would like some more servers, or have you found
yourselves at times maybe limited in what you can do
(28:26):
by the amount of power available to you.
Speaker 4 (28:29):
Well, I wish that would be the case, but I
cannot just pick up the phone and get whatever I want.
But I think so far. I mean obviously because we
are a really good client of those companies in general,
but also because we've been very selective in the use
cases that we put in production. I have to say,
like I said before, think about that, if you look
at the consumption of resources today, those who consume more
(28:52):
resources are people that actually do the training of their
own models. Okay, and initially everybody was trying to
do full training from scratch, which takes, like, the
absolute most. If that's one hundred, we do fine-tuning, which
is adaptation of existing models, and that could be one in
a hundred or less in terms of consumption of resources.
(29:12):
So because of the techniques we were using, and because
of the fact that we decided to really focus on
fine-tuning or RAG versus full training, we haven't really
hit any caps. And also, to be honest, you know,
we bought our GPUs pretty early, so probably there
wasn't as much craziness as there is today, and so
that's turned out probably to be a good idea.
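A back-of-the-envelope way to see why adapting an existing model is so much cheaper than training one from scratch, assuming a LoRA-style adapter, which is one common fine-tuning technique and not necessarily the one Goldman uses. The model sizes are illustrative.

```python
# Back-of-the-envelope: why adapting an existing model is far cheaper than
# training one from scratch. Assumes a LoRA-style adapter, a common
# fine-tuning technique; the sizes are illustrative, not Goldman's numbers.
hidden = 4096        # model width
layers = 32          # transformer blocks
rank = 8             # adapter rank
attn_mats = 4        # q, k, v, output projections per block

full_params = 7_000_000_000                            # full training touches all of these
lora_params = layers * attn_mats * 2 * hidden * rank   # two low-rank factors per matrix

print(f"adapter params: {lora_params:,}")               # roughly 8.4 million
print(f"fraction of full model trained: {lora_params / full_params:.4%}")
```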
Speaker 2 (29:48):
You know, Nvidia's huge.
Speaker 3 (29:49):
Everyone would like to have some of Nvidia's market
cap be their market cap by offering some cheaper product.
We interviewed some guys who have a semiconductor startup that's
just going to be LLM-focused. We know that Google,
for example, has TPUs their own chips. Can you envision
as a roadmap some alternative where GPUs are not the
(30:12):
dominant hardware for AI?
Speaker 4 (30:14):
Well, that's literally like you know the trillion dollar question.
Speaker 2 (30:17):
Yeah, well, that's what I'm asking you.
Speaker 4 (30:18):
Yeah, but I'm not an analyst, I'm just a technologist.
Remember, I'm the guy that makes sure that the...
Speaker 3 (30:23):
Would say, you're probably a better person to ask than
an analyst because you're actually the one who's going to
be making the decisions. So I'm...
Speaker 4 (30:29):
Okay, so you're going to ask it to me. So
you have to distinguish. There are actually two dimensions
that we need to consider. One is training and the
other one is inference. Okay, that's the first dichotomy. For training,
at the moment, there's most likely nothing better than GPUs, okay,
because when you train a model, the software, or Python
(30:50):
or whatever framework, needs to see all your GPUs as one,
as a cluster. And it's not just the GPU itself,
but what Nvidia has been doing a great
job at is actually to make them work in unison
with the software layer called CUDA, which runs on
Nvidia GPUs, which is an extraordinary piece of software and
(31:11):
it became the standard for that. And also because you know,
the performance premium that you have on those GPUs when
you're trying to train those incredibly large models is something
that you really really want. And so the training part,
I'm pretty sure that it's going to be dominated by
GPUs for a while. But then you know, as those
models get used, obviously the pendulum swings towards inference, which
(31:34):
is the actual use. Now you have a model, which is
a bunch of weights, and you just need to calculate
a bunch of matrix multiplications on that. I think accelerators
and specialized chips are actually going to have a really
big role to play. So you may imagine that you
go from a world where everybody builds the cars and
not too many people drive the cars to a world
(31:55):
where most people are going to drive cars. And then
there are another two dimensions, which is models that are
hosted by the client and models that are hosted by
a hyperscaler. So, as you know, today I can take
a model like Llama, I can put it in my
own environment, I can run it on a MacBook, or
I can run it in my own data center
(32:16):
with my own GPUs. And given that I'm used to GPUs,
given that those are the ones that we can buy,
given that CUDA is what developers know, etc., I'm most
likely going to use that. That's a good part for
Nvidia for that. But then there is another way to
use those models, which is to have someone host them
for me and I just access them to an API.
(32:37):
That's what services like Amazon Bedrock does. You basically choose
your own model and then you serve it through them.
When you do that, you don't really know what's underneath.
You don't know if it's a VP, or if it
is an accelerator, if it is Amazon's own chips or
Google's own chips, etc. So now the real question, that's
why the trillion dollar question is are most people going
(32:58):
to use those models through hosted environments where the hyperscaler
will have a lot of freedom with regards to what
they use underneath, and most likely they will vertically integrate
or are they going to use them you know, themselves
in a more more like you know, in a self
service way, And in that case it's less likely that
those accelerators are going to dominate. We currently are in
(33:20):
a sort of a you know, balanced way because we
have our own that we use like I described, and
also we use you know, the hosted models. And so
where is this going to go? It's hard to say,
because I think it depends on the evolution of the models,
and it depends which models are going to be made
available as an open source that you can actually host yourself.
And I think right now one of the greatest questions
(33:42):
is, are the open source models going to be
an absolutely on-par alternative to the hosted models,
to the foundational proprietary models? And given
Llama 3.1, that answer seems to be more likely.
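The "bunch of weights and matrix multiplications" point is easy to see in code: once training is done, serving a model is mostly fixed matrices multiplied against activations, which is exactly the workload inference accelerators target. A toy two-layer forward pass with made-up sizes, purely illustrative:

```python
# Once trained, a model is "a bunch of weights": serving it is mostly matrix
# multiplications against those fixed arrays, which is what inference
# accelerators are built to do. Toy two-layer forward pass with numpy.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, d_out = 512, 2048, 512
W1 = rng.normal(size=(d_in, d_hidden)) * 0.02   # frozen weights: never change at inference
W2 = rng.normal(size=(d_hidden, d_out)) * 0.02

def forward(x: np.ndarray) -> np.ndarray:
    """One token's activations flowing through two layers: matmul, nonlinearity, matmul."""
    h = np.maximum(x @ W1, 0.0)                 # ReLU keeps it simple
    return h @ W2

token = rng.normal(size=(1, d_in))
print(forward(token).shape)                     # (1, 512)
# Rough rule of thumb: about 2 floating point operations per parameter per token,
# so throughput and memory bandwidth, not training-style flexibility, dominate.
```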
Speaker 1 (33:56):
Yes, I had a question about this actually, which is
do you think Wall Street's attitudes towards open source have
changed over time? And the reason I ask is because
nowadays it seems like a fact of life. Everyone uses
open source, whether you're a Goldman or somewhere else. But
I remember, you know, like back in as recently as
(34:16):
like twenty twelve. I remember Deutsche Bank had like this
open source project called the Lodestone Foundation, where they were like, oh,
we should all stop wasting our own resources developing our
own code and our own software. We should all pool
our resources together and do open source. And they had
to actually lobby. It was unsuccessful ultimately, but they were
(34:38):
trying to get all the banks on Wall Street to
work together for open source. Nowadays, it seems like there's
been this significant cultural shift, it's not even a question.
Speaker 4 (34:47):
So in general, my direction, my guidance to, you know,
my team is: don't build anything unless you have to.
Don't think that just because you're a smart person you
can build software better than anybody else. Maybe you can,
but it's a good thing that we focus on building
things that are actually differentiating for us. And then I
(35:09):
think the use of open source software, which we very
much endorse, is also a really good hedge with regards to,
you know, which vendors to use, because it really heavily
reduces the vendor lock-in. Of course, open source software,
as you know, has a tremendous long tail. There are millions
of them, and so I think there are best practices
(35:29):
around the use of open source, and those best practices are,
you know, like, you know, you need to run reviews
on open source, tech risk reviews or security
reviews and everything, as if you'd almost built it yourself. And
then, secondly, tending to concentrate on the larger, very well
supported by the community, type of open source. And so
(35:50):
my philosophy is yes to open source, but then you
need to own it in the truest way, because you
are generally going to be the one that actually
needs to support it, so really build knowledge around it.
Speaker 1 (36:02):
And now you can ask AI to run the code
for you and check it.
Speaker 4 (36:06):
Yeah, okay. That of course leads to, probably, what,
if you ask everybody: where did you get, so far,
the biggest bang for the buck for AI? Most CIOs
are going to tell you on developer productivity. And I
think it's something that for us was the first project
that we actually expanded at scale. I have to say
that today virtually every developer in Goldman Sachs is equipped
with generative coding tools, and you know, we have
(36:27):
twelve thousand of them. So we didn't yet enable the
ones that are using our own proprietary language, called Slang,
but everybody else has an AI tool, and the results have
been pretty extraordinary.
Speaker 2 (36:41):
How do you measure that? What are some numbers?
Or how would you describe it?
Speaker 4 (36:44):
So we measure it according to a number of metrics,
such as the time that it takes from, let's say,
when you start the sprint to when you actually commit the code,
or when you complete your task. We measure it by
number of commits, meaning how many times you actually put
code into production. We measure it by a number of defects,
which in this case is like, for example, deployment related errors.
(37:05):
So those are more like velocity and quality metrics. At
the same time, we have seen a wide range, ranging
from ten to forty percent productivity increase. I would say
that today we are probably on average seeing twenty percent. Now,
developers don't spend one hundred percent of their time coding.
(37:26):
They maybe spend fifty percent of their time coding. So
your question is, what are they doing with half of
their time? Well, there is a lot of other activities
such as documenting code, such as doing deployments, doing deployment scripts,
doing, you know, unit tests, et cetera, et cetera. So,
what's generally called the software development life cycle. Okay, and
so we see a net of ten percent. But then the
(37:49):
cool thing is that those AIs and the things that
we're building around them are starting to go beyond coding.
They're starting to help you write the right tests, write
the right documentation. They even figure out algorithms, or
even, for example, reduce or minimize the likelihood of deployment
issues by writing deployment scripts for you. So as that expands,
(38:10):
we're going to be closer to one hundred percent, and
therefore we're going to be closer probably to twenty percent,
which, you know, for an organization of our size, is
a pretty massive efficiency play.
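The arithmetic behind those numbers is worth making explicit: a roughly twenty percent gain applied to the roughly fifty percent of a developer's time spent coding nets out to about ten percent, and the net approaches the full twenty percent as the tools cover more of the lifecycle. A two-line check, using the figures quoted in the conversation:

```python
# The net-productivity arithmetic from the conversation: a roughly 20% gain on
# coding applied to the ~50% of a developer's time spent coding nets out to
# about 10%; as AI covers more of the lifecycle, the net approaches the full 20%.
def net_gain(share_of_time_helped: float, gain_on_that_share: float) -> float:
    """First-order estimate: the gain only applies to the helped share of work."""
    return share_of_time_helped * gain_on_that_share

print(f"{net_gain(0.5, 0.20):.0%}")   # 10% net today
print(f"{net_gain(1.0, 0.20):.0%}")   # 20% if the whole lifecycle is covered
```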
Speaker 3 (38:18):
Can I ask a question about hiring developers? So I've
probably read one hundred articles over the years about Wall
Street competing with tech companies to hire developers, like, oh,
they got a ping pong table.
Speaker 1 (38:28):
Lloyd Blankfein used to say they are a technology company.
Speaker 3 (38:30):
Yeah, you gotta have your ping pong tables and your
free lunches and let people wear sneakers and
all that stuff. But now it seems, with AI, there's
a number of people interested in it who truly believe
that within a few years they might build the digital
god that's ten thousand times smarter than any human, and
they approach the task with messianic fervor. And I
(38:50):
imagine it, right, if you're at Goldman and you're trying
to help a banker answer a question from a client
about something in the chemical industry, like maybe that's not
the thing that gets you out of bed the way
sort of, like, metaphysical questions about what is the nature
of consciousness and things like that that people talk about. Does
that present any challenges or anything when trying to hire
(39:13):
talented AI developers?
Speaker 4 (39:15):
I think developers love to solve real problems. And one
of the things also that attracted me in the first place,
Not that it matters, but I'm saying, you know, I
tell you my own personal experience is that working in
a technology company is absolutely fantastic, but you're always like
one step removed from the business or from the application.
So I have to, you know, let's say you are
(39:36):
the bank and I'm the technology company. I need to
sell you a tool that then you're going to use
to run your business or improve your business. We are
kind of one degree of separation less. I am right
there in a digital business. There is fast, huge amounts
of data, huge amounts of flows, immediate results, and that's
kind of addictive. And so developers, especially when AIs
are starting to do all those magical things that we're
are starting to do all those magical things that we're
talking about, you know, they can see the impact on
the business right away, and then I think is kind
of attracting a lot of people. In fact, that there
is more and more people that are moving into the
industries oil and gas, transportation, chemical, medical, finance because you know,
this is new and there's nothing more exciting than seeing
(40:22):
it in action. And so there is so much action
going on that I think is actually really really interesting.
I think another question that maybe you haven't asked me,
but it's kind of part of this question, is, what
kind of developers? How has the profession of being a
developer actually changed?
Speaker 1 (40:35):
Oh wait, I had a related question. It's not quite
that question, but you can certainly answer that too. But Okay,
to my knowledge, Goldman Sachs doesn't have a job title
specifically with the words prompt engineer in it. So, looking
at the impact of AI on your business overall, is
AI a net hiring positive or a net hiring negative
(41:01):
for Goldman's employees overall?
Speaker 4 (41:05):
Well, meaning, are we going to hire more or fewer developers?
Speaker 1 (41:07):
Yeah, does it lead to more jobs because you're doing
more things and productivity increases? Or does it lead to
fewer jobs because now you can automate a bunch of stuff?
Speaker 4 (41:16):
Well, listen, there is so many things that we would
like to do if we had more resources that I
think this is going to be leading to more things
that we can do. You know, some people tell me sometimes,
so you're gonna maybe hire fewer or have fewer developers.
I don't know. I've been in IT, quote unquote,
for like literally almost forty years, and I've never ever
seen that go down. But I've seen inflection points where
(41:39):
you can actually get developers to do way more and
worry about way less that is not related to a
business outcome, and so I think it's more like how
the profession is going to change. In my opinion, we're
going to be less low level and more Hey, I
need to really understand the business problem. Hey, I really
need to think outcome-driven. Hey, I need to have
(42:02):
a crisp mental model and I need to be able
to describe it in words. So the profession is going
to change, and there are tasks that I think are
so repetitive that the automation of those is actually going
to help developers, you know, really kind of feeling really
really connected with the business and with the strategy, and
that will attract people that are generally curious, that are
(42:24):
generally interested in understanding what we actually do. So the
focus kind of shifts from the how to the what
and to the why, which is really kind of the heart of it.
Or think of this evolution of technology over the years,
from the back office of IT, which doesn't even know
what you're doing as long as your monitor is
actually working, to, hey, I'm actually able to take a
(42:44):
business problem and break it down into pieces that then
even an AI can write code for. So to your
specific question, I think maybe potentially some
companies are going to try to realize some of those
efficiencies by curbing the growth or even sometimes reducing it.
For companies like us that are extremely competitive, for companies
(43:05):
that have lots of ambition, this is a race at the end
of the day, and I think we're going to go
for, you know, trying to get even more out of
our developers, and actually, like, you know, trying to turn
them more into something that makes them feel super, super connected
to the business.
Speaker 3 (43:19):
What about non-developer roles, non-tech roles? And you know, again,
I guess a company like Goldman doesn't have, you know,
probably a lot of, like, low-level customer support things,
someone typing in a window like, oh, I need to
change my plane ticket, et cetera. But you know, a
lot of modern work is essentially just answering somebody's basic question.
Are the roles within a bank that are going to
(43:41):
either fundamentally change or go away due to sort of
agentic or generative AI.
Speaker 4 (43:48):
I think a lot of the work that is about
content production or content summarization will actually be streamlined quite
a bit, like, for example, taking an earnings report, making
it into ten different versions for
different channels of distribution. Here's the one for internal people,
here's the one for the client, here's the one for
the website, et cetera, et cetera. Imagine the creation of
(44:11):
pitch books for clients where you take templates, you
put a bunch of data, you go out and do research,
you take logos, you take this, you take that. There
is a lot of that machinery and factory, which, you know,
we have thousands of people doing.
Speaker 1 (44:24):
I'm sure there's a lot of junior analysts who would maybe be glad to
hear that some of making a pitchbook is going to
be automated.
Speaker 4 (44:29):
But I think that's a good thing. It takes away
some of the toil. And so I think at the
end of the day, listen, right now, have you noticed
that everything is kind of converging to words and concepts,
no matter if you're a developer, if you're a knowledge worker?
Those jobs are kind of colliding. And absolutely, developers have
seen that first. Why? Well, because it's a low-hanging fruit.
(44:51):
The developers deal with a vocabulary that is not fifty
thousand words. There's like two, three hundred keywords per language,
and so of course that works extremely well, and of
course that's the first thing to go. But I think
eventually the knowledge worker is going to be, you know,
the one that really benefits, no matter if
you are a developer, or if you are
working on a pitch book, or if you're working on
a summarization of a meeting or the action items, or
(45:14):
you're working on a strategy, et cetera, et cetera. And
I think overall this will elevate the quality of the work,
which then everybody says a happy worker or a happy
developer is a productive developer. I think you're happy when
you're actually doing something that allows you to do your
best work. And I'm hoping that if AI allows all
of us to do more of our best work, I
(45:36):
think it's going to be, you know, probably the biggest
effect that we can have.
Speaker 1 (45:39):
I know, we just have a couple more minutes. So
one very quick question, what makes a good prompt?
Speaker 4 (45:45):
Well, believe it or not: empathy. You need to be empathetic,
and you need to be gentle, and you need to
be kind, and you need to kind of, you know, just...
Speaker 1 (45:55):
Like empathetic.
Speaker 2 (46:00):
She makes fun of me for how empathetic I am.
Speaker 1 (46:01):
You know, I've said, it's very sweet that you say.
Speaker 4 (46:04):
You need to take the AI literally by the hand
and take it where you want to go. And I
tell you that, you know, one of my more
interesting experiences with prompts is the following. You know how
hard it is to get an AI to say I
don't know? It's almost impossible. You're always going to get
an answer. And so one time I decided I wanted
to get it to that point, and so I had
(46:26):
to navigate the prompt and the AI to understand that
it was safe and okay to say I don't know.
And so then at the end I prompted it, what's
the capital of you know, the United States? Okay? And
then I said that you know, what's the weather going
to be like tomorrow? And I got an answer, and
then I said what's the weather going to be in
(46:46):
a year, and it simply said, I don't know. And then
at one point, you know, I even decided to ask it,
like, is there a role for humans in a
world of AI?
Speaker 2 (46:59):
I don't want to know?
Speaker 1 (47:04):
Okay. Well, everyone's going to be off on ChatGPT
now trying to get it to say I don't know.
Marco Argenti from Goldman Sachs, thank you so much. That
was good fun.
Speaker 2 (47:12):
Thank you, Joe, thank you so much.
Speaker 1 (47:14):
Thank you so much, Joe. That was a lot of fun.
And I have to say I do not make fun
of you for saying please and thank you to ChatGPT.
I have I'm going to repeat it. I've said it's endearing,
(47:35):
it's very sweet, and I've tried to follow your example.
And now I don't say thank you because I
usually move on to the next question. But I do
say please.
Speaker 2 (47:43):
I've heard this though.
Speaker 3 (47:44):
It's funny that you said that, because I actually have
heard this, that there does seem to be quantitative evidence
that words like please and thank you, et cetera, do
actually improve results. Yeah, Matt Busigin, who you know
we've known on Twitter forever, has posted about this. So
there's a good reason to do it besides just the
(48:06):
habit that, with all entities you talk to, you should be
in the habit of being polite.
Speaker 1 (48:09):
Oh yeah, that was your argument, right, yeah, yeah, yeah, Okay,
Well I thought that was fascinating.
Speaker 2 (48:13):
Yeah.
Speaker 1 (48:14):
We've been talking a lot about AI and the sort
of potential use cases and the chips that are driving
the technology and things like that, but it was nice
to hear from someone who's actually making the purchasing decisions, yes,
and implementing them at a large institution.
Speaker 2 (48:28):
Absolutely.
Speaker 3 (48:29):
That was probably one of my favorite AI conversations we
had for precisely that reason, because it was interesting hearing
him talk about this idea that right now, like, these
open source models, particularly the latest version of Llama,
are getting really close to sort of the core proprietary models.
That was striking. The fact that he sees, perhaps particularly
(48:50):
on the inference side of model usage, an opportunity for
greater use of different types of hardware.
Speaker 1 (48:58):
Also very interesting, that's right, And we're so used to
talking about the massive amounts of power and energy that
AI will consume, and we you and I have had
a lot of conversations about how we're going to power
all these servers and things. But what's gotten far less
attention is just optimizing the way you use AI such
that you don't need to consume as much power, So
(49:20):
maybe doing less training, leaving training to the big like
hyperscalers or whatever, and then just doing the inference.
Speaker 3 (49:27):
In the end, it's going to be both, right, because
in the end, like there's both, it's going to happen.
People are gonna find algorithmic techniques and Marco described some
of them to lessen the sort of pressure and stress
that you're putting on the hardware, but of course that's
just going to mean you're going to use it more.
And then also people are going to have to solve
the power consumption. That's kind of like all of economic
(49:49):
history in general, in which we're always finding new ways
to get more out of the same, you know, gigajoule
of energy but also using more energy at the same time.
Speaker 4 (49:59):
Yeah.
Speaker 1 (50:00):
Absolutely, well, shall we leave it there.
Speaker 2 (50:02):
Let's leave it there.
Speaker 1 (50:03):
This has been another episode of the Odd Lots podcast.
I'm Tracy Alloway. You can follow me at Tracy Alloway.
Speaker 3 (50:09):
And I'm Joe Weisenthal. You can follow me at The Stalwart.
Follow our producers Carmen Rodriguez at Carmen Armann, Dashiell
Bennett at Dashbot, and Kail Brooks at Kail Brooks. Thank you
to our producer Moses Onem. And for more Odd Lots content,
go to Bloomberg dot com slash oddlots. We have transcripts,
a blog, and a newsletter, and you can chat about
all of these topics in our Discord, where we even
(50:30):
have an AI channel. Great stuff in there: discord dot
gg slash oddlots.
Speaker 1 (50:35):
And if you enjoy Odd Lots, if you like our
continuing series of AI conversations, then please leave us a
positive review on your favorite podcast platform. And remember, if
you are a Bloomberg subscriber, you can listen to all
of our episodes absolutely ad free. All you need to
do is connect your Bloomberg account with Apple Podcasts. In
order to do that, just find the Bloomberg channel on
(50:57):
Apple Podcasts and follow the instructions there. Thanks for listening.