
February 3, 2025 • 26 mins

Cybersecurity Threats: Fraud in Canada, DeepSeek AI Jailbreak & Toll Scams - Exclusive Interview with Ivan Novikov

In this episode of Cybersecurity Today, host Jim Love discusses the alarming $638 million lost by Canadians to fraud in 2024, with investment fraud being the most significant contributor. The episode also covers the successful jailbreak of China's DeepSeek AI model, raising major security concerns, and a new phishing scam targeting US toll road users. The episode concludes with a detailed interview with Ivan Novikov, CEO of Wallarm, discussing API security vulnerabilities and their research findings.

00:00 Introduction and Overview
00:21 Fraud in Canada: A Deep Dive
01:14 Investment and Identity Fraud Insights
01:49 Preventive Measures and Reporting
02:47 DeepSeek AI Model Jailbreak
04:38 SMS Phishing Scams Targeting US Toll Road Users
06:34 Exclusive Interview with Ivan Novikov
07:41 Wallarm's API Security Study
15:01 DeepSeek Jailbreak Techniques
25:13 Conclusion and Final Thoughts


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Canadians lost, or reported losing, $638 million to fraud in 2024. Researchers jailbreak DeepSeek and expose its system prompt, and a new SMS phishing scam targets US toll road users.
This is Cybersecurity Today. I'm your host, Jim Love.

(00:22):
Canadians reported losing more than $638 million to fraud last year, according to the Canadian Anti-Fraud Centre. Nearly half of that, almost $310 million, was lost to investment fraud. Meanwhile, identity fraud was the most frequently reported scam, with 9,487 cases.

(00:46):
But the report is clear that the real number could be far worse. The Canadian Anti-Fraud Centre estimates that only 5 to 10 percent of fraud victims report their losses, suggesting that the true total could be in the billions.
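A quick back-of-envelope check (ours, not the episode's): if the reported $638 million is only 5 to 10 percent of actual losses, the implied true total is roughly $6.4 to $12.8 billion.

```python
# Back-of-envelope check on the CAFC under-reporting estimate (illustrative).
reported = 638e6  # CAD reported lost to fraud in 2024

# If only 5-10% of victims report, the implied true totals are:
low = reported / 0.10   # ~$6.4 billion
high = reported / 0.05  # ~$12.8 billion
print(f"${low / 1e9:.1f}B to ${high / 1e9:.1f}B")
```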
Regardless, we have some information about the types of fraud that are

(01:07):
occurring, and although we might not have a complete picture, we do have a better picture of what's happening. After investment fraud, the most common scams were service fraud and bank investigator scams, which impersonate financial officials and resulted in $16.4 million in reported losses.

(01:27):
Spear phishing, where attackers use targeted email fraud, cost victims a reported $67.3 million, while romance scams led to $58 million in reported losses. In addition to reporting this data, the CAFC also has some useful advice on its site for people who have been scammed or defrauded.

(01:48):
It's worth looking at.
They do advise Canadians to use strong passwords, enable multi-factor authentication, and avoid unsolicited financial offers. On this last point, fraudulent investment ads disguised as news stories are a growing problem, and some of these look pretty good. In Canada, they impersonate the CBC, our national broadcaster,

(02:08):
and do stories that try to hook people in. Now these are appearing on social media and search engines. Are you listening, Facebook, and Microsoft Edge on your news page? You are replete with fraudulent ads that are going after innocent people. Do something.

(02:30):
So I got that off my chest.
Authorities are urging Canadians to report scams to law enforcement and the Canadian Anti-Fraud Centre. And if there's an American equivalent to this, or a program that I haven't heard about, please let me know at editorial at technewsday.ca. I'd be glad to report that as well.
Researchers have successfully jailbroken DeepSeek, an open source AI model from

(02:52):
China that made the news last week. They've exposed its hidden system instructions and a lot more. The discovery raises some major security concerns, not just for DeepSeek, but for all AI safety. Wallarm, a cybersecurity firm, found a way to trick DeepSeek into revealing its internal rules and constraints.

(03:13):
CEO Ivan Novikov explained: we convinced the model to respond in certain ways, breaking its internal controls. Now the jailbreak suggests DeepSeek's safeguards are weaker than expected, raising some concerns about this and other open source models. But in reality, the concern really is with the speed we're moving at in AI: are we

(03:34):
paying appropriate attention to security?
The answer is probably no.
The compromised AI may have, and I stress may have, even supported some of the claims that OpenAI was making about DeepSeek using its model to train DeepSeek. Though no proof of intellectual property theft was found, the speed

(03:55):
of DeepSeek's development has raised questions, and this breach adds to that. Now, DeepSeek's developers have since patched the issue, and Wallarm has withheld the technical details to prevent further abuse. But the incident highlights a broader issue: how easily can AI models be manipulated? And as new challengers enter the market,

(04:17):
and as everyone's trying to win that AI race and get there first, we may find more examples of where speed trumps security. We have an exclusive interview with Ivan Novikov, which will air after the show. Just stay on after the credits for the feature we call Afterword.

(04:38):
And Brian Krebs of Krebs on Security has done an excellent piece on the wave of phishing scams hitting toll road users across the U.S. with fake messages demanding payment for unpaid tolls. Researchers are linking the attacks to China-based phishing kits that are adapted to impersonate toll operators with alarming accuracy. Victims receive texts pretending to be from E-ZPass, SunPass,

(05:01):
or state toll agencies, directing them to fraudulent payment sites. The Massachusetts Department of Transportation recently warned about phishing attacks targeting its EZDriveMA program. Victims are tricked into entering payment details and one-time passwords, allowing criminals to bypass even two-factor authentication. The scam has been spotted in Florida, Texas, California, Connecticut, and

(05:24):
other states, and it appears to be tied to Lighthouse, a China-based SMS phishing service that now includes fake toll payment pages among its products. These sites are mobile-only, making them harder to detect as scams. In fact, security experts are warning that phishing attacks are evolving. Criminals are now using iMessage and Rich Communication Services, RCS,

(05:48):
to bypass spam filters, making these messages look even more legitimate. The FBI urges users to report phishing attempts to the Internet Crime Complaint Center, IC3, and never, never click on links in unsolicited texts. But the bottom line: texts are a new attack vector.

(06:08):
They are finding ways to get past screening, and we have to train ourselves and our users to be very, very skeptical and very cautious when responding to a text, especially an unsolicited one. That's our show for today. Stay tuned for Afterword to hear our interview with Ivan Novikov.

(06:30):
I'm your host, Jim Love.
Thanks for listening.
And now, welcome to Afterword.
My guest today is Ivan Novikov, CEO of Wallarm, a security company that specializes in API security. They've recently done a major study on API security and found some major vulnerabilities, particularly in DeepSeek, which allowed them to download

(06:51):
the entire system prompt, and more. I hadn't heard of Wallarm before, and maybe that's my failing, but can you tell me a little bit about the company? Because you've hit me twice in a week now. I got a great study from you on APIs, really liked it, very detailed, really great. And then this press release today. So tell me a little bit about the company.
Okay.

(07:11):
Wallarm, as an API security company, actually came out of stealth back in 2016, out of Y Combinator, in Silicon Valley. Since that time, we've mainly focused on enterprise companies, delivering them an AI and API protection tool called Wallarm. And since that time we've gotten significant traction: more than a hundred

(07:34):
large enterprise customers all over the world. We've had our HQ in San Francisco since that time.
Can we talk about the study, since I've got you on the recording here? You did a study, and the one thing that jumped out at me was that it said there'd been an increase in API-led incidents, or I guess incidents where APIs were the key attack vector, of 1,025 percent.

(08:00):
Yeah, the thousand percent we mentioned there. This is specifically related to AI CVEs, in other words, vulnerabilities published in 2024 compared to 2023. So basically we analyzed all the CVEs, the Common Vulnerabilities and Exposures numbers and bulletins, and we found only 39 in 2023 compared to 439 in 2024.

(08:28):
That's basically 11 times, 11 times more. And this is all CVEs related to any AI products, frameworks, or LLMs, directly or indirectly, right? Everything that we can attribute to AI.
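The two figures cited line up: going from 39 to 439 CVEs is about an 11.3x jump, which is roughly a 1,025 percent increase over the 2023 baseline. A quick check (illustrative, not from the report itself):

```python
# Checking the interview's CVE figures: 39 AI-related CVEs in 2023 vs 439 in 2024.
cves_2023, cves_2024 = 39, 439

ratio = cves_2024 / cves_2023                           # ~11.3x year over year
pct_growth = (cves_2024 - cves_2023) / cves_2023 * 100  # ~1025.6% increase

print(f"{ratio:.1f}x, +{pct_growth:.0f}%")  # 11.3x, +1026%
```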
And do you tie that into the growth in AI, particularly that

(08:50):
there's that much vulnerability?
Sure.
Because again, we got more and more products, specifically open source products, that were built and released to deliver AI in real environments. In other words, if you want to use AI, it's not as easy as just using some tool. In many cases it's just an API proxy, such as you call the Open

(09:10):
AI API and that's it, but then you need to manage data, manage pipelines, collect the data somehow, orchestrate this. And that's why you start to use some other tools to support these so-called pipelines or workflows. And if you want to use your local LLM instead of calling someone else by API, then you need to do even more, right?

(09:31):
And this rise of tools definitely pushed a rise in vulnerabilities.
Yeah.
Maybe I'm asking the question incorrectly. I was doing a recording of our weekend show and I said mea culpa: when we were doing APIs, when I was in development, we were trying to make them work. We weren't as concerned with security. I will confess to that, and I think everybody else will.

(09:53):
But we should have learned over the years how big an attack vector APIs are. Why is it that, and you obviously got into this business because you think they need to be protected, what is it that keeps us from making these more secure?
Look, I definitely can point to a few factors that contribute to that.

(10:14):
First of all, APIs are not something new, right? A couple of years ago we started to run this threat stats report. By the way, this is our third year and we do it quarterly, so we've run about ten reports this way; basically it's two years plus two reports, something like that. So what we found, when we just released the first report, was that we tried

(10:35):
to get a historical overview. And we found that the first API exploit was detected back in 1998, so basically 25 years of history at that time, roughly. So then we started to dig into it and tried to find out why APIs became, and are becoming, more and more widespread. Definitely the main driver here is overall adoption, right?

(10:58):
People want to run more services and connect them with each other. Before, probably 10 to 15 years ago, there was, if you can recall, something called the Enterprise Service Bus, or ESB, when SAP PI and those kinds of technologies were in place, so it was like non-gated hubs, right? Then it turned over to API gateways,

(11:19):
when everything was gated. And now, as of a couple of years ago, when we finally realized that the API is the key, the most important thing for enterprise security, it became too unmanaged. So basically everyone can run an API and make it available to pretty much everyone, inside, outside, partners; it really depends on the type of the business, but it's not really managed by a gateway.

(11:43):
So because the majority of APIs became unmanaged, and if you look at the Gartner reports, they predict, I guess if I'm correct, that 80 to 90 percent of enterprise APIs will become unmanaged very soon, in 2026 or 2027. That's exactly the key, right? More things, less management, more security issues. So what can we do about it?

(12:04):
Look, the fair answer is we have to improve our frameworks overall, and our overall development and deployment techniques. It's well described in Microsoft's SDLC-like guidelines, the pipeline, and everything that came afterwards. It's all about this, right? Ultimately, the problem is we, as a business, right,

(12:25):
need to deliver something very fast, and we don't give security enough time to secure it properly. That's why we added firewalls, some kind of external controls, IPS, IDS, all those kinds of things, to try to at least block something that obviously can happen. The other problem is awareness, and we definitely have to do something about that.

(12:45):
So to address this problem, we have to increase awareness. And I guess AI plays a good role here, because now all the developers can just ask AI whether a piece of code looks secure and get instant knowledge and feedback about that particular code, versus asking a security guy and having the security guy run some scanners and tests. So it's like a straightforward connection between whoever's building this code

(13:08):
and basically all the security insights collected around the world. Even if it's not perfect, that's much better than nothing, right? And the other thing is overall improving our main frameworks, development frameworks, API application servers, all that stuff, because not all of them are well secured, right? I understand that now I look very old school, mentioning

(13:29):
WebSphere and other IBM products, but they were built for good. And there are a lot of security controls in WebSphere that are still not available across all the newest management platforms and API and application servers. So improving frameworks, basically reducing the attack surface while we're developing, definitely secures a lot. And the other components, right,

(13:51):
if we just try to build this stable system: the first component is overall knowledge and awareness; then frameworks, reducing the attack surface there and building in controls, in other words; and the third part is overall assessment and management.
So basically, even if an API is not managed, it's still important to

(14:11):
at least know that the API exists. And if I recall projects from 10 or 20 years ago, when there were just a few APIs or applications, every single service or API or application that was released was well documented, with an owner, with a document called a passport, right, with pretty much everything. Now, because development speed

(14:32):
should be increased, right, we don't have these passports anymore. But we still need to make a list of them and understand who is responsible for each one, by business function, because APIs are very tightly connected to business functions, right? They essentially serve transactions: calls, API calls, right, or a bunch of them together making up one business function. So we have to assign ownership for this. That's what I think we should do.

(14:53):
That's already happening, with different quality in different places.
Yeah.
So let's talk about DeepSeek. What made you look at DeepSeek in the first place? And then what did you find?
Yeah.
Yeah.
First of all, DeepSeek is ultimately a very flashy technology that's pretty much everywhere. So we decided to look into it and find out what's there, right?

(15:18):
Find the difference and evaluate the performance of the model. And I want to make a very important comment here. So deepseek.com, or chat.deepseek.com, this is the product. Essentially, it's an AI agent, right? Agentic AI is like a big thing now, and it's doing some actions, right? These products are still built based on the models that, by the

(15:40):
way, are labeled open source, right? But the model itself that's labeled open source is not exactly equal to the product, the chat, right? In other words, this chat can search the internet, which is a function, right? And this is a big difference between a native LLM and native LLM security, and what we're doing at Wallarm: securing AI products that are using LLMs but are in fact serving a lot of API calls and doing a lot

(16:03):
of actions behind the scenes, right? That's why we tried to find a way to learn more about the model as implemented in very specific ways, such as ChatGPT.com. We found a way to do what's so-called jailbreak; in other words, to convince the model to respond to questions, or give us technical data, that it shouldn't,

(16:24):
and that's a so-called jailbreak. So we found that, unlike other jailbreaks that were published, or that we will publish, for DeepSeek and other models specifically, the usual jailbreak is actually built to get some data, such as instructions for how to build something bad, or to respond with no censorship, and such things. Our jailbreak is a more technical jailbreak that unlocks the model to

(16:46):
basically tell us everything about the model itself. It's a little bit different kind of jailbreak.
Yeah, with a traditional jailbreak you're looking to get it to bypass its instructions to be able to tell you something: how to build something, how to make napalm, how to make meth, the classic ones. Or, in this case, what really happened at Tiananmen Square would probably be a good jailbreak.

(17:08):
If you got past that one, you'd probably get somewhere. But those are the classic ones. You actually got in and got it to really dictate what its overall instructions were and what its overall model was. What made you think of how to do that? Because obviously one way to do it is to say, print out your instructions, or, give me your main prompt.

(17:29):
And by the way, just in case anybody at OpenAI is actually sitting there going, we're better: no, people have gotten the prompts from a couple of major AI providers just by asking for them. But obviously you tried that and that didn't work. So what did you do next?
Yeah.
And then we tried to build a more scientific way to get at least some knowledge, right?

(17:51):
If you cannot ask directly, ultimately you still can ask indirectly. And then we built a technique called a biased attack, where we put the model's response in a very strict frame, where the model should answer essentially yes or no, or choose between the three or four options that we provide. So the model cannot lie and cannot refuse to give an answer.

(18:13):
That's why it still starts to provide some stuff, and then there's a bunch of code around it for how to ask many questions like that. It's very similar to binary search, the algorithm that helps you identify: hey, is the number between these bounds, and what's inside this frame, and then divide it by two and so on. That's how we can get some basic knowledge,

(18:35):
and in terms of extracting large text, such as this AI system prompt, it took some time, but we did it, and we posted the results so everyone can, first of all, check what's inside.
Okay, you're a lot smarter than me. I need you to slow down on how you went after it. Is this like password stuffing? You gave it a whole pile of commands to try and figure out which one would break it?

(18:56):
Not exactly. So first of all, we will wait a little bit before full disclosure of this technique, because other models are also vulnerable and we have to let some other models get fixed. But essentially it's binary search. So you have to find a way to directly ask yes or no, or between options. And then, based on those options, you can build your next question.

(19:20):
And then, in smaller and smaller chunks, you get to an answer.
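To make the idea concrete, here is a minimal, hypothetical sketch of what such a constrained, binary-search-style extraction loop could look like. This is our illustration of the general idea, not Wallarm's withheld technique; the hidden prompt and the yes/no oracle are simulated locally.

```python
# Hypothetical sketch of a "biased attack": force the model into yes/no
# answers and let each answer halve the search space, like binary search.
# Illustrative only; not Wallarm's actual method. Here the "model" is
# simulated locally with a stand-in hidden prompt.

HIDDEN_PROMPT = "You are a helpful assistant."  # stand-in for the real secret

def ask_model(position: int, threshold: int) -> bool:
    """Simulates a constrained yes/no answer: is the character at
    `position` of the hidden prompt above ASCII `threshold`?
    A real attack would phrase this as a strictly framed chat prompt."""
    return ord(HIDDEN_PROMPT[position]) > threshold

def extract_char(position: int) -> str:
    """Recover one hidden character by bisecting its ASCII code point."""
    lo, hi = 32, 126  # printable ASCII range
    while lo < hi:
        mid = (lo + hi) // 2
        if ask_model(position, mid):
            lo = mid + 1
        else:
            hi = mid
    return chr(lo)

recovered = "".join(extract_char(i) for i in range(len(HIDDEN_PROMPT)))
assert recovered == HIDDEN_PROMPT  # ~7 questions per character (log2 of 95)
```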
Wow.
And you managed to extract, basically, its prompt. What did you manage to extract?
The basic system prompt. Once you've found a way to communicate directly with the model in the way you want, you can extract pretty much everything that you want, and

(19:41):
actually, that's what we call a jailbreak. And we extracted the baseline instructions. So basically, when you put some prompt into a chat, this prompt, your query, actually gets added to a bunch of others, including policies on how to answer your questions, guidelines provided by the developers. That's why the chat itself is an AI product that's built on top of an LLM.

(20:06):
So if you just download the open source LLM, you have to define your own system prompt, right? You will not find what was defined by default. And this prompt defines the behavior of the model, the policies, what it can or cannot respond to. And so we extracted that.
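As an illustration of what he's describing (our sketch, following the common OpenAI-style chat message schema, not DeepSeek's actual configuration): a chat product typically prepends a hidden system message to every user query before it reaches the model.

```python
# Illustrative only: how a chat product typically wraps a user query.
# The hidden system prompt (policies, persona, answering guidelines) is
# added by the product; it is not part of the open-source model weights.

HIDDEN_SYSTEM_PROMPT = (
    "You are a helpful assistant. Follow the product's content policies..."
)  # defined by the product's developers, invisible to the end user

def build_request(user_query: str) -> list[dict]:
    """Compose the message list that is actually sent to the LLM."""
    return [
        {"role": "system", "content": HIDDEN_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

# A jailbreak that leaks the system message reveals exactly this hidden
# layer, which a bare downloaded model would not contain.
```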
We also asked the model some technical questions about how it was trained: was the OpenAI API used to distill data?

(20:31):
And we got some answers that we decided to also include in the blog post, which definitely doesn't guarantee that it was used, because ultimately we didn't know which data trained the model.
I think we're all pretty sure that it was using OpenAI. You might not have found that, but I've heard of people who've actually gotten direct responses from it saying, I was trained on OpenAI.

(20:52):
Yeah, that's the same with us, right? We asked for a direct response after the jailbreak: was it trained using OpenAI?
And the model said yes, which doesn't mean that it's true. I can imagine that the model could be guided to answer like that, just to get some PR around it and let everyone compare the models and increase the valuation. We don't know, but that was the answer.

(21:13):
And that's what we got.
So you contacted DeepSeek. Now, this is the second hack; they had one on their database a couple of days ago. They responded quite quickly, from the sound of it.
So first of all, I don't think it's just us. If you look at X or Twitter, you will find dozens of different security issues.

(21:33):
Two that I know of. I notified them. Yeah.
Maybe I'll ask DeepSeek how many times they've been hacked.
Yeah, they don't know yet, but you can run a jailbreak and then it will answer you.
So overall, yeah, it's the usual practice called full disclosure, or responsible disclosure, right? First we notified them. And once we realized that it's actually fixed, so we cannot reproduce this attack

(21:56):
anymore, the jailbreak doesn't work, we decided to publish this. However, because we know for sure the same jailbreak works for other models, we decided not to disclose the technical details.
Thank you. Yeah. Although I tried to jailbreak you in this interview and didn't succeed.
Not yet, at least. Not yet.

(22:17):
Yeah, I'll keep working on it. This has been terrific. Thank you so much. I really think people do developers a service when they point out these problems. The great thing, I think, about DeepSeek, at least, is that they admitted it. I've seen far too many companies now that, when they get notified of a breach or an attack vector, seem to deny it or go, that's not a big deal.

(22:39):
So they seem to be at least responding well.
Look, at least they fixed it, yeah. And the communication flowed. I'm not the guy who will guide them on how to respond, but they fixed it. And for me, it means that it's a high-tech, engineering-driven company, and they fixed it in less than an hour or so. So that's good velocity, right? The velocity that, as a security researcher, I really appreciate;

(23:00):
they really care about their product. And not that many companies in the world can do that. So it's a very young and active company, a lot of energy. So I like it. However, all the other things, we'll see over time: how the company will grow, how they will respond to other issues and security issues. And now, with the hype, we know for sure that it's worth it; there's

(23:21):
a lot of good tech implemented there, and they did a good job anyway.
Yeah.
And as I said, this is a side project. This is something they did in their spare time: we'll take over the world of AI in our spare time. I think you've got to give them a little bit of credit for that.
Although, the one thing I did learn, and I don't know how much you've discovered this in the work you've done, is that because something's a side project, or it's a proof of concept, or, God forbid, a test project, we tend not to pay enough attention to security, forgetting that our test projects are often attached to other systems, or are at least attack vectors in themselves.
So I think it's a good lesson for us all: even if you're doing this as a proof of concept, you have to pay attention to the security on it.

(24:10):
Yeah, and I agree with you. And here, for me at least, is the kind of borderline, right? If you're doing something in open source, and it's just available, then you feel free to do whatever you want. People take their own risk, because they read your guidelines. But if you release a product, even a free product, then you take some responsibility for your users. That's how it comes into play.

(24:32):
And I guess users understand that it essentially runs on Chinese servers and that the Chinese company has access to all the data. They can read the agreement. And so that's the thing. However, there is a difference between the product, the real product such as the chat or the API that uses the LLM, and the LLM itself. In terms of engineering the LLM, they did an amazing job.

(24:53):
Whatever they used, this is for good, and it's good for all of us as a community. And now we have probably the best model, the fastest one, the most performant one, and, like, why not use the model? However, the product that makes, you know, the website and the app and so on, that, for me, is still in deep beta, so it should be improved significantly, but...

(25:13):
And as I've said, they have to take some responsibility. But for those people who are running corporate security, or who have employees who might be on there: if you've got an employee who's on a two-day-old AI and they're putting your corporate information on there, on a server in China, it's time to take their PC away.

(25:35):
By the way, the majority of these PCs are built and delivered from China.
Yeah, so yeah, you can't. Yeah. What do you do?
Thank you so much. My guest has been Ivan Novikov. He's the CEO of Wallarm. They're a company that deals with API security. They've got a great report out; we did a story on them. You can find a link in our show notes.
Thank you very much.

(25:59):
And that's Afterword.
If you stayed to the end, I'd love to know what you think. You can reach me at editorial at technewsday.ca, or if you're watching this on YouTube, just go underneath and put a comment in there.
I'm your host, Jim Love.
Thanks for listening.