
May 8, 2025 48 mins

Send us a text

This week on Sidecar Sync, Amith Nagarajan and Mallory Mejias explore Wells Fargo’s virtual assistant “Fargo” and how it stacks up against Klarna’s AI tool from a year ago. With 250 million fully automated interactions and measurable impact on customer engagement and bias reduction, Fargo offers a powerful case study in applied AI. Amith reflects on what’s now possible for associations, why a narrow pilot project is a smart first move, and how “human in the loop” isn’t just a safety net—it’s strategic. The duo also breaks down Microsoft’s new Phi-4 reasoning models, which pack PhD-level performance into incredibly compact packages that can run on your phone. If you're wondering where the AI trend line is heading, this one’s for you.

🔎 Check out Sidecar's AI Learning Hub and get your Association AI Professional (AAiP) certification:
https://learn.sidecar.ai

📕 Download ‘Ascend 2nd Edition: Unlocking the Power of AI for Associations’ for FREE
https://sidecar.ai/ai

📅 Find out more about digitalNow 2025 and register now:
https://digitalnow.sidecar.ai/

🎉 More from Today’s Sponsors:
CDS Global: https://www.cds-global.com/
VideoRequest: https://videorequest.io/

🛠 AI Tools and Resources Mentioned in This Episode:
Fargo ➡ https://sites.wf.com/fargo/
Klarna AI Assistant ➡ https://www.klarna.com
Microsoft Phi-4 Reasoning Models ➡ https://huggingface.co/microsoft

Chapters:

00:00 - Introduction
03:47 - Meet Fargo: Wells Fargo’s AI Assistant
05:59 - Comparing Fargo with Klarna’s Assistant
08:57 - The State of AI Agents in Associations
13:05 - Event Support: A Smart Use Case for AI
15:00 - Human-in-the-Loop: Not Optional, But Essential
23:44 - Private AI: Local vs. Cloud Deployment
26:46 - Microsoft’s Phi-4 Models: Small and Mighty
32:50 - Why Small Models are a Big Deal
43:54 - AI Trendlines and the Future for Associations

🚀 Sidecar on LinkedIn
https://www.linkedin.com/company/sidecar-global/

👍 Like & Subscribe!
https://x.com/sidecarglobal
https://www.youtube.com/@SidecarSync
https://sidecar.ai/

Amith Nagarajan is the Chairman of Blue Cypress https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.

📣 Follow Amith:
https://linkedin.com/amithnagarajan

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.

📣 Follow Mallory:
https://linkedin.com/mallorymejias


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
The most important thing for all of you associations to note is that you have options. You have ways of doing secure, private AI inference. There's a number of ways to do this, and you can even do it locally, on-device.

Speaker 2 (00:14):
Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights, and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions.

(00:37):
I'm Amith Nagarajan, Chairman of Blue Cypress, and I'm your host.

Speaker 1 (00:43):
Greetings and welcome to the Sidecar Sync, your source for content at the intersection of all things artificial intelligence and the world of associations. My name is Amith Nagarajan.

Speaker 3 (00:55):
And my name is Mallory Mejias.

Speaker 1 (00:57):
And we are your hosts. As always, we've prepared an awesome episode for you guys, with some really interesting topics at the forefront of AI, and we're going to talk all about how they apply to you in the world of associations. So excited to get into that. But first of all, Mallory, how are you doing today?

Speaker 3 (01:15):
I'm doing pretty well myself, Amith. It's a nice chilly day in Atlanta, so I'm enjoying that. Been getting outside a lot recently, since the weather's been mostly warm, and, yeah, I've had some fun auditions come through on the acting front, so it's been a good, productive weekend for me. What about you?

Speaker 1 (01:34):
Fantastic! Well, you know, I joke around a lot of times when I'm in New Orleans, which is home base for me, that that's the center of the universe for associations. Of course it really isn't...

(02:07):
Yeah, I had a breakfast chat with somebody and have a few more meetings lined up, meeting with some of our team members across our company, so it's always a productive time in DC. It's pretty much nonstop from early morning till late in the evening when I get into town.

Speaker 3 (02:22):
Yeah, I was saying to Amith before we started recording, I didn't know how he was possibly going to squeeze in this podcast with his schedule today of all these meetings. But you showed up, Amith. I'm really happy we're here.

Speaker 1 (02:35):
Well, episode 81. We've got to keep the streak going, and this is so much fun to record, and I'm always interested in making time for it. My audio quality may not be as good as normal for this episode, unfortunately, so apologies in advance if that is the case and that is your experience. But I'll be back to a normal recording session shortly; for now I am on the road and doing my best.

Speaker 3 (02:57):
We take the Sidecar Sync all over. I don't know if we've ever done it internationally yet. I'm trying to think on my end. I don't think I've ever recorded in another country. What about you, Amith?

Speaker 1 (03:10):
I don't believe so, but that sounds like a challenge.

Speaker 3 (03:12):
I think I need to book a flight somewhere. Yeah, hey, let's do it. We'll do maybe a Mexico version of the Sidecar Sync. That'd be fun. Well, today, as Amith mentioned, we have some exciting topics lined up for you. We're going to first be exploring this Wells Fargo AI assistant and then doing a little bit of a reflection on an episode we did (it was actually episode 21) where we talked about Klarna's AI assistant, just to do a little compare and contrast.

(03:36):
And then we will be talking about Microsoft's latest Phi-4 family of models, with some great naming conventions, as we always chat about on the Sidecar Sync podcast. So, first and foremost, the Wells Fargo AI Assistant is called Fargo. It's an advanced virtual assistant integrated into the Wells Fargo mobile app that helps customers with a wide

(03:57):
variety of banking tasks through both voice and text interactions, from checking balances to processing payments and handling refunds, and also providing personalized financial guidance. Fargo serves as a 24/7 banking assistant for Wells Fargo customers. The assistant uses a model-agnostic architecture, and it employs different specialized LLMs for various tasks.

(04:21):
So that's that multi-agent framework that we talk about often on this podcast. It has a privacy-first design, so no personally identifiable information is exposed to external language models, and sensitive data is processed locally before any cloud interaction. They're seeing some impressive results so far. So there have been 245.4 million interactions with the

(04:45):
assistant in 2024, which is actually double what they projected, and these are interactions entirely without human intervention. So around 250 million interactions without human intervention. They're seeing deep engagement with their AI assistant: 2.7 interactions per session on average.

(05:07):
And across the board with their AI initiatives, they're seeing a 3 to 10x increase in customer engagement. Something that's also interesting to note: we've talked about bias that's built into AI models because of the material they're trained on. We've also talked about bias with humans, right, because when we're making decisions, we're pulling on all of our previous experience as well. Something that's been interesting with their AI

(05:28):
initiatives at Wells Fargo is they're seeing some bias reduction in certain areas. The AI has led to fairer lending decisions when it comes to loans, which I think is quite interesting to note. Behind the scenes, Pega is the company, particularly their Customer Decision Hub, behind all these AI initiatives at Wells Fargo, and it helps them analyze billions of interactions to

(05:51):
determine the next best conversation for each customer, making Fargo's responses highly personalized and relevant across channels. And, as I mentioned, it was episode 21 (right now we're recording episode 81), so a long time ago, 60 weeks ago, we talked about the Klarna AI Assistant, and I wanted to do a little bit of a reflection.

(06:12):
There aren't a ton of stark differences, but there are a few. So, despite serving fewer customers (Wells Fargo has about 70 million customers; Klarna has 150 million, so substantially different), Wells Fargo handled that 250 million interactions that I mentioned in the whole year of 2024, compared to Klarna's 2

(06:33):
million-ish in its first month. They haven't published their full number for 2024. But even comparatively, 2 million in one month, if it continued on that trend or even considerably increased each month, 250 million interactions at Wells Fargo is pretty impressive for their 70 million customer base.
Klarna also publicly stated that its AI assistant was doing

(06:58):
the work of about 700 full-time customer service agents, handling two-thirds of all customer service chats. Wells Fargo has not published a specific equivalent number of agents replaced. But given Fargo's scale, far exceeding Klarna in both total interactions and per-customer engagement, I would say it's reasonable to infer that Fargo automates work that would require potentially thousands of agents.

(07:19):
And something also worth noting is the feature evolution in both. The Klarna assistant has expanded from customer service to shopping recommendations, a personalized shopping feed, multilingual support, and ChatGPT integration for shopping advice. The Wells Fargo assistant has added AI-driven spending insights, actionable financial tips, improved money movement,

(07:41):
and financial insight summaries. So, going beyond this basic, routine customer service interaction and really providing further value to the consumer, which I think is quite interesting. So, Amith, you've been talking about virtual assistants and AI agents really from the beginning, from the beginning of this podcast, for sure. How have you seen that conversation evolve,

(08:04):
particularly over the last year?

Speaker 1 (08:06):
You know, it's interesting, you mentioned episode 21 versus 81. So it's exactly 60 episodes, or roughly 60 weeks, ago when we talked about Klarna. I think back then both of us were really excited and impressed by what Klarna had achieved with, at the time, a fairly early model (I mean, it was GPT-4, if I recall correctly), but it was, compared to what we have now, a

(08:27):
very rudimentary model, and what they achieved was pretty remarkable. And so now it is good to have that perspective, because in 60 weeks we've had roughly a little bit over two AI doublings in power. So, a lot of fascinating things to unpack here. To your question of what I've seen evolve: feature-set increase is definitely something I think that makes sense,

(08:47):
because if you can engage people in a way that they find pleasing, that they find useful, they'll come back more. And if you have more functionality to offer, then you can go deeper. So, you know, if Wells Fargo is able to, for example, provide spending insights directly in their platform, that could be really useful for a lot of people, especially if you have a credit card and a bank account with Wells Fargo, or maybe some

(09:10):
other things. That broader set of insights that you could get from your bank could be pretty powerful and pretty helpful. It could help you make better spending decisions. It could help you make better decisions with respect to investing, even, and those are things that third-party apps have been doing for a while, products like Mint or a number of others (Rocket Money is another one) that have some AI

(09:32):
features. But this is an opportunity for a platform like a bank to bring some of that engagement back to the bank as the core platform for most people's primary financial interactions. So I think that's interesting. In my experience, the association community has been moving a little bit slower than I'd like in terms of member

(09:53):
service agents. Overall, people have been doing bits and pieces. We have, in our own family of companies, a number of groups that are working on things that are in this space, one of which is obviously Betty, which has about 100 associations working now and growing quickly, and Betty is definitely in this realm as the knowledge agent, the expert agent in terms of all things association knowledge.

(10:14):
We've mentioned previously on this podcast we're launching something specifically for member service that deals with routing incoming asynchronous messages like emails and SMS and so forth. But, you know, I'd say that we're still in super, super early innings. So if you're an association that's thinking about this, saying, hey, we'd love to have something like the Wells Fargo

(10:35):
assistant or like the Klarna assistant, you've still got plenty of time ahead of you. But I wouldn't, you know, spend all year thinking about it. I'd run an experiment. To me, what's so powerful about this particular use case is it's both sides of the value equation. The one side is cost reduction or efficiency improvements, but the other side is improving the value to the customer, which is

(10:57):
the biggest thing. When you see people using a service more and more, that should light up a light bulb for you. It says, hey, there's something good here. When we see, for example, engagement in a web-based search tool compared to a web-based knowledge agent, where the knowledge agent has literally 50x longer session times than a search tool, that should tell you something about the value

(11:18):
you're creating. It's not that it takes 50 times longer to get the information. It's quite the opposite, in fact; the knowledge agent is much, much faster at getting people the information they want. But rather, because people found value and it's low friction, low time-to-value for the customer, they come back more. So if they come back more, there's more opportunities to engage, more opportunities to create value and have a

(11:40):
reinforcement cycle. So I find it to be a really, really exciting area for associations to jump into but, as I said, I think it's still super early.

Speaker 3 (11:49):
Okay, I was going to say, I'm sure we have some listeners thinking, well, great, Wells Fargo did it and Klarna, with their 70 million and 150 million customers respectively; that's feasible for them. You said it's still early stages for associations. Can you contextualize what you mean by that? What would you say is currently feasible right now for a pilot project with a member service agent?

Speaker 1 (12:11):
I think you could stand up a member service agent over the next three to six months in your association a number of different ways. There are a number of tools you could use for that, using either off-the-shelf tools and just stringing them together with different kinds of agent frameworks. You could certainly partner with companies that specialize in this, either in the association market, like our

(12:33):
companies, or companies that are outside of the association market who do this kind of work. There are companies focused on kind of large enterprise, like the one that you mentioned. There's also a company called Decagon, and Sierra is another one, that do customer service agents kind of at the very high end of the market. And people in the association market, I think, are going to have association-specific solutions more and more.

(12:55):
Obviously what we're focused on is that, but you're going to see more and more choice there. So I think there's off-the-shelf stuff you can deploy, and you can also build something in this space. I think this is a great opportunity for an experimentation round where you could do something really, really small. Don't try to boil the ocean and solve all customer service or member service inquiries.

(13:15):
Focus on a pain point. For example, many associations have a highly seasonal volume of activity that comes in around their annual conference. So prior to the annual conference they might have a fairly reasonable inflow of inquiries, but right before and

(13:37):
during and after the conference they might have, let's say, a 30- or 60-day window of time on the calendar where it's just completely crazy. Well, what if we could put in place a great member-slash-event-service AI that could help field 50, 60, 70% of those questions that are fairly repetitive? That's a super achievable thing, and within the narrower context of events, the domain of questions usually is far

(13:59):
narrower. So I think that's an easy thing to go experiment with. Overall, what I'd say is, to me, the thing that you have to remember is: yes, you're an association, you're not Wells Fargo. Yes, you're an association, you're not Amazon. But the technologies have come down so much in cost, they're so much more accessible, and they're so much more powerful, that not only can you do this as an association, but you're

(14:22):
going to be expected to. Your members don't care that you're not Wells Fargo or Amazon or Netflix or Klarna. They just expect the same quality of experience from you that they expect from their largest consumer experiences. And it may not be fair, but fairness doesn't really matter. In the eye of the consumer, the expectation and the bar have been set at this level; they're going to expect it soon enough

(14:44):
from you, so you might as well get ahead of that and provide them something slightly before they might expect it from the association.

Speaker 3 (14:50):
And then you can provide that additional value, those insights, things that really, really help your members in their profession or industry, to further create that value-based relationship.

Speaker 1 (15:00):
You know, I'd say this is also a great time to reinforce a concept we've talked about on the pod, Mallory, a number of times, which is: how do you prioritize your energy? Your energy might be classified as human labor, like your team's time, your volunteers' time, also your dollars. That's part of the energy flow, right? Where do you invest? And a lot of people are saying, well, our infrastructure is so

(15:23):
terrible. We've got ancient systems. You know, we've got a really old AMS and we've got to replace that thing, or an old LMS. Decent chance they'll still work this year and next year. And the question is, instead of replacing a major system like

(15:46):
that, which is, you know, significant effort, sometimes takes 18 to 24 months to fully go through a process like that, sometimes longer. What if you didn't do that? Right, and you said, hey, we're going to deprioritize some of those classical association IT things and instead invest a few dollars and some time, time being the most important ingredient, to experiment with this use case.

(16:07):
Right, go figure out how to make a member services agent work for you as your priority. Let's say you did that for the next six months and you hit the pause button on a pending AMS selection or AMS implementation. The amount of value you create for members from this technology is so much higher. It's dramatically different than what an internal system replacement might yield. So again, I'm not suggesting that you work with an unstable,

(16:32):
shaky foundation, with ancient technology forever. But if you have to choose between something like this and infrastructure improvements that, frankly, nobody's going to really notice on the external side, I'd focus on this. And maybe you don't have to choose between the two in your group and your association, but most people do have to choose between those kinds of priorities. So I decided it'd be a good time to remind people that you can

(16:53):
attack these things if you're willing to say no to stuff. You just have to draw a line in the sand and say, you know what, we're going to put a pause on all these old, classical types of systems and projects. We're going to keep them running, obviously, but we're not going to invest big dollars and big energy in these older technologies. Instead, we're going to focus on making these new AI things work. Last thing I'll say about that is, once you do these kinds of

(17:14):
new projects, you'll actually reframe what you think you need. When it comes time to replace some of that infrastructure, you might think you know what you want in that next-generation AMS, but frankly, you probably don't. When you build an AI technology or two and deploy it into production, you will get so much better of a sense of where your members want you to go, and that might change the

(17:35):
requirements for what that new AMS is going to do. And then the last thing related to that, by the way, is the AMS vendors are also figuring that out. Whether you're talking about a traditional AMS vendor in the space or some other type of solution, everybody in these types of database applications is working really hard right now to figure out how to AI-enable their systems, so I'd give them

(17:56):
a little bit of time too. I think you'll have better choice and you'll have better visibility into what you're actually going to get.

Speaker 3 (18:02):
Mm-hmm. Hearing you talk about that, we've just hit the one-year mark of moving to Atlanta, and it made me think of our experience last year of moving into an apartment we had never seen and trying to furnish it before we were there, and realizing sometimes you just need to be there physically in the space before you realize, oh, OK, we need this size couch, we need a TV right here.

(18:23):
It makes me think of associations specifically trying to replace their AMS and then perhaps getting to that point and thinking, oh gosh, now with AI we realize we need all these other features and all this other infrastructure. So I think it's a really valid point, Amith, and I want to talk a little bit about this pilot project that you mentioned. I can definitely resonate, having been the primary point

(18:45):
person at Sidecar who would take in a lot of inquiries approaching digitalNow, the conference, before the event and after the event. However, I would think, if you came to me and said we're going to, you know, roll out this AI agent and there's going to be potentially no human intervention, right, we're just going to roll it out, I would be intimidated by that and, to be

(19:06):
honest, scared that it wouldn't work. So, and I'm sure our association listeners feel the same way, what is your thought on the pilot project of trying to roll out an agent that has no human intervention versus trying to roll out an agent that does the routing, like you kind of briefly mentioned earlier? Is no human intervention the goal? Talk me through that.

Speaker 1 (19:28):
I don't think that's the goal at all in almost all cases. I don't think that's the case either for Klarna or Wells Fargo, as I understand their models. It's more about making available instant and high-quality responses for most things, but at the same time being able to interact with a human agent when appropriate. And this might sound like we're trying to find a silver

(19:49):
lining in terms of the employment side of the equation here, in saying that the humans can focus on higher-value activities. That's oftentimes consultant-speak for saying they're going to be laid off. In reality, there's some of that that might happen. In the association market, probably not so much, but in the broader market, if you have 10,000 people in a call center, maybe you don't need 10,000, maybe you need 2,000.

(20:10):
But you need your best 2,000 people. So there's some issues there, for sure, when you think about that across an entire sector. But for the association world, I think of it this way: you know, your member services folks, your event services folks, they have a lot more to offer than just answering rote inquiries, like people saying, hey, when do I need to

(20:32):
register? Where can I check in? Can I bring my spouse to this particular function? What's the guest registration fee? You know, where can I find this particular article? All these kinds of basic help-desk questions, and AI can nail all those things, and those people who are asking those questions are going to be happier with a better answer that's nearly instant. But those member services reps, those event folks, can have conversations with people, can learn more from those members,

(20:56):
can take time to actually have live synchronous phone calls and video calls, to really be the concierge, to help provide an experience, so that it feels like you're checking into the Four Seasons when you come to your event rather than checking into the Red Roof Inn. So, you know, the whole idea is that you want to level up the caliber of service and the quality of service that you

(21:17):
provide, and you can do that. You can, you know, punch way above your weight class by using AI to take care of the rote stuff. Coming back to your point, that's where this concept in agentic systems called human-in-the-loop is so critical. That's for key decision-making, but it's also for escalation, where the AI should be trained, and can easily be trained, to be smart enough to not try to take on everything, right? When you

(21:40):
can tell the AI, hey, for these three or four different kinds of inquiries, we can answer it in these different ways. These are the ways; these are the tools that are available. You might have a knowledge agent. You might have capabilities around database lookups. There might be two or three different things that the agent is really good at, but we can tell the agent to err on the side of getting a human involved.

(22:00):
If there's any question as to the quality of the answer, or, independent of the objective purpose of the call or the inquiry, let's just say that the AI detects a tone of frustration. Let's say that there's two or three iterations of emails and the AI detects that the person's just not particularly happy.

(22:20):
You know, AI is really, really good at reading into the emotion from just plain text, and that's even more true with audio, if you were to do this with audio capabilities, and then to be able to detect that and say, hey, you know what, I think Mallory is not super happy with me right now (it's the AI talking), I'm going to forward this message to somebody else, to a

(22:42):
human in that case, right, to help Mallory out.

Speaker 3 (22:46):
Yep, as you said, AI is pretty good at detecting sentiment. It's not something you'd think it would be good at, but with word choice, and especially if it has more information through audio or video, it does a pretty decent job at it.

Speaker 1 (22:59):
Well, you can also tell if someone's coming back two or three times and they feel like they're asking the same thing repetitively, or even using a simple phrase like "as I said." Right, when I say "as I said," I feel like I'm repeating myself, and I find myself doing that with customer service reps in that, you know, kind of ongoing, infinite loop of emailing

(23:20):
people who really don't have a great idea of what I'm after but are there to kind of, you know, address my issue in some way. So I think there's a lot of opportunity here. But yes, to your point, Mallory, you make a really important one. I wouldn't try to just hand this over to the robot and say good luck and hope to see you in the future, because I don't think that's a complete solution. I think you have to level up what the humans do in this

(23:42):
equation.
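The escalation logic Amith describes here (answer only the inquiry types the agent is equipped for, and err on the side of a human when the thread drags on or the tone turns negative) could be sketched roughly like this. This is a hypothetical illustration, not anything Wells Fargo, Klarna, or Betty has published; the topic list, cue phrases, and thresholds are invented for the example.

```python
# Hypothetical human-in-the-loop router for a member/event service agent.
# Escalates to a person when the inquiry is out of scope, the exchange is
# dragging on, or the member's tone suggests frustration.

SUPPORTED_TOPICS = {"registration", "check-in", "guest fee", "schedule"}
FRUSTRATION_CUES = ["as i said", "still not", "frustrated", "repeating myself"]

def route_inquiry(topic: str, messages: list[str]) -> str:
    """Return 'ai' to auto-answer or 'human' to escalate."""
    if topic not in SUPPORTED_TOPICS:
        return "human"            # out of scope: don't try to take on everything
    if len(messages) >= 3:        # several iterations without resolution
        return "human"
    text = " ".join(messages).lower()
    if any(cue in text for cue in FRUSTRATION_CUES):
        return "human"            # tone suggests the member is unhappy
    return "ai"

print(route_inquiry("registration", ["When do I need to register?"]))  # ai
print(route_inquiry("registration", ["As I said, my badge still hasn't arrived."]))  # human
```

In production, a classifier or the LLM itself would do the topic and sentiment detection, but the shape is the same: the agent handles the rote questions it is demonstrably good at and forwards everything else.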

Speaker 3 (23:49):
My last question here, Amith, is that the ability to process sensitive data locally before any cloud interaction is a major privacy advancement and is, of course, essential for things like banking or buy now, pay later, when you're dealing with people's payment information. Do you think that this is a necessity for associations?

Speaker 1 (24:05):
I think it's an important concept that associations should be aware of. A lot of people make the assumption, they'll say to me, for example, oh, I love the idea of whatever the application is, but they'll say, I have all this sensitive data. Or it might not be sensitive like patient data or banking data or something like that, but it might be just, we have a lot of content in our private knowledge repository.

(24:26):
We don't want to send that to ChatGPT or to Claude. We just don't trust them, and that's a reasonable concern. But people make the assumption that that's a dead end, right, that that's the end of the conversation. Whereas there are ways of doing private deployment in the cloud of your own models, where you could say, hey, I'm going to run Llama or a number of other models in a private cloud deployment. And, to what you specifically brought up, these

(24:50):
models are shrinking: their capabilities are growing and they're shrinking in size. You can actually run them locally on a phone, in a web browser, and in ways that also provide additional privacy. So I don't know exactly what Wells Fargo is doing, but Apple's strategy around this sounds similar. What they'll do is, on the phone itself, the LLM that's running

(25:11):
locally, a very, very small LLM, will try to get the essence of what you've asked and then determine if it can answer the question locally or if it will need to promote a portion of that information: abstracting out anything personal you may have shared, sending just the general concept to get higher-order knowledge from a remote LLM, also operating in a secure manner, and then pulling

(25:33):
that back to the local LLM to synthesize a response that then reintroduces your personal information. But the personal information never really left the local environment. The most important thing for all of you associations to note is that you have options. You have ways of doing secure, private AI inference. There's a number of ways to do this, and you can even do it locally, on-device, and that's going to continue to be the case.

(25:55):
There's all this growing collective body of language models that you can run, smaller and smaller, that run extremely efficiently on desktop computers and laptops and even on phones.
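The local-first privacy flow Amith sketches (a very small on-device model abstracts out the personal details, a remote model supplies the higher-order knowledge, and the answer is reassembled locally) can be illustrated with a toy redaction pipeline. This is a minimal sketch under invented assumptions: two regexes stand in for the small local LLM, and `remote_llm` is a stub for the secure cloud model; neither Wells Fargo's nor Apple's actual implementation is public.

```python
import re

def redact(question: str) -> tuple[str, dict[str, str]]:
    """Toy stand-in for the on-device step: swap personal details for
    placeholder tokens before anything leaves the 'device'."""
    mapping: dict[str, str] = {}

    def token(match: re.Match, kind: str) -> str:
        key = f"<{kind}{len(mapping)}>"
        mapping[key] = match.group(0)
        return key

    redacted = re.sub(r"\$[\d,]+", lambda m: token(m, "AMOUNT"), question)
    redacted = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", lambda m: token(m, "NAME"), redacted)
    return redacted, mapping

def remote_llm(redacted_question: str) -> str:
    """Stub for the remote model; only the abstracted question reaches it."""
    return f"General guidance for: {redacted_question}"

def answer_locally(question: str) -> str:
    redacted, mapping = redact(question)
    generic = remote_llm(redacted)         # personal details never leave the device
    for key, original in mapping.items():  # reintroduce them on-device
        generic = generic.replace(key, original)
    return generic

print(answer_locally("What should Jane Doe do with $1,200 a month?"))
# General guidance for: What should Jane Doe do with $1,200 a month?
```

A real system would use the small local model, not regexes, to decide what counts as personal and whether the question even needs the round trip, but the division of labor is the point: abstraction and reassembly happen locally, general knowledge comes from the cloud.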

Speaker 3 (26:09):
Well, you really set me up perfectly there, Amith, to go to topic two, which is Microsoft's Phi-4 models: small language models with big reasoning power. So Microsoft just released a new family of Phi-4 models, including these great names: Phi-4 Reasoning, Phi-4 Reasoning Plus, and Phi-4 Mini Reasoning. Those aren't too bad.
Those aren't too bad.

(26:30):
I've seen worse, I would say, come out of OpenAI. The Phi-4 Reasoning models are very much a part of the broader trend toward reasoning, or thinking, models (they're called either-or) that can perform advanced reasoning: an ability to analyze complex scenarios, apply structured logic, and solve problems in a way that resembles human thinking. So, to break down that Phi-4 family, we've got Phi-4 Reasoning,

(26:53):
which is a 14-billion-parameter open-weight model, fine-tuned for complex reasoning, math, science and coding tasks.
It uses supervised fine-tuning with high-quality, curated data, enabling it to generate detailed reasoning chains and match or surpass much larger models on benchmarks.
Then we've got Phi-4 Reasoning Plus, which builds on that Phi-4

(27:16):
Reasoning model that I just mentioned, further trained with reinforcement learning and able to use 1.5x more tokens for even higher accuracy.
It matches or exceeds the performance of much larger models like DeepSeek R1, which we've covered on the podcast and which, as a note, has 671 billion parameters compared to the 14 billion parameters of this model,

(27:38):
and OpenAI's o3-mini on several key benchmarks.
And then we've got Phi-4 Mini Reasoning, a compact 3.8-billion-parameter model optimized for mathematical reasoning and educational use, suitable for deployment on resource-limited devices like mobile phones and edge hardware.
So Amith was already kind of gearing up to mention a lot of

(28:00):
the practical benefits of smaller models.
They can run locally on PCs, mobile devices and edge hardware.
They're also designed for offline use on Copilot+ PCs and, of course, there are lower computational requirements that make them more accessible and cost-effective.
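To make "runs on a laptop or a phone" concrete, here's a rough back-of-the-envelope estimate of the memory needed just to hold a model's weights. The bit widths shown are common quantization choices, not official figures for these releases, and real deployments need extra room for activations and context.

```python
def model_memory_gb(params_billions, bits_per_weight):
    """Approximate memory needed just to hold a model's weights."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9  # decimal gigabytes

# Phi-4 Reasoning (14B) at 16-bit precision vs. 4-bit quantization,
# and Phi-4 Mini Reasoning (3.8B) at 4-bit -- all approximate.
print(model_memory_gb(14, 16))   # 28.0 GB: workstation territory
print(model_memory_gb(14, 4))    # 7.0 GB: a well-equipped laptop
print(model_memory_gb(3.8, 4))   # 1.9 GB: plausible on a phone
```

At 4-bit quantization, the mini model's weights fit in about 2 GB, which is why phone-class deployment is plausible.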
All three of these models are openly available under permissive licenses, and they can be accessed through Azure AI

(28:22):
Foundry and Hugging Face.
So, Amith, what are your initial thoughts on the Phi-4 family of models?

Speaker 1 (28:31):
Well, first of all, let's spell this out for folks, because Phi-4 might be pronounced or spelled differently.
It's P-H-I, dash, the number four, and that's, I think, part of what makes it so hard to pronounce: there was Phi-3, and then it's almost like you're saying five, but yeah, Phi-4.
You're right, that's what I always think.

(28:53):
When I first started hearing this. But it's P-H-I-dash-four.
My thoughts are: wow, this is really exciting.
So what you said, as part of many really interesting comments, is that, not across all benchmarks but across several important benchmarks, Phi-4 Reasoning Plus, which I'll talk about in more detail in a second, matches or exceeds DeepSeek R1, which, if you

(29:16):
recall, R1 shook the world back in the January-February timeframe.
People freaked out because it was as performant as OpenAI's then most powerful AI reasoning models, the o1 and o3-mini.
So here's the deal.
The way I think about this is that this is a tiny model, 14 billion parameters, which by today's standards is really

(29:40):
small, capable of being run probably on some phones, but definitely on a PC or a Mac.
And one of the ways they're able to make it perform as well as it does is by giving it more time to think when you ask the question.
So, reasoning-slash-thinking models: it sounds like some new category of model, this really cool, complex thing.
In reality, actually, it's not all that different from the

(30:03):
models we've had in the past.
It's essentially saying: hey, model, I want you to spend time thinking about this problem, going deeper and working through it, breaking it down step by step into small chunks and then compiling the results of each of those sub-steps into an answer.
Another way to think about it is that the model is able to

(30:26):
revise something that it thought about previously.
When we interviewed Ian Andrews from Groq (Groq with a Q), he used an analogy that I love and have repeated a number of times, which is that it's like giving the model a backspace key, where the model can edit its prior response as opposed to simply writing as fast as it can.
So that's what these reasoning and thinking models do, and the

(30:48):
way to think about it for you is that what you have access to in a 14-billion-parameter model is something that previously, literally two months ago, required a 671-billion-parameter model, which makes it possible to run all sorts of workloads on smaller and smaller devices.
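The compression being described is easy to quantify, using the parameter counts cited in this episode:

```python
# Parameter counts cited above: DeepSeek R1 vs. Phi-4 Reasoning.
r1_params = 671e9    # 671 billion
phi4_params = 14e9   # 14 billion
ratio = r1_params / phi4_params
print(f"Phi-4 Reasoning is roughly {ratio:.0f}x smaller")  # roughly 48x
```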
And then the mini model is a much smaller model.
It's roughly a quarter the size of the main Phi-4 model, but

(31:09):
it's also trained to use more compute resources when you ask questions, so it can reason through problems, and that model is suitable for running on edge hardware, which would include phones and other devices that have much smaller memory and computational ability.
So I find all of this to be super exciting.
It just reinforces the trend line of what we've talked about.

(31:30):
I've been saying this for a few years now: I'm actually more excited about the compact, small, super-efficient, lightweight models becoming smarter than I am about frontier models like Claude 3.7 Sonnet and Gemini 2.5 Pro.
Those are awesome.
The fact that these super-powered models that run only in the cloud are getting smarter is, of course, exciting, but the

(31:50):
fact that these small, really efficient models can do so much more is just stunning.
I mean, what you have in Phi-4 Reasoning Plus is better than what you had six months ago in the very best models in the world, and you can now run that on your computer for free.
That's a pretty stunning advancement in a very short number of months.

Speaker 3 (32:11):
Mm-hmm. I know you and I like to geek out about all the minute details of these models, because that's part of our job and I think we just enjoy learning about it.
But you mentioned the trend lines, and I always think it's important with these model conversations to zoom out a little bit and look at the bigger picture.
So what you just said is really profound.
But what do you think this trend line means, with the

(32:32):
smaller, more powerful models, specifically for associations?

Speaker 1 (32:35):
Well, I think, you know, going back to the last conversation: if there are certain types of data you have in your organization that you're not comfortable sharing with any of the model providers, Anthropic or Google or OpenAI, you can take this Phi model.
You can run it even on your own physical hardware if you

(32:56):
want to, or you can run it in a virtual private cloud environment at one of the major cloud providers, where it's completely contained and as secure as any other computer program you run.
Most people have gotten pretty comfortable with secure private cloud deployment where, in a cloud like Google or AWS or Azure, you can set up resources that are 100% secured and

(33:19):
private and, by most measures, far more secure than computers you run physically on your own hardware, where you can run whatever programs you want.
Right, traditional computer programs, and an AI model is just a computer program.
It works differently than a traditional computer program, but it is a computer program, and you can run it on hardware you have absolute control over, right?

(33:40):
So if you have that ability, it opens up a class of applications that associations have often told me they are uncomfortable with: things related to clinical data they might have access to if they're a healthcare association, or, if they're a financial association, maybe benchmarking data they receive from some of their members that they don't feel comfortable passing to OpenAI, or anybody else for that matter.

(34:02):
These kinds of applications can now be brought into a totally secure environment and run with incredible accuracy.
So it opens up a ton of doors.
If you're worried about passing your content to an AI system because you worry that they'll somehow subsume your content into their corpus of training data that they'll use for future

(34:25):
models, which, by the way, I'm a little bit skeptical of: even if the legal agreement says it can't be used in certain ways, you might say, okay, well, I'd rather just be totally sure, and I'm just going to run this type of model on my own.
So it opens up a lot of doors.
The other thing to think about is, independent of the privacy

(34:48):
and security conversation, smaller models run faster with less energy and fewer resources, and they're cheaper to run.
So if you have a little model like this that's as smart as what previously required a giant model, and you can now run it as a really cost-effective small model, you can do more, right?
You might have a hundred million documents that go back

(35:11):
to the beginning of your association's formation, and you might like to analyze them in all sorts of new ways that you previously would have thought totally unattainable.
You might have said: well, we have this idea in mind, where we have a million documents, every paper we've ever published and every opinion that's ever been written on every paper, and we would like to ask certain questions of

(35:31):
every one of those papers, right, have a detailed analysis done of each of those papers in order to capture some metadata or some structured insight from all those papers.
And let's just say a year ago you thought about this idea.
It was a cool idea.
But then you're like: yeah, it would cost between two and three dollars per paper, and we have a couple million pieces of content.
That's just not going to scale.

(35:52):
But now, if you have a 97 or 98% cost reduction, which is basically what you get here, that might cost you a few thousand dollars, right?
Or maybe $10,000.
You might say: you know what, that's actually pretty reasonable.
And if you wait six more months, it might be basically zero.
So the cost-curve compression, as well as the privacy, is really compelling; it opens up the door to just use way, way more

(36:13):
of this inference that we keep talking about.
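The back-of-the-envelope math in this example can be made explicit. The figures below are the ballpark numbers from the conversation ($2-3 per paper, a couple million documents, a 97-98% reduction), so treat the outputs as illustrative rather than a quote.

```python
def archive_analysis_cost(num_docs, cost_per_doc, cost_reduction=0.0):
    """Total cost of one LLM analysis pass over a document archive."""
    return num_docs * cost_per_doc * (1 - cost_reduction)

# Illustrative numbers from the discussion: ~$2.50 per paper a year ago,
# a couple million pieces of content, and a ~98% per-document reduction.
before = archive_analysis_cost(2_000_000, 2.50)
after = archive_analysis_cost(2_000_000, 2.50, cost_reduction=0.98)
print(f"${before:,.0f} -> ${after:,.0f}")  # $5,000,000 -> $100,000
```

Even these rough figures show the order-of-magnitude shift, and with per-document prices continuing to fall, the few-thousand-dollar range mentioned here comes into reach.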

Speaker 3 (36:21):
Amith, this is just not something I'm up to speed on, so I'll ask, in case we have some listeners who have the same question.
When you talked about running models privately in your own cloud environment versus running them locally, is one as secure as the other, or is one more secure than the other?

Speaker 1 (36:33):
You know, there are pros and cons to each approach.
So let's say I have the old-school way of doing it: in my office I have a computer server and I run that physical server.
I am responsible for site security, to make sure no one physically enters that location.
I'm responsible for network security.
I'm responsible for the whole thing, right?
And so traditionally, IT departments in associations did

(36:56):
that.
They'd have server rooms with racks of these servers, and they'd run them and were responsible for all of that: the site security and the digital security.
And I would argue that, generally speaking, that is going to be less secure than a modern cloud provider, which has rigorous, tight, military-grade physical security around its sites, way, way

(37:19):
more than any association is ever going to have.
And from a digital security perspective, implementing your own approach to cybersecurity is really important for your own resources, but cloud service providers tend to have really, really good built-in security architectures that are a good starting point.
So I'm generally a skeptic of anyone who tells me they

(37:40):
can run a more secure local environment than a well-implemented cloud environment.
I think most security experts would tend to agree with that, certainly for SMBs, small to medium-sized businesses, which associations fit into.
There are exceptions to every statement, obviously; there are some organizations who would argue: you know, we have even stronger site security and physical security than any cloud provider.

(38:01):
Certain information we have justifies this, and sure, there are always exceptions, but for the vast, vast majority of our listeners in this market, cloud-based deployment is going to work really, really well.
It just has to be well thought out.
You could create a cloud-based resource that you leave a wide-open back door to without thinking about it, where you're just like: oh, I'm just going to post the password to my website on

(38:25):
Reddit and let anyone log in.
I mean, that sounds totally stupid, but the reality is there are all sorts of human factors that go into compromising security all the time, and that can affect you either way.
Local inference on a device that an end user uses, though, is actually really interesting as a complement to that, because, let's say, again in the Wells Fargo case, I'm talking to my

(38:46):
banking assistant on my phone, and I'm talking about my salary or my investment strategy and my net worth and all this other information.
Maybe that information isn't really what that local AI needs help with.
Maybe it needs help reasoning through some general ideas that then guide what the local LLM does.
So, instead of sharing my salary and my net worth from the

(39:09):
local conversation, let's say, with the remote AI, what it does is say: hey, I'm with a consumer, they're working through these kinds of problems, can you give me some general guidance on A, B and C?
And then the remote LLM throws way more compute at it, comes up with a stronger answer, feeds it back to the local LLM and says: hey, here's the direction you should go with.
And then the local LLM takes that private data, infuses

(39:38):
it back into the answer from the remote LLM, and gives me an experience that's really, really high quality on my phone, right, and my personal data never left my phone.
And the same thing can be done for healthcare, and that could be a complement that associations can take advantage of.
Let's say you're a medical association and you want to provide capabilities for your members to have chats with you that are specific, down to the case level, on a particular

(39:59):
patient they're working with.
You probably don't want any of that healthcare data to ever come back to you, right?
So what if you had a local LLM that did part of the processing and, just like I said, abstracted out the problem, removing patient-specific data, then got a knowledge agent that has a tremendous amount of content and compute capability to formulate an answer, and then re-infused that back with the

(40:22):
local data?
There are ways to do that as well, and there are applications for associations, for sure.

Speaker 3 (40:28):
Yep, I've mentioned my husband's in healthcare, and he is just waiting for the day he has exactly what you mentioned, where he could drop in some patient info and get that resolved with better accuracy, perhaps, than he could have found doing some searches online.

Speaker 1 (40:41):
I was just going to say one other thing, Mallory, that I think our listeners might find interesting.
For those of you that have heard me talk about the Acquired podcast before, or heard Mallory mention it, we're big fans of the work those guys do.
That's a long-form, business-history-style podcast.
There's a new episode they just dropped, as of the recording of this podcast in spring of '25: an episode on Epic.

(41:04):
Epic is a software company in the healthcare space, and what's super interesting, there are a lot of interesting things about that particular episode, but there's a lot of talk about AI and a company like Epic and what they're going to do in the healthcare field.
They're by far the dominant player in providing tools like MyChart, which patients use, and the EMRs, EHRs and billing

(41:25):
systems and so forth that hospitals and health systems use, and certainly those kinds of tools will likely soon feature AI capabilities for doctors to use.
So you might ask the question: well, what is the role of the association?
How do we provide value when the hospital might have an AI bot built into their secure EHR/EMR system?
And the answer, in my mind, is to complement that, where you have

(41:49):
certain things that nobody else has, particularly your content, and over time you can capture other forms of experiential data that would be unique to you and complementary to what people get out of an EHR/EMR.
So I think there's actually a very bright story here.
Whether those things can interoperate and integrate with the experience that a doctor or a medical-practitioner member

(42:10):
may have is a big question, because companies like Epic, specifically, are famously very guarded about integrations.
But I think there's an opportunity here for many associations in a similar capacity, outside of healthcare as well, to do things where you can complement a line-of-business system that your members use every day.

Speaker 3 (42:29):
I'm going to have to tell Bailey about that episode.
I've actually gotten him onto the Acquired podcast; he listened to the Costco episode as well and really enjoyed it, so he will certainly enjoy the Epic one, Amith.
My last question was about this trend line.
Again, it seems like, when you zoom out, we're seeing smaller and smaller models become more and more powerful, in your mind.

(42:49):
In the next five years, if you could zoom out, do you think we'll be looking at tons of models that are, you know, millions of parameters and more powerful than we could possibly imagine?
I guess, are we trending toward creating models as small as we can, or is there a place for the giant ones and the small ones as well?

Speaker 1 (43:10):
I think it's both.
I think that, you know, you can further compact these models down to the point where, let's say, 12 months from now (you said five years; I don't know that I can think that far out, I think that's next year, and you're right, that was a hard question), an equivalent

(43:38):
model to the Phi-4 reasoning model is available as a 100-million or 200-million-parameter model, right, like 10x or even 50x or 100x smaller than the current Phi-4 model.
That could run in a web browser; that could run on really, really lightweight phones, not even like an iPhone 16, but something much smaller than that.
And so, if that's the case, then now you have really high-end reasoning capability in a super-compact form.

(43:58):
You know, you could have it running pretty much everywhere.
You could have that capability in your earbuds.
So those capabilities becoming smaller and smaller is good.
I think what you're also going to see is that the state-of-the-art frontier models will keep getting smarter and smarter.
You know, one of the stats I think has been missed by a lot of folks is, I think it was the most recent o3 release by OpenAI

(44:20):
, and Gemini 2.5 Pro and Claude 3.7 in extended thinking mode are kind of similar in terms of where they're at, but this benchmark shows that o3 is approximately on par with about the 80th percentile of performance of PhDs across all disciplines.
So let's unpack that for just a second: the 70th to 80th percentile of PhDs

(44:41):
.
That means that if you take the average PhD, who is no slouch, typically right in the middle, that's the 50th percentile.
So o3 is at the 70th to 80th percentile of performance of PhDs, and not just in one field but across a number of different disciplines, ranging from history to philosophy to various forms of science and engineering.
So it's pretty stunning what you have, and that's o3,

(45:03):
which is a big, heavy, expensive reasoning model.
But if you can have that capability distilled down into smaller and smaller and smaller models, even if these models didn't get any smarter, right, that's pretty darn smart.
And if you make it super, super fast, small, cost-effective and energy-efficient, the doors that open up are really compelling.

Speaker 3 (45:25):
That is a great place to wrap up this episode.
What would you do if you had all those PhDs at your fingertips, running on your phone and your earbuds?
I don't know; we might be there pretty soon.
Everybody, thank you for tuning in to today's episode, and we'll see you all next week.

Speaker 2 (45:43):
Thanks for tuning in to Sidecar Sync this week.
Looking to dive deeper?
Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org.
It's packed with insights to power your association's journey with AI.
And remember, Sidecar is here with more resources, from webinars to boot camps, to help you stay ahead in the

(46:05):
association world.
We'll catch you in the next episode.
Until then, keep learning, keep growing and keep disrupting.