I've been working with LLMs for a bit, and also looking at the DARPA AI
Cyber Challenge <https://www.darpa.mil/news-events/2023-08-09>. And to that
end I put together CORVIDCALL, which uses various LLMs to find and patch
essentially 100% of the bug examples I can throw at it from the various
GitHub repos that store these things (see below).
So I learned a lot of things doing this. One article that came out talked
about the future of LLMs, and if you're doing this challenge you really are
building for future LLMs, not the ones available right now. One thing that
article (which I highly recommend reading) pointed out is that Hugging Face
is basically doing a disservice with their
leaderboard - but the truth is more complicated. It's nice to know which
models do better than other models, but the comparison between them is not
a simple number any more than the comparison between people is a simple
number. There's no useful IQ score for models or for people.
For example, one of the hardest things to measure is how well a model can
handle interleaved and recursive problems. If you have an SQL query inside
your Python code being sent to a server, does the model notice errors in
that query, or do they fly under the radar as "just a string"?
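To make that concrete, here is the kind of probe I mean (an illustrative
snippet of my own, not from any benchmark): the bugs live entirely inside a
string, and the question is whether the model flags them.

```python
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # Two problems hide inside the string: the column name is
    # misspelled ("usrname"), and the query is built with string
    # interpolation, which is a textbook SQL injection. A model that
    # treats the SQL as "just a string" will report neither.
    query = f"SELECT id, email FROM users WHERE usrname = '{username}'"
    return conn.execute(query).fetchone()
```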
Can the LLM handle optimization problems, indicating it understands
performance implications of a system?
Can the LLM handle LARGER problems? People are obsessed with context window
sizes, but what you find is a huge degradation in instruction-following
accuracy when you hit even 1/8th of the context window size for any of the
leading models. This means you have to know how to compress your tasks to
fit basically into a teacup. And for smaller models, this degradation is
even more severe.
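To give a flavor of what "compressing into a teacup" means in practice,
here is a rough sketch (my own illustration, not anyone's published method),
with a placeholder summarize() step and a crude characters-per-token guess:

```python
# Assumed numbers: an 8k-token model and the 1/8th rule of thumb above.
CONTEXT_WINDOW = 8192
BUDGET = CONTEXT_WINDOW // 8

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text and code.
    return len(text) // 4

def fit_to_budget(chunks: list[str], summarize) -> list[str]:
    """Compress chunks until the whole task fits well inside the window.

    summarize() is a placeholder for whatever compression you use: an
    LLM summarization pass, code outlining, stripping comments, etc.
    """
    while sum(estimate_tokens(c) for c in chunks) > BUDGET:
        biggest = max(range(len(chunks)), key=lambda i: estimate_tokens(chunks[i]))
        shorter = summarize(chunks[biggest])
        if estimate_tokens(shorter) >= estimate_tokens(chunks[biggest]):
            break  # compression stalled; stop rather than loop forever
        chunks[biggest] = shorter
    return chunks
```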
People in the graph database world are obsessed with getting "Knowledge
graphs" out of unstructured data + a graph database. I think "Knowledge
graphs" are pretty useless, but what is not useless is connecting
unstructured data by topic in your graph database, and using that to make
larger community detection-based decisions. And the easiest way to do this
is to pass your data into an LLM and ask it to generate the topics for you,
typically in the form of a Twitter hashtag. Code is unstructured data.
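Here is roughly what that pipeline looks like, sketched with a placeholder
llm() callable and networkx standing in for whatever graph database you
actually use:

```python
import networkx as nx

def hashtags_for(text: str, llm) -> set[str]:
    # llm() is a placeholder for whatever model call you use.
    reply = llm(f"Give me 5 twitter hashtags for this, comma separated:\n{text}")
    return {t.strip().lower() for t in reply.split(",") if t.strip().startswith("#")}

def topic_graph(documents: dict[str, str], llm) -> nx.Graph:
    """Connect documents (or files, or functions) that share a hashtag."""
    graph = nx.Graph()
    tags = {name: hashtags_for(text, llm) for name, text in documents.items()}
    names = list(tags)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = tags[a] & tags[b]
            if shared:
                graph.add_edge(a, b, topics=shared)
    return graph

# Community detection over the shared-topic graph is what drives the
# "larger decisions" mentioned above, e.g.:
#   from networkx.algorithms.community import greedy_modularity_communities
#   communities = greedy_modularity_communities(topic_graph(docs, llm))
```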
If you want to measure your LLM, you can do some fun things. Asking a good
LLM for 5 Twitter hashtags in comma-separated format will work MOST of the
time. But the smaller and worse the LLM, the more likely it is to go off the
rails and fail when faced with larger data, or more complicated data, or
data in a different language that it first has to translate. To be fair,
most of them will fail to produce the right number of hashtags. You can try
this yourself on various models that are otherwise
at the top of a leaderboard, within "striking distance" on the benchmarks
against Bard, Claude, or GPT-4. (#theyarenowhereclose, #lol)
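That manual evaluation is trivial to automate, because the only thing being
measured is whether the model did what it was told. Something like this
(assuming you already have each model's raw reply in hand):

```python
import re

def follows_format(reply: str, expected: int = 5) -> bool:
    """Did the model return exactly `expected` comma-separated hashtags?"""
    parts = [p.strip() for p in reply.strip().split(",")]
    return (len(parts) == expected
            and all(re.fullmatch(r"#\w+", p) for p in parts))

# Run the same prompt across models, input sizes, and languages, and
# count the failures. That is the entire "benchmark".
```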
Obviously the more neurons you have making sure you don't say naughty
things, the worse you are at doing anything useful, and you can see that in
the difference between StableBeluga and LLAMA2-chat, for example, with
these simple manual evaluations.
And this matters a lot when you need your LLM to output structured data
<https://twitter.com/RLanceMartin/status/1696231512029777995?s=20> based on
unstructured inputs.
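When the consumer is a program rather than a person, "MOST of the time"
isn't good enough, so you end up wrapping every call in a validate-and-retry
loop. A minimal sketch, again with a placeholder llm() callable:

```python
import json

def structured_call(llm, prompt: str, retries: int = 3) -> dict:
    """Ask for JSON, validate it, and retry when the model drifts.

    The retry-on-parse-failure loop is the point here, not the
    specific model API, which is assumed.
    """
    ask = prompt + "\nReply with a single JSON object and nothing else."
    for _ in range(retries):
        reply = llm(ask)
        try:
            return json.loads(reply)
        except json.JSONDecodeError:
            ask = prompt + "\nThat was not valid JSON. Reply with only a JSON object."
    raise ValueError("model never produced valid JSON")
```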
So we can divide up the problem of automating finding and patching bugs in
source code in a lot of ways, but one way is to notice the process real
auditors take, and just replicate it by passing data flow diagrams and
various other summaries into the models. Right now hundreds of academics
are "inventing" new ways to use LLMs. For example, "Reason and Act" (ReAct).
I've never seen so much hilarity as people put obvious computing patterns
into papers and try to invent some terminology to hang their careers on.
And of course when it comes to a real codebase, say, libjpeg, or a real web
app, following the data through a system is important. Understanding code
flaws is important. But also building test triggers and doing debugging is
important to test your assumptions. And coalescing this information in, for
example, the big graph database that is your head is how you make it all
work.
But what you want with bug finding is not to mechanistically re-invent
source-sink static analysis with LLMs. You want intuition. You want flashes
of insight.
It's a hard and fun problem at the bigger end of the scale. We may have to
give our bug finding systems the machine equivalent of serotonin. :)
The authors claim in their conclusion: "We want to stress the importance of
books as journeys to explore and experience topics from the unique
viewpoint of the authors."
And in this they succeeded. This book works best as a proposed curriculum
for a five-day workshop for experts to reproduce fuzzing frameworks that
target embedded platforms - including Android and iOS. Largely this is done
by figuring out how to get various emulation frameworks (QEMU in
particular) to carry the weight of virtualizing a platform, getting
snapshots out of it, and pushing data into it.
Fuzzing is a childishly easy concept that is composed of devilishly hard
problems in practice (7 and 8 being the ones this book covers in depth -
the fuzzers themselves are simplistic other than those topics):
1. Managing scale
2. Getting decent per-iteration performance
3. Triaging crashes
4. Building useful harnesses
5. Knowing when you have fuzzed enough, vs. being in a local minimum
6. Figuring out root causes
7. *Getting your fuzzer to properly instrument your target so you can
have coverage-guided fuzzing*
8. *Handling weird architectures*
9. Generating useful starting points for your fuzzer (or input grammars)
All of these things are basically impossible in the real world. Your
typical experience with a new fuzzing framework is that you install it on a
fresh Linux, pick a target, and then watch as it fails to instrument or
even run it.
In other words, just knowing which fuzzer versions to use, and on what, is
half the battle.
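For reference, here is what the harness half of the problem (item 4 on the
list) even looks like, sketched in Python with Google's Atheris rather than
with anything from the book, and with a bug planted so the fuzzer has
something to find:

```python
import sys
import atheris

def parse_record(data: bytes) -> tuple[str, int]:
    # Toy target: "name:count" records. int() raising on junk is fine;
    # the unchecked index below is the planted bug.
    fields = data.decode("utf-8", errors="replace").split(":")
    return fields[0], int(fields[1])  # IndexError when ':' is missing

def TestOneInput(data: bytes):
    try:
        parse_record(data)
    except ValueError:
        pass  # malformed counts are expected; anything else is a crash

atheris.instrument_all()  # coverage instrumentation: problem 7 on the list
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```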
When I read a book on security, a good one, I want it to feel like I'm
putting on a brand new powersuit, ready to march into the wilderness with a
flamethrower and a mindset of extreme violence. This book delivers that
feeling. Because while my current business practices have nothing to do
with fuzzing the Shannon baseband, that doesn't mean some small part of me
doesn't want to. We all have the dark urge. We crave SIGSEGV in things
people rely on.
So in summary: 10/10, great book. Would recommend buying 10, setting up a
class, and going over it all together. Of course, this field is RAPIDLY
EVOLVING and you're going to want to get it updated, perhaps with the fancy
new PCODE fuzzer Airbus released earlier today.
The Vegas security conferences used to feel like diving into a river. While
yes, you networked and made deals and talked about exploits, you also felt
for currents and tried to get a prediction of what the future held. A lot
of this was what the talks were about. But you went to booths to see what
was selling, or what people thought was selling, at least.
But it doesn't matter anymore what the talks are about. The talks are about
everything. There's a million of them and they cover every possible topic
under the sun. And the big corpo booths are all the same. People want to
sell you XDR, and what that means for them is a per-seat or per-IP charge.
When there's no differentiation in billing, there's no differentiation in
product.
That doesn't mean there aren't a million smaller start-ups with tiny
cubicles in the booth-space, like pebbles on a beach. Hunting through them
is like searching for shells - for every Thinkst Canary there's a hundred
newly AI-enabled compliance engines.
DefCon and Blackhat in some ways used to be more international as well -
but a lot of the more interesting speakers can't get visas anymore or
aren't allowed to talk publicly by their home countries.
If you've been in this business for a while, you have a dreadful fear of
being in your own bubble. To not swim forward is to suffocate. This is what
drove you to sit in the front row of as many talks as possible at these two
huge conferences, hung over, dehydrated, confused by foreign terminology in
a difficult accent.
But now you can't dive in to make forward progress. Vegas is even more of a
forbidding dystopia, overloaded with crowds so heavy it can no longer feed
them or even provide a contiguous space for the amoeba-like host to gather.
Talks echo and muddle in cavernous rooms with the general acoustics of a
high school gymnasium. You are left with snapshots and fragmented memories
instead of a whole picture.
For me, one such moment was a Senate Staffer, full of enthusiasm, crowing
about how smart the other people working on policy and walking the halls of
Congress were - experts and geniuses at healthcare, for example! But if our
cyber security policy is only as successful as our health system, we are doomed.
I brought my kids this year and it helps to be able to see through the
chaos with new eyes. What's "cool"? I asked, in the most boomery way
possible. Because I know jailbreaking an AI to say bad things is not it,
even though it had all the political spotlights in the world focused on
examining the "issue".
The more crowded the field gets, the less immersion you have. Instead of
diving in you are holding your palm against the surface of the water,
hoping to sense the primordial tube worms at the sea vents feeding on raw
data leagues below you. "Take me to the beginning, again" you say to them,
through whatever connection you can muster.