People doing software security often use LLMs as orchestrators more than anything else, but there are many more sophisticated ways to use them in our space coming down the pipe. The obvious next evolution of SBOMs (https://www.cisa.gov/resources-tools/resources/cisa-sbom-rama) is for them to represent not just what the code contains, as a static tree of library dependencies, but also a summary of what that code does, which you can check once you have the final binaries. In a sense, you can think of this as a behavioral attestation between the software publisher and the consumer who is actually running the product.
In other words, if my product is meant to connect to WWW.SPYWARE.RU, then it should say so in the SBOM behavioral manifest. In practice, of course, these things get complicated: the behavioral manifest is rarely exact, so you need to summarize semi-structured data and then compare it to what you see when the software actually runs (which, if you've ever run strace, is voluminous). That smells like a job for an LLM or, at the very least, a vector comparison. Likewise, automatically building harnesses that run the software and capture security-sensitive information (or performance information, as we learned from XZ) is rapidly becoming a job for an LLM (https://google.github.io/oss-fuzz/research/llms/target_generation/).
I am perhaps channeling everyone else's worry (https://www.cisa.gov/speaker/allan-friedman) that too much of the SBOM community is arguing about which XML fields belong in a VEX addendum rather than pushing the concepts forward to actually solve problems. Or perhaps not! At some level, the software vendors are getting dragged through this process by their hair, which is very fun to watch.
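To make the comparison step concrete, here is a minimal sketch of the exact-match baseline: pull IPv4 connect() calls out of strace output and flag any endpoint the manifest did not declare. The manifest here is a hypothetical set of (ip, port) pairs, since no behavioral-manifest format exists in any SBOM standard that I know of, and the function names are made up for illustration. The fuzzy part (inexact manifests, hostnames, summarization) is where an LLM or vector comparison would come in, and is not shown.

```python
import re

# Matches IPv4 connect() calls as strace prints them, for example:
# connect(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("203.0.113.10")}, 16) = 0
CONNECT_RE = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, '
    r'sin_port=htons\((\d+)\), '
    r'sin_addr=inet_addr\("([0-9.]+)"\)\}'
)


def observed_endpoints(strace_text):
    """Return the set of (ip, port) pairs seen in IPv4 connect() calls."""
    return {(ip, int(port)) for port, ip in CONNECT_RE.findall(strace_text)}


def undeclared_connections(strace_text, declared):
    """Return observed endpoints missing from the manifest, sorted."""
    return sorted(observed_endpoints(strace_text) - set(declared))


if __name__ == "__main__":
    # Hypothetical manifest: endpoints the publisher says the product uses.
    declared = {("203.0.113.10", 443)}
    trace = (
        'connect(3, {sa_family=AF_INET, sin_port=htons(443), '
        'sin_addr=inet_addr("203.0.113.10")}, 16) = 0\n'
        'connect(4, {sa_family=AF_INET, sin_port=htons(8080), '
        'sin_addr=inet_addr("198.51.100.7")}, 16) = -1 EINPROGRESS '
        '(Operation now in progress)\n'
    )
    for ip, port in undeclared_connections(trace, declared):
        print(f"undeclared outbound connection: {ip}:{port}")
```

This only covers IPv4 connect() and ignores IPv6, DNS lookups, file access, and everything else strace reports, so treat it as a starting point rather than a checker.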
-dave
Well, this is rather timely! Although I'm not sure using an LLM for the behavioral aspect is entirely necessary. I've been working on an experimental system that does just what you describe for dependencies (https://docs.gitlab.com/ee/user/application_security/dependency_scanning/exp..., pre-alpha!). My solution uses static analysis because I'm a fan of determinism.
Snark aside, looking at what our dependencies actually do is definitely another signal we should use when deciding whether to add a dependency or whether something fishy is going on. I have lots of ideas on where to take this, but I never thought about adding it to an SBOM. An interesting idea for sure.
-Isaac
On Thu, Sep 12, 2024 at 11:21 AM Dave Aitel via Dailydave < dailydave@lists.aitelfoundation.org> wrote:
Dailydave mailing list -- dailydave@lists.aitelfoundation.org To unsubscribe send an email to dailydave-leave@lists.aitelfoundation.org
We've been talking about and giving "Beyond the SBOM" presentations for a while now, but to your point, I don't see anyone actually doing it.
If SolarWinds had said "here's a script that will lock down your host firewall to just the outbound access our tools need to update themselves", that would have been amazing, and would have saved everyone some time and trouble a few years ago.
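As a rough sketch of what such a vendor-supplied lockdown could look like, the snippet below turns a declared list of outbound endpoints into iptables commands that allow only those and drop everything else. The endpoint list and the function name are hypothetical, it handles IPv4 and TCP only, and it just prints the commands rather than running them. It is an illustration of the idea, not anything a vendor actually ships.

```python
import ipaddress


def outbound_allowlist_rules(endpoints):
    """Build iptables commands permitting only the declared (ip, port) pairs."""
    rules = [
        # Keep loopback and replies to already-established flows working.
        "iptables -A OUTPUT -o lo -j ACCEPT",
        "iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    for ip, port in sorted(endpoints):
        # Validate inputs so a malformed manifest cannot inject shell text.
        addr = ipaddress.IPv4Address(ip)
        if not 0 < int(port) < 65536:
            raise ValueError(f"invalid port: {port}")
        rules.append(
            f"iptables -A OUTPUT -p tcp -d {addr} --dport {int(port)} -j ACCEPT"
        )
    # Default-deny everything that was not explicitly declared.
    rules.append("iptables -P OUTPUT DROP")
    return rules


if __name__ == "__main__":
    # Hypothetical update servers declared by the vendor.
    declared = {("203.0.113.10", 443), ("203.0.113.11", 443)}
    print("\n".join(outbound_allowlist_rules(declared)))
```

Note that a default-deny OUTPUT policy like this also blocks DNS, so a real script would need to declare resolvers too, or work from hostnames and a resolution step.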
And Biden's EO from 2021 called out the need for software behavior transparency as well: https://techcrunch.com/2021/06/21/bidens-executive-order-on-cybersecurity-sh...
[crickets]?
--Adrian
On Thu, Sep 12, 2024 at 2:16 AM Dave Aitel via Dailydave < dailydave@lists.aitelfoundation.org> wrote: