There is joernio's ghidra2cpg. I'm not sure why they now seem to be pushing a forked set of patches at https://github.com/joernio/ghidra; probably the DB format changes too rapidly, or there is some other issue along the lines of "we automatically intake unknown relationships lost statically".  That might get you part of what you're looking for, even though it isn't an exact fit. Bringing in some higher-level tooling would help too: the GraphQL UIs that contextualize queries with type context are so helpful. Whenever I don't have context-aware syntax support, the barrier to actually doing anything limits my enthusiasm, so that only the ideas that look most impactful (as perceived before getting too far) get my attention (and I'm often wrong, so :).  I forget if joern still uses Neo4j, but I am confident that it's the best FOSS available for describing code/binaries right now.

Getting more tools in this space is a great initiative that deserves attention.  Being able to communicate so expressively, codifying knowledge of bugs, with some helpers to support guided generation of queries for arbitrary conditions, would be a big win; the benefits for invariant analysis (as can be seen with Semmle/CodeQL) are extreme.

On Mon, Jun 26, 2023 at 3:46 PM Dave Aitel via Dailydave <dailydave@lists.aitelfoundation.org> wrote:
There's a new Ghidra release last week! Lots of improvements to the debugger, which is awesome. But this brings up some thoughts that have been triggering my vulnerability-and-exploitation-specific OCD for some time now.

Behind every good RE tool is a crappy crappy database. Implicitly we, as a community, understand there is no good reason that every reverse engineering project needs to implement a key-value store, or a B-Tree, or partner with a colony of bees which maintain tool state by various wiggly dances. But yet each and every tool has a developer with decades of reverse engineering experience on rare embedded platforms either building custom indexes in a pale imitation of a real DB structure or engaging in insect-based diplomacy efforts.

I think the Ghidra team (and Binja/IDA teams!) are geniuses, but they are probably NOT geniuses at building database engines. And reading through the issues with ANY reverse engineering product you find that performance even for the base feature-set is a difficult ask.

My plea is this: We need to port Ghidra to Neo4j as soon as possible. Having a real Graph DB store underneath Ghidra solves the scalability issues. I understand the difficulty here is: There are few engineers who understand both Neo4j and reverse engineering to the point where this can be done. I mean, why do it in Neo4j and not Postgres? An argument can be made for both, in the sense that Postgres is truly Free and the most solid DB on the market. The pluses for Neo4j are that RE data is typically graph-based more than linear.
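
To make the "graph-based more than linear" point concrete, here is a minimal sketch in plain Python, not tied to Ghidra, Neo4j, or Postgres. The call graph and every function name in it are invented for illustration. It answers a typical RE question ("is there a call chain from this entry point to that sink?") with a breadth-first traversal, which is the kind of query a graph DB expresses as one variable-length path match and a relational DB expresses as a recursive query.

```python
from collections import deque

# Hypothetical call graph (caller -> callees); all names are made up.
CALLS = {
    "main": ["parse_header", "log_msg"],
    "parse_header": ["read_field"],
    "read_field": ["memcpy"],
    "log_msg": ["printf"],
}

def call_path(graph, src, dst):
    """Return the shortest call chain from src to dst, or None if there is none."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for callee in graph.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None

if __name__ == "__main__":
    print(call_path(CALLS, "main", "memcpy"))
    print(call_path(CALLS, "log_msg", "memcpy"))
```

The traversal itself is trivial; the hard part in a real tool is doing it over millions of nodes with indexes and persistence, which is the part a real DB would take off the RE developers' hands.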

I spent the last two years learning graph dbs, out of some masochistic desire and ended up getting certified - and I can still RE a little bit. I will manage the team porting Ghidra to Neo4j if someone funds it. :)

Either way, sooner is better than later. There are so many companies and people relying on these tools that it seems silly to do anything else. 

-dave
P.S. Yes, I remember BinNavi used MsSQL installs for its data, and this was annoying to install but ... I get why Halvar did it at the time. It's because he had real work to do and building a DB was not it. I can only assume Reven doesn't use their own DB? I mean the benefits for interoperability would be huge between tools. . . like literally everything you want to do with these tools is better with a real DB underneath. 
  
