@gregkh @mxk that sounds like a problem that https://mergiraf.org/ must have already solved, maybe you can steal their code to do something similar? I don't think they have an API for that, though.
@gregkh @neverpanic @mxk ping @pintoch ? I thought of saying something to that but maybe I'm not the best person for it ^^
The git diff pager 'delta' shows the changed methods above the changed lines, even when the method names themselves are not part of the diff. I suggest checking it out and integrating it into git.
Does it also only use the diffstat 'name' or does it do its own resolution?
https://dandavison.github.io/delta/introduction.html
via @oneiros
@neverpanic @gregkh @mxk The question is whether this suffices, as you also mentioned defines, which can cause a local change to change codegen of all usage sites.
The heavyweight way to do this is probably what cHash is doing (no matter at what representation level, so either at treesitter or LLVM-IR): paper: https://www.usenix.org/system/files/conference/atc17/atc17-dietrich.pdf code: https://github.com/luhsra/chash
So hash the AST of all toplevel entities in the translation unit before and after, and then make a diff of the hashes...
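A minimal sketch of that hash-and-compare idea, not cHash itself: it uses Python's built-in ast module as a stand-in for a C AST (tree-sitter or a compiler front end would take its place). The function names toplevel_hashes and changed_entities are made up for illustration.

```python
import ast
import hashlib


def toplevel_hashes(source: str) -> dict[str, str]:
    """Map each named top-level definition to a hash of its AST dump."""
    hashes = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.dump() leaves out line/column attributes by default, and the
            # parser drops comments, so moving or re-commenting a definition
            # does not change its hash.
            dumped = ast.dump(node)
            hashes[node.name] = hashlib.sha256(dumped.encode()).hexdigest()
    return hashes


def changed_entities(before: str, after: str) -> dict[str, set[str]]:
    """Diff the per-entity hashes of two versions of one source file."""
    old, new = toplevel_hashes(before), toplevel_hashes(after)
    return {
        "added": set(new) - set(old),
        "removed": set(old) - set(new),
        "modified": {name for name in old.keys() & new.keys() if old[name] != new[name]},
    }
```

Calling changed_entities(old_source, new_source) returns the names of the added, removed, and modified top-level definitions. For C, the hashing would presumably have to happen after preprocessing for a changed define to show up in every function that uses it.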
@neverpanic @gregkh @mxk concerning cHash, @stettberger is probably the person to talk to.
Otherwise, there is also difftastic in the "related tools" category: https://difftastic.wilfred.me.uk/
Basically a diff on treesitter output; I'm not sure they offer the requested functionality as is, but it should contain all the basic ingredients by necessity.
@n0toose @gregkh @neverpanic @mxk what mergiraf uses is the GumTree algorithm, described quite succinctly here: https://mergiraf.org/architecture.html#matching and in more detail in https://hal.science/hal-01054552
This matching may or may not be the notion of diff that is useful for you…
@noctux @neverpanic @gregkh @mxk at this point, we also have IRhash, which saves almost as much by reusing build artifacts while being more precise (fewer false misses).
@gregkh maybe experiment with the different ways of generating diffs?
The git diff sub-command has an option called --diff-algorithm, which selects how the diff is computed. If I remember correctly, Myers is the default, and it's not that great. Histogram has given me the best results so far.
@gregkh it doesn't provide an exact list of the symbols modified, but it generates a better diff, which makes it easier to see what has been touched.
So you are correct, it doesn't solve your problem exactly, but I think it might help anyway.