Now that I’m using maildirs again, I could fill in a feature that #aerc lacks (compared to #mutt) with a #notmuch script that de-duplicates my emails:
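#!/usr/bin/env bash
#
# Copyright (c) Jarkko Sakkinen 2024
# JSON queries ripped from https://github.com/esovetkin/notmuch-deduplicate
#
# For every message stored in more than one file, keep the first file
# notmuch lists, delete the rest, then let notmuch re-index the maildirs.
set -e

QUERY='*'

# 1. Dump matching messages as JSON (no bodies), split by jq --stream.
# 2. Print each message's filename array as one tab-separated line.
# 3. Keep only lines containing a tab, i.e. messages with 2+ files.
# 4. Print every filename except the first, one per line.
# 5. Delete them; -d '\n' keeps paths with spaces intact (GNU xargs),
#    -r skips rm entirely when there are no duplicates.
notmuch show \
  --format=json \
  --entire-thread=false \
  --body=false \
  "${QUERY}" |
  jq -n --stream 'fromstream(1 | truncate_stream(inputs))' |
  jq -r '.. | .filename? // empty | @tsv' |
  grep $'\t' |
  awk -F'\t' '{ for (i = 2; i <= NF; i++) print $i }' |
  xargs -r -d '\n' rm -v --

notmuch new
# vim: filetype=sh ts=2 sw=2 et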
@jarkko There are much easier ways to do this in notmuch than that monstrosity!
@jarkko See the --duplicate=N option of 'notmuch search --output=files'.
Something like this will print all the dupes, assuming no message has more than five files:
for dupe in $(seq 5 -1 2); do notmuch search --output=files --format=text0 --duplicate=$dupe \* | xargs -0 -I{} echo "{}"; done
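The five-copy limit in the loop above could be dropped by counting up from 2 until --duplicate=N returns nothing: if no message has an N-th file, none has an (N+1)-th either. A minimal sketch, assuming bash and that no filename contains a newline (the NULs from text0 are turned into newlines so the output fits in a shell variable); list_all_dupes is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every file except the first for each message
# matching the query, with no fixed upper limit on the number of copies.
# Assumes no filename contains a newline.
list_all_dupes() {
  local query=${1:-'*'} n=2 out
  while :; do
    # Bash variables cannot hold NUL bytes, so convert them to newlines.
    out=$(notmuch search --output=files --format=text0 \
      --duplicate="$n" "$query" | tr '\0' '\n')
    # No message has an n-th file, so every copy has been listed.
    if [[ -z $out ]]; then
      break
    fi
    printf '%s\n' "$out"
    n=$((n + 1))
  done
}
```

Usage would be e.g. list_all_dupes '*', which only lists paths; nothing is deleted.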
@jarkko Or, if you do this regularly followed by 'notmuch new', removing just the second dupe each time will eventually leave no dupes:
notmuch search --output=files --format=text0 --duplicate=2 \* | xargs -0 -I{} echo "{}"
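The one-liner above only echoes the paths. A small wrapper could keep that dry run as the default and only delete and re-index when asked. A sketch under stated assumptions: prune_second_dupes and the DELETE environment variable are hypothetical names, and GNU xargs (-0, -r) is assumed:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper. Dry run by default: only prints what would be
# removed. With DELETE=1 it deletes the second file of each duplicated
# message and re-indexes with `notmuch new`. Assumes GNU xargs.
prune_second_dupes() {
  local query=${1:-'*'}
  if [[ ${DELETE:-0} == 1 ]]; then
    notmuch search --output=files --format=text0 --duplicate=2 "$query" |
      xargs -0 -r rm -v --
    notmuch new
  else
    notmuch search --output=files --format=text0 --duplicate=2 "$query" |
      xargs -0 -r printf 'would remove: %s\n'
  fi
}
```

Run prune_second_dupes to preview, then DELETE=1 prune_second_dupes, repeating until the preview prints nothing.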
All that said, I personally keep all duplicates, because they may have arrived via different mailing lists or directly, and may have subtly different content (e.g. mailman filters Cc's). All the dupes really have in common is the message-id.
@bremner @jani I’m wondering: did ~= in mutt do a full body compare…
I’m also thinking about this, given that aerc has:
:query [-a <account>] [-n name] [-f] <notmuch query>
Create a virtual folder using the specified top-level notmuch query.
This command is exclusive to the notmuch backend.
[man aerc]
One possibility would be to implement a :duplicate command directly in aerc, which would similarly create a virtual folder of duplicates and let you decide their fate interactively (as with :filter and :query). Since it would be implemented in Go, it could potentially also do a full body compare.
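What such a full body compare might check can be sketched in shell rather than Go. same_body is a hypothetical helper; it assumes LF line endings and that the headers end at the first empty line, and treats two files as the same when everything after that line is byte-identical, so header-only differences like mailman's Cc filtering are ignored:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if two message files have
# byte-identical bodies, i.e. identical content after the first empty line.
# Header differences (Received:, Cc:, List-* ...) are ignored.
# Assumes LF line endings, so the header/body separator matches ^$.
same_body() {
  cmp -s <(sed '1,/^$/d' "$1") <(sed '1,/^$/d' "$2")
}
```

A mailing-list footer appended to the body would make two copies compare as different, which is arguably what you want before deleting either one.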
https://git.notmuchmail.org/git?p=notmuch;a=blob;f=notmuch-search.c;h=327e144564de48e0b339036528505d5a227bc40a;hb=HEAD#l579 What we do is pretty simple-minded: we scan the list of file names for a given message-id.
OK, so aerc has {{.MessageId}}, which can be used e.g. as :term b4 am {{.MessageId}}. In any command, the parser substitutes it with the message ID of the currently selected message. aerc also has notmuch support.
So I should try a notmuch query (which aerc should support) of id:{{.MessageId}} and see whether that gives me the set of messages with the same message ID.
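The same id: query can be tried from the shell first. notmuch keeps every file that shares a message ID under a single message, so asking for --output=files should list each copy. A sketch with hypothetical helper names, assuming the ID is given as notmuch stores it (without angle brackets) and needs no quoting in the query syntax:

```bash
#!/usr/bin/env bash
# Hypothetical helpers. notmuch keeps every file with the same Message-ID
# under one message, so an id: query with --output=files lists all copies.
# Assumes the ID needs no quoting in notmuch's query syntax.
files_for_id() {
  notmuch search --output=files "id:$1"
}

# Print how many files carry the given Message-ID (whitespace stripped,
# since some wc implementations pad the number).
count_copies() {
  files_for_id "$1" | wc -l | tr -d '[:space:]'
}
```

A count above 1 means the message is duplicated on disk; those paths are what a :duplicate view in aerc would need to show.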