Jarkko Sakkinen

Now that I’m using maildirs again, I could fill in a feature that #aerc lacks (compared to #mutt) with #notmuch. This script de-duplicates my emails:

#!/usr/bin/env bash
#
# Copyright (c) Jarkko Sakkinen 2024
# JSON queries ripped from https://github.com/esovetkin/notmuch-deduplicate

set -e

QUERY='*'

notmuch show \
  --format=json \
  --entire-thread=false \
  --body=false \
  "${QUERY}" | \
  jq \
    -n \
    --stream 'fromstream(1 | truncate_stream(inputs))' | \
  jq -r '.. | .filename? //empty | @tsv'  | \
  grep $'\t' | \
  awk -F'\t' '{for (i=2; i<=NF; i++) print $i}' | \
  xargs -I{} rm -v "{}"
notmuch new

# vim: filetype=sh ts=2 sw=2 et

#email

@jarkko There are much easier ways to do this in notmuch than that monstrosity!

@jani How? I just started to use it.

@jani And it does work ;-) That is always a good state to begin with. I actually wanted to learn notmuch only because aerc does not have an equivalent of mutt's ~= pattern.

@jarkko See 'notmuch search --output=files --duplicate=N' option.

Something like this will print all the dupes, assuming no message has more than five copies:

for dupe in $(seq 5 -1 2); do notmuch search --output=files --format=text0 --duplicate=$dupe \* | xargs -0 -I{} echo "{}"; done
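
The fixed upper bound of five can be avoided. A minimal sketch, assuming that `--duplicate=N` prints nothing once N exceeds the number of files of every matching message (as the notmuch-search man page describes) and that no filename contains a newline. `list_dupes` and the `NOTMUCH` override variable are hypothetical names used only for illustration, not part of notmuch:

```shell
# Print the 2nd, 3rd, ... file of every message until no message has an
# Nth file left. NOTMUCH is a hypothetical override so the function can be
# tried against a stub; it defaults to the real notmuch binary.
list_dupes() {
  n=2
  while :; do
    out=$("${NOTMUCH:-notmuch}" search --output=files --duplicate="$n" '*')
    if [ -z "$out" ]; then
      break
    fi
    printf '%s\n' "$out"
    n=$((n + 1))
  done
}
```

Running `list_dupes` only lists the extra copies; nothing is deleted.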

@jani The ideal would be a plain notmuch search query, because the :query command in aerc can create a virtual folder from the matching messages. Then I could just add a binding to aerc.conf and would not need a script in the first place.

@jarkko Or, if you do this regularly followed by 'notmuch new', just removing the second dupe will eventually boil down to no dupes:

notmuch search --output=files --format=text0 --duplicate=2 \* | xargs -0 -I{} echo "{}"

All that said, I personally keep all duplicates, because they may have arrived via different mailing lists or directly, and may have subtly different content (e.g. mailman filters Cc's). All that dupes have in common is the message-id, really.

@jani @jarkko not really, other than that I don't have any reliable way (not involving me reading both messages) to tell if two messages are really the same, or just have the same message-id because some "enterprise" system re-uses the same message-id for every message.

@bremner @jani I’m wondering here, did ~= in mutt do full body compare…

I’m also thinking that as aerc has this:

:query [-a <account>] [-n name] [-f] <notmuch query>
Create a virtual folder using the specified top-level notmuch query.
This command is exclusive to the notmuch backend.

[man aerc]

Maybe one feasible option would be to implement a :duplicate command directly in aerc, which would similarly create a virtual folder of duplicates and let me decide their fate interactively (as with filter and query). Since it would be implemented in Go, it could potentially also do a full body compare.

@bremner @jani mutt just has a hash table of message IDs that it uses (pattern.c, thread.c and hash.c in its src tree).
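
The same idea (a table keyed by message ID, flagging every file whose ID was already seen) can be sketched outside of mutt. This is not mutt's code, only an illustration: `msgid_dupes` is a hypothetical helper, and it assumes LF line endings and a Message-ID header that is not folded onto a continuation line:

```shell
# Print each file whose Message-ID already appeared in an earlier argument.
msgid_dupes() {
  for f in "$@"; do
    # Take the first Message-ID header; stop at the blank line ending the headers.
    id=$(awk '/^$/ { exit }
              tolower($0) ~ /^message-id:/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$f")
    if [ -n "$id" ]; then
      printf '%s\t%s\n' "$id" "$f"
    fi
  done | awk -F'\t' 'seen[$1]++ { print $2 }'  # seen[] plays the role of the hash table
}
```

For example, `msgid_dupes ~/Maildir/INBOX/cur/*` would print the second and later copies, which is the set one would then review by hand.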

@jarkko @jani sure. De-duplicating message-ids is easy. De-duplicating messages is hard, if you are worried about false positives.
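
A crude way to reduce false positives is to compare the bodies of two copies before treating them as the same message. A minimal sketch, assuming LF line endings; `body_sum` and `same_body` are hypothetical helpers, and `cksum` is only a weak CRC, so a match is a hint rather than proof:

```shell
# Checksum of everything after the first blank line (i.e. the body).
body_sum() {
  sed '1,/^$/d' "$1" | cksum
}

# Succeed only if both files have the same body checksum.
same_body() {
  [ "$(body_sum "$1")" = "$(body_sum "$2")" ]
}
```

Copies that went through a mailing list with an appended footer will compare as different here, which is exactly the kind of case that still needs a human to look.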

@bremner @jani Right, in mutt it was nice because you could limit the view with ~= based on message IDs, quickly delete the most obvious ones and check the rest manually.

So, can notmuch form a query that would be a 1:1 match to what ~= does in mutt? In that case I can use that together with the :query command in aerc.
@bremner @jani Anyway thanks for the comments! I now at least know what the problem is so this helped.

@jarkko @jani I think what Jani showed is the closest we get. It is unfortunately not part of the query language proper, so needs the CLI.

@bremner @jani I'm actually at least looking into whether I could PoC it as a feature in the aerc code base. I have to grow some motivation first, as I've never used Go for anything ;-)

I did read the mutt implementation through and it is not really rocket science.

I also realized (based on this discussion) that the interactive flow in mutt was the thing (it also addresses the false positives issue), and that's why I liked it.

@bremner @jani

OK, so aerc has {{.MessageId}}, which can be passed e.g. to :term b4 am {{.MessageId}}. In any command, the parser substitutes it with the message ID of the currently selected message. It also has notmuch support.

So I should make a notmuch query (which aerc should support) like id:{{.MessageId}} and see if that gives me the set of messages with the same message ID.
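
Outside aerc, the same lookup can be tried from the shell. A minimal sketch, assuming the ID may or may not arrive wrapped in angle brackets and contains no characters that need quoting in a notmuch query; `copies_of` and the `NOTMUCH` override are hypothetical names. Since notmuch keys messages by message ID, an id: query is expected to match a single message, and the copies show up only as multiple files:

```shell
# List every file notmuch associates with one message ID.
# NOTMUCH is a hypothetical override for trying this against a stub.
copies_of() {
  id=${1#"<"}   # drop a leading "<" if present
  id=${id%">"}  # drop a trailing ">" if present
  "${NOTMUCH:-notmuch}" search --output=files "id:$id"
}
```

More than one line of output from `copies_of '<some-id@example.org>'` would mean that the message has duplicates on disk.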
