[Imap-protocol] Thread extension

E-mail headers

From:	Timo Sirainen <tss@iki.fi>
To:	imap-protocol@u.washington.edu
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	1143819932.8744.163.camel@localhost.localdomain permalink / raw / eml / mbox

The "sort threads by their first message's date" behavior is bad because
new messages can't be easily noticed in the mailbox. So, there should be
some way to specify "sort thread by their latest message's received
date" (and date-header?).

There has been talk about this before, but I guess no-one's done
anything about this? How about a new extension, eg. THREADEXT which
allows specifying the thread sorting in some way, eg:

1 thread (references) (internaldate) us-ascii all

Also another separate thing related to threads: Has anyone tried
optimizing the THREAD command in a way that after the initial threading,
the results would be stored somewhere and later when a new mail arrives,
the new thread list could be generated without having to rebuild
everything?

I haven't thought about this much yet, but I was hoping to find out a
way where I could just store the final thread reply and insert the new
mail in there with a single scan through all the messages in the mailbox
(looking at their message-id, in-reply-to, references and subject
headers but without actually having to store most of them to memory).

Reply

E-mail headers

From:	mrc@CAC.Washington.EDU
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	Pine.OSX.4.64.0603310950110.26204@pangtzu.panda.com permalink / raw / eml / mbox

On Fri, 31 Mar 2006, Timo Sirainen wrote:
> The "sort threads by their first message's date" behavior is bad because
> new messages can't be easily noticed in the mailbox.

My MUA has a way to take me to new messages regardless of how they are 
ordered in the mailbox.  That might be a useful feature for your MUA.

> So, there should be
> some way to specify "sort thread by their latest message's received
> date" (and date-header?).
> There has been talk about this before, but I guess no-one's done
> anything about this? How about a new extension, eg. THREADEXT which
> allows specifying the thread sorting in some way

The way to define a new threading algorithm is...to define a new threading 
algorithm!  There is no need for THREADEXT that adds suboptions to an 
algorithm when you just just define a new THREAD=xxx.

There is already a very nasty comment in Wikipedia about IMAP's complexity 
(at least it now longer says "20 orders of magnitude more complex than 
POP3").  I don't think that we should increase complexity by adding 
sub-options to sub-options, which is effectively what adding options to 
modify threading does.

Put another way: if a way of threading is valuable enough, there is plenty 
of room in the namespace to define a new algorithm.  If it isn't valuable 
enough to define a new algorithm, then that cost serves as a desirable 
barrier to keep bloat out of the protocol.

> Also another separate thing related to threads: Has anyone tried
> optimizing the THREAD command in a way that after the initial threading,
> the results would be stored somewhere and later when a new mail arrives,
> the new thread list could be generated without having to rebuild
> everything?

People have tried to work out how to do insertion in THREAD results.  It 
is not a simple task.  A single message can completely change the thread 
results.  It's not just one link tying together two threads; a message can 
have references that tie together an unbounded number of links.

In fact, THREAD was what shot down an insertion mechanism for SORT, when 
it was pointed out that most users will thread rather than sort.

> I haven't thought about this much yet, but I was hoping to find out a
> way where I could just store the final thread reply and insert the new
> mail in there with a single scan through all the messages in the mailbox
> (looking at their message-id, in-reply-to, references and subject
> headers but without actually having to store most of them to memory).

The cost is not memory in the server.  The cost is the network I/O cost of 
transmitting the new thread results.

If you have an insight on how to express alterations to a thread tree that 
does not entail greater complexity that is saved, lots of people would be 
delighted to hear it (myself included).

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

Reply

E-mail headers

From:	tss@iki.fi
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	1143843283.3985.18.camel@hurina permalink / raw / eml / mbox

On Fri, 2006-03-31 at 10:08 -0800, Mark Crispin wrote:
> On Fri, 31 Mar 2006, Timo Sirainen wrote:
> > The "sort threads by their first message's date" behavior is bad because
> > new messages can't be easily noticed in the mailbox.
> 
> My MUA has a way to take me to new messages regardless of how they are 
> ordered in the mailbox.  That might be a useful feature for your MUA.

I like being able to see all the new mail at once without having to jump
around.

> > So, there should be
> > some way to specify "sort thread by their latest message's received
> > date" (and date-header?).
> > There has been talk about this before, but I guess no-one's done
> > anything about this? How about a new extension, eg. THREADEXT which
> > allows specifying the thread sorting in some way
> 
> The way to define a new threading algorithm is...to define a new threading 
> algorithm!  There is no need for THREADEXT that adds suboptions to an 
> algorithm when you just just define a new THREAD=xxx.

Well, that's fine with me too.

> > Also another separate thing related to threads: Has anyone tried
> > optimizing the THREAD command in a way that after the initial threading,
> > the results would be stored somewhere and later when a new mail arrives,
> > the new thread list could be generated without having to rebuild
> > everything?
> 
> People have tried to work out how to do insertion in THREAD results.  It 
> is not a simple task.  A single message can completely change the thread 
> results.  It's not just one link tying together two threads; a message can 
> have references that tie together an unbounded number of links.

Aren't most new messages simple insertions? Either a reply to an
existing message, or a new message with no message-id references. Those
cases could be optimized and the other cases could then fallback to
slower way.

> If you have an insight on how to express alterations to a thread tree that 
> does not entail greater complexity that is saved, lots of people would be 
> delighted to hear it (myself included).

If most changes are simple insertions, then I think it should be pretty
easy for those cases. And in case something else happens to the threads,
it would return everything again.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://mailman13.u.washington.edu/pipermail/imap-protocol/attachments/20060401/bd20f284/attachment.sig>

Reply

E-mail headers

From:	arnt@gulbrandsen.priv.no
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	3qTXSmifRKnU9Fm0eQf0og.md5@libertango.oryx.com permalink / raw / eml / mbox

I wrote this sort of code six years ago, for an offline reader. I 
remember mrc telling me it was difficult, and then when I did it it was 
not much more than 400 lines of code.

First, you need some threading code. The algorithm I used originally 
came from trn, I think, although I pinched it from a friend of mine.

Once you have your threading algorithm (maybe half of my code back 
then), you've two options for persistence:

1. Store everything. Use something like the THREAD=REFERENCES algorithm, 
but leaving out steps 3 and 5, and make the main data structure 
persistent. This is what I did in 2000.

2. Just store thread membership, not position. This is simpler. Make 
your persistent data structure store a thread ID (e.g. an integer) and 
the messages it has seen belonging to that thread. It should be able to 
find the thread ID for a message-id quickly, and find all messages for 
a thread ID. Updating this structure at injection time is easy. At read 
time, you have to determine the detailed relationships within the 
thread.

Arnt

Reply

E-mail headers

From:	MRC@CAC.Washington.EDU
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	Pine.WNT.4.65.0603311421060.4704@Tomobiki-Cho.CAC.Washington.EDU permalink / raw / eml / mbox

On Sat, 1 Apr 2006, Timo Sirainen wrote:
> I like being able to see all the new mail at once without having to jump
> around.

In that case, you don't want a threaded view.  You want a view that is in 
message sequence (or UID) order.

> Aren't most new messages simple insertions? Either a reply to an
> existing message, or a new message with no message-id references.

I don't think that's a good assumption, but I haven't done a thorough 
analysis.

> If most changes are simple insertions, then I think it should be pretty
> easy for those cases.

Not as easy as it sounds.  The server must rethread the entire tree; the 
message could have been received out of order and thus has pre-existing 
messages that refer to it.  Then the server must compare the new tree with 
the old tree.  Only then will it know if it is "just" a simple insertion.

Then, a means has to be defined by which the insertion is represented.  Do 
you do something like message part section numbers?  Or do you send a 
piece of tree.

Then it has to be defined how the client does this.  And all of this has 
to consider the various ways in which a tree can branch and where the 
insertion takes place.

It's certainly possible, but it's not easy.

> And in case something else happens to the threads,
> it would return everything again.

Communication links are getting faster all the time.  Against the ever 
dwindling savings, you have to weigh the costs involved in getting a diff 
mechanism defined and debugged (this includes getting multiple 
implementations to interoperate correctly).  We have a difficult time 
getting multiple implementations of much simpler protocols to 
interoperate.

Remember, just because we are technically competant enough to get such a 
mechanism working does not mean that everybody who implements the protocol 
will have the same competance.  Past experience has made me quite 
pessimistic.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Reply

E-mail headers

From:	tss@iki.fi
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	1143847047.3985.42.camel@hurina permalink / raw / eml / mbox

On Fri, 2006-03-31 at 14:37 -0800, Mark Crispin wrote:
> On Sat, 1 Apr 2006, Timo Sirainen wrote:
> > I like being able to see all the new mail at once without having to jump
> > around.
> 
> In that case, you don't want a threaded view.  You want a view that is in 
> message sequence (or UID) order.

Nope, it would still be nice to see the threads. I like pretty much how
OSX's Mail does it, except it uses ordererdsubject-like threading.

> > Aren't most new messages simple insertions? Either a reply to an
> > existing message, or a new message with no message-id references.
> 
> I don't think that's a good assumption, but I haven't done a thorough 
> analysis.

Here's an ugly analyzer script, I think it works correctly:

count=`printf "x select inbox\n"|./imap|grep EXISTS|sed 's/^\* \([0-9]*\) .*/\1/'`
echo "$count messages, threading.."
perl -e 'printf("x delete test\nx create test\n"); for ($i = 1; $i <= $ARGV[0]; $i++) { printf("x select inbox\nx copy $i test\nx select test\nx thread references us-ascii all\n"); }' $count|./imap > log
grep '^\* THREAD ' log|sed 's/^\* THREAD //'|tr -d '\r' > log2
echo "Out of $count new messages, there were this many non-insertions:"
perl -e '$prev = " "; $num = 1; while (<>) { s/[() \n]+/ /g; $new = $_; $new =~ s/$num //; if ($new ne $prev) { print "$new vs $prev\n"; } $prev = $_; $num++; }' < log2|wc -l

With my 787 message INBOX it found one non-insertion and with a 2000
mail mailing list it found 3 non-insertions.

> > If most changes are simple insertions, then I think it should be pretty
> > easy for those cases.
> 
> Not as easy as it sounds.  The server must rethread the entire tree; the 
> message could have been received out of order and thus has pre-existing 
> messages that refer to it.  Then the server must compare the new tree with 
> the old tree.  Only then will it know if it is "just" a simple insertion.

Right. Although I think it's possible to do some optimizations in there
also, eg. if message has no message-id, in-reply-to or references
headers, and there's no existing message with same base subject, it can
always be inserted as a root node.

> Then, a means has to be defined by which the insertion is represented.  Do 
> you do something like message part section numbers?  Or do you send a 
> piece of tree.
> 
> Then it has to be defined how the client does this.  And all of this has 
> to consider the various ways in which a tree can branch and where the 
> insertion takes place.

I think two types of insertions would be enough: "insert as sibling to
message n" and "insert as the first child to message n". Root would
probably be message number 0. So an example reply would be:

* THREAD + 5=S3 6=C5

Meaning message 5 was added as a sibling to 3, and message 6 was added
as 5's first child.

I think that should be easy enough for the clients to handle?

> > And in case something else happens to the threads,
> > it would return everything again.
> 
> Communication links are getting faster all the time.  Against the ever 
> dwindling savings, you have to weigh the costs involved in getting a diff 
> mechanism defined and debugged (this includes getting multiple 
> implementations to interoperate correctly).  We have a difficult time 
> getting multiple implementations of much simpler protocols to 
> interoperate.
> 
> Remember, just because we are technically competant enough to get such a 
> mechanism working does not mean that everybody who implements the protocol 
> will have the same competance.  Past experience has made me quite 
> pessimistic.

Well, I'm still mostly just interested in optimizing this internally in
my server so that the reply gets sent near instantly, but these are
somewhat related goals in that if I can figure out rules how to quickly
update the thread list internally, I can easily send out incremental
updates to clients as well.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://mailman13.u.washington.edu/pipermail/imap-protocol/attachments/20060401/5dae1623/attachment.sig>

Reply

E-mail headers

From:	arnt@gulbrandsen.priv.no
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	JmYWt2G05c6KmgHRXc0OKQ.md5@libertango.oryx.com permalink / raw / eml / mbox

Timo Sirainen writes:
> With my 787 message INBOX it found one non-insertion and with a 2000 
> mail mailing list it found 3 non-insertions.

Let me guess: Mailing list threads where some thread participants are 
cc'd, and mail servers delivering several messages in a thread 
practically at the same time when a network connection comes up after 
downtime.

Arnt

Reply

E-mail headers

From:	MRC@CAC.Washington.EDU
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	Pine.WNT.4.65.0603311522280.4704@Tomobiki-Cho.CAC.Washington.EDU permalink / raw / eml / mbox

On Sat, 1 Apr 2006, Timo Sirainen wrote:
> I think two types of insertions would be enough: "insert as sibling to
> message n" and "insert as the first child to message n".

You need to define the semantics when UID THREAD is used.  Also, you need 
to define what happens if the new message has children (insertion as 
opposed to append).

> I think that should be easy enough for the clients to handle?

You'd be surprised at what people find to be "not easy".

> Well, I'm still mostly just interested in optimizing this internally in
> my server so that the reply gets sent near instantly, but these are
> somewhat related goals in that if I can figure out rules how to quickly
> update the thread list internally, I can easily send out incremental
> updates to clients as well.

I don't understand.  A thread should be nearly instantaneous, even for a 
very large mailbox, if you use a hash table.  It's probably also a good 
idea to cache the data needed to thread so it's available the next time 
you thread.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Reply

E-mail headers

From:	chappa@math.washington.edu
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	Pine.LNX.4.64.0603312304140.29083@zeno3.math.washington.edu permalink / raw / eml / mbox

Timo,

  I have a patch for Pine that implements this idea for Pine by modifying 
the client, not the server. What it done is to sort by Arrival and by 
thread, then take the thread containing the last message in sort by 
arrival and put that thread in the bottom of the list, then remove both 
threads from both lists (threaded and by arrival) and repeat until we are 
done. I call this sort the thread by arrival.

  Replacing arrival sort by subject sort is also very useful, but it does 
not seem to be very useful for other sort methods (e.g. From, or size).

  You can find the patch in my web page (see below). Look for "Enhanced 
Fancy Thread Interface".

-- 
Eduardo
http://www.math.washington.edu/~chappa/pine/

Reply

E-mail headers

From:	dave@cridland.net
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	14820.1143878996.313026@peirce.dave.cridland.net permalink / raw / eml / mbox

On Sat Apr  1 00:30:03 2006, Mark Crispin wrote:
> On Sat, 1 Apr 2006, Timo Sirainen wrote:
>> I think two types of insertions would be enough: "insert as 
>> sibling to
>> message n" and "insert as the first child to message n".
> 
> You need to define the semantics when UID THREAD is used.  Also, 
> you need to define what happens if the new message has children 
> (insertion as opposed to append).
> 
> 
I vaguely suspect that rather than single identifiers (sequence 
numbers or UIDs), you want tree fragments. That handles both "gap 
plugging", where a message arrives out of order (and I see a few of 
those on some mailing lists, especially where there's a mixture of 
replying direct to me and only replying to the list going on), and 
actual "grafting", where two previously unrelated trees are joined.

It also means you can unify most insertions down to a tree fragment 
by the algorithm of finding the insertion point, and sending the tree 
fragment starting at the parent, and ending at the first identified 
descendent node. I'd imagine most server implementations would give 
up and resend the whole tree in any complex cases.


>> I think that should be easy enough for the clients to handle?
> 
> You'd be surprised at what people find to be "not easy".
> 
> 
True, but there's precious few clients which implement much more than 
LITERAL+, CHILDREN, and perhaps one or two more. Better, these 
messages are at least clear in their intent by a reasonable 
application of common sense. ACL is fairly common, but I could only 
find one client-side implementation of LISTRIGHTS.


>> Well, I'm still mostly just interested in optimizing this 
>> internally in
>> my server so that the reply gets sent near instantly, but these are
>> somewhat related goals in that if I can figure out rules how to 
>> quickly
>> update the thread list internally, I can easily send out 
>> incremental
>> updates to clients as well.
> 
> I don't understand.  A thread should be nearly instantaneous, even 
> for a very large mailbox, if you use a hash table.  It's probably 
> also a good idea to cache the data needed to thread so it's 
> available the next time you thread.

JWZ's description of speeds on now fairly ancient hardware would 
suggest that. In any case, optimizing to the level Mark suggests 
would be a clearly useful first step.

Dave.
-- 
           You see things; and you say "Why?"
   But I dream things that never were; and I say "Why not?"
    - George Bernard Shaw

Reply

mailing list archives

[Imap-protocol] Thread extension