mailing list archives

meli community discussions

E-mail headers
From: Bill Janssen <janssen@parc.com>
To: imap-protocol@u.washington.edu
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: 07Apr9.093358pdt."57996"@synergy1.parc.xerox.com
I'm looking at the issues of supporting THREAD, and looking at
http://tools.ietf.org/wg/imapext/draft-ietf-imapext-sort/, there
appear to be two algorithms, ORDEREDSUBJECT and REFERENCES.
REFERENCES includes a flavor of ORDEREDSUBJECT as an element of its
algorithm.

I'm looking at a collection of email mostly sent from Exchange
servers, and notice that it by and large does not support the
"References" or "In-Reply-To" headers, but does always contain headers
called "Thread-Topic" and "Thread-Index".  The "Thread-Index" field
seems to contain a BASE64 string which contains the "Thread-Index" of
its parent message as a prefix.

Has anyone tried to understand or reverse-engineer this Exchange
information to describe a thread computation algorithm that works
better for messages with this information?

Bill
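
[Editor's sketch, not part of the original message.] The prefix property Bill describes can be tried out with a few lines of Python. This assumes only what the message states: that a child's "Thread-Index" value, once Base64-decoded, begins with its parent's decoded value. The function name find_parents and the dict-of-headers input are hypothetical, chosen for illustration, and nothing here is taken from Exchange documentation.

```python
import base64


def find_parents(thread_indexes):
    """Map each message id to the id of its closest ancestor, or None.

    thread_indexes: dict of message id -> raw Thread-Index header value
    (a Base64 string). Hypothetical input shape, for illustration only.
    """
    decoded = {mid: base64.b64decode(value)
               for mid, value in thread_indexes.items()}
    parents = {}
    for mid, raw in decoded.items():
        best = None
        for other, other_raw in decoded.items():
            if other == mid:
                continue
            # A candidate ancestor is strictly shorter and a byte prefix.
            if len(other_raw) < len(raw) and raw.startswith(other_raw):
                # Keep the longest such prefix: the closest ancestor present.
                if best is None or len(other_raw) > len(decoded[best]):
                    best = other
        parents[mid] = best
    return parents
```

The comparison is done on decoded bytes rather than on the Base64 text, so that padding characters at the end of a parent's value cannot hide the prefix relationship. If the direct parent is missing from the collection, the sketch falls back to the nearest ancestor that is present.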
E-mail headers
From: mrc@CAC.Washington.EDU
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: alpine.OSX.0.98.0704090939120.11423@pangtzu.panda.com
On Mon, 9 Apr 2007, Bill Janssen wrote:
> Has anyone tried to understand or reverse-engineer this Exchange
> information to describe a thread computation algorithm that works
> better for messages with this information?

I'm not aware of any such effort.  IMHO it would be pointless to do so; 
those Exchange headers are non-standard and proprietary, and IETF protocol 
work is focused on non-proprietary standards.

What's more, if Microsoft had intended for others to use these headers, 
they would have submitted them to the standards process.  Microsoft has 
shown many times that it knows how to do so, is capable of doing so, and 
will do so.  We can only conclude that Microsoft considers these headers 
to be for their internal use only, and that the semantics of those headers 
can be changed at their whim at any time.  That's the nature of 
non-standard features.

The In-Reply-To header has been a standard since the 1970s.  References is 
a "nice to have" but is not strictly necessary for threading (it just 
makes threading easier, particularly if there are missing messages in the 
thread).  I strongly urge you to focus on standards, and disregard 
non-standards.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
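
[Editor's sketch, not part of the original message.] The standard headers Mark refers to can be used to pick a parent for each message roughly as follows. This is a simplified illustration, assuming the common convention that the last Message-ID in "References" is the direct parent and that "In-Reply-To" is consulted only when "References" yields nothing. It is not the full THREAD=REFERENCES algorithm from the draft, and the name parent_id is hypothetical.

```python
import re
from email import message_from_string

# Loose pattern for an angle-bracketed Message-ID; an assumption made for
# this sketch, not the full RFC grammar.
MSG_ID = re.compile(r"<[^<>\s]+>")


def parent_id(raw_message):
    """Return the Message-ID of the likely parent, or None if unknown."""
    msg = message_from_string(raw_message)
    # Prefer References: its last entry is taken to be the direct parent.
    refs = MSG_ID.findall(msg.get("References", ""))
    if refs:
        return refs[-1]
    # Otherwise fall back to the first id found in In-Reply-To.
    in_reply_to = MSG_ID.findall(msg.get("In-Reply-To", ""))
    return in_reply_to[0] if in_reply_to else None
```

Because "References" lists the whole ancestry, a client that keeps the full list (rather than only the last entry, as here) can still place a reply when intermediate messages are missing, which is the advantage Mark mentions.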
E-mail headers
From: johannes@sipsolutions.net
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: 1176149558.8459.40.camel@johannes.berg
On Mon, 2007-04-09 at 09:33 -0700, Bill Janssen wrote:

> I'm looking at a collection of email mostly sent from Exchange
> servers, and notice that it by and large does not support the
> "References" or "In-Reply-To" headers, but does always contain headers
> called "Thread-Topic" and "Thread-Index".  The "Thread-Index" field
> seems to contain a BASE64 string which contains the "Thread-Index" of
> its parent message as a prefix.
> 
> Has anyone tried to understand or reverse-engineer this Exchange
> information to describe a thread computation algorithm that works
> better for messages with this information?

Look at the Evolution source code; it contains quite a bit of
information on this.

johannes
E-mail headers
From: janssen@parc.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: 07Apr9.101403pdt."57996"@synergy1.parc.xerox.com
> I strongly urge you to focus on standards, and disregard 
> non-standards.

Ah, but identifying what's really a standard is the hard part. :-)

Bill
E-mail headers
From: mrc@CAC.Washington.EDU
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: alpine.OSX.0.98.0704091015270.11423@pangtzu.panda.com
On Mon, 9 Apr 2007, Bill Janssen wrote:
>> I strongly urge you to focus on standards, and disregard
>> non-standards.
> Ah, but identifying what's really a standard is the hard part. :-)

Actually, it is not difficult at all within the Internet context.

The IETF has a set of documents, called RFCs for historical reasons, which 
are graded according to their standards level: informational, 
experimental, proposed standard, draft standard, full standard.  Documents 
in one of the last three categories (with "standard" in their name) are 
also called standards-track documents.

Anything that is not in a standards-track document is not a standard.

There are also Internet Drafts, some of which are destined to become 
standards-track documents.  However, Internet Drafts can (and do!) change 
incompatibly before they are published as RFCs, so even if the effort 
itself is aimed at standards-track an Internet Draft can not be used or 
cited as anything other than a work in progress.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: janssen@parc.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: 07Apr9.104129pdt."57996"@synergy1.parc.xerox.com
> On Mon, 9 Apr 2007, Bill Janssen wrote:
> >> I strongly urge you to focus on standards, and disregard
> >> non-standards.
> > Ah, but identifying what's really a standard is the hard part. :-)
> 
> Actually, it is not difficult at all within the Internet context.

Yeah, but...  The (expired?) document at
http://tools.ietf.org/html/draft-ietf-imapext-sort-19 is attempting to
standardize two hacks that let possessors of a quantity of mail
organize it into threads that a human may recognize.  While both of
these hacks are reasonable given the kinds of information usually
found in email headers in non-corporate environments, they are also
unreasonably fragile to the extent they are based on comparison of
"Subject" header strings, which are typically presented in an MUA in
an edit window for the responder to mung at will.

However, in many email messages, there is additional machine-injected
information, not exposed to the whim of the human user, which may aid
in the determination of threads which actually make sense to the human
user.  In particular, it may be possible to exploit this information
in order to correct for some of the fragility introduced by comparing
the "Subject" header.  It doesn't seem out of the question for someone
to try to standardize a third or fourth or fifth hack based on this
information to add to the THREAD extension.  Given that vast amounts
of corporate email may include these headers, might be worth looking
into.

Bill
E-mail headers
From: mrc@CAC.Washington.EDU
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: alpine.OSX.0.98.0704091048010.11423@pangtzu.panda.com
On Mon, 9 Apr 2007, Bill Janssen wrote:
> Yeah, but...  The (expired?) document at
> http://tools.ietf.org/html/draft-ietf-imapext-sort-19

That document is NOT expired, HAS been approved for publication, and is 
blocked for publication pending the IMAP internationalization document 
which is in its final stages.

> is attempting to
> standardize two hacks

I take offense at that comment.

> that let possessors of a quantity of mail
> organize it into threads that a human may recognize.  While both of
> these hacks are reasonable given the kinds of information usually
> found in email headers in non-corporate environments, they are also
> unreasonably fragile to the extent they are based on comparison of
> "Subject" header strings, which are typically presented in an MUA in
> an edit window for the responder to mung at will.

Within the IMAP context, only the sender of a message may alter the 
Subject.  They typically do this for a reason; a reason that is generally 
not capricious, and with full knowledge of the impact of the change upon 
the recipient's MUA.  Subjects are used as primary mechanisms only by 
those sorting and threading algorithms that are named to use Subject.

The THREAD=REFERENCES mechanism uses Subjects only as a backup, to join 
threads that are likely (albeit not certainly) to be related but had their 
linkages broken.

What you are suggesting is that this is to be abandoned in favor of some 
arbitrary set of headers defined by Microsoft, but neither documented nor 
submitted to the open standards community for consideration.

I'll leave aside your comment about "non-corporate environments"; but 
please take the time to research the history and traditions of the IETF 
before making such comments in the future.

> However, in many email messages, there is additional machine-injected
> information, not exposed to the whim of the human user, which may aid
> in the determination of threads which actually make sense to the human
> user.

As the children say, "well, duh!"

There is such a mechanism.  It was defined in the days when Bill Gates was 
an undergraduate at Harvard illicitly hacking on his BASIC interpreter on 
their PDP-10.  The THREAD=REFERENCES algorithm uses this mechanism.

> In particular, it may be possible to exploit this information
> in order to correct for some of the fragility introduced by comparing
> the "Subject" header.  It doesn't seem out of the question for someone
> to try to standardize a third or fourth or fifth hack based on this
> information to add to the THREAD extension.  Given that vast amounts
> of corporate email may include these headers, might be worth looking
> into.

Non-standard proprietary headers do not belong in standards documents. 
If there is data in such headers that can not be found in the equivalent 
standard headers, the fix for the problem is to induce the vendor to use 
the standard headers.

The fix is NOT to "reverse engineer" Microsoft's, or any other 
vendor's, undocumented proprietary headers.  That is asking for trouble.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: janssen@parc.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:39 -0000
Message-ID: 07Apr9.123954pdt."57996"@synergy1.parc.xerox.com
> On Mon, 9 Apr 2007, Bill Janssen wrote:
> > Yeah, but...  The (expired?) document at
> > http://tools.ietf.org/html/draft-ietf-imapext-sort-19
> 
> That document is NOT expired, HAS been approved for publication, and is 
> blocked for publication pending the IMAP internationalization document 
> which is in its final stages.

Excellent, thanks.  I found that hard to determine from the WG page.

> > is attempting to standardize two hacks
> 
> I take offense at that comment.

I'm sorry to hear that, as I meant no offense.  The two algorithms
described in the document are in fact effective ad-hoc ways of
grouping collections of email messages into threads, designed based on
observation of typical collections and knowledge of various email
clients, standards, and delivery mechanisms.  Nothing wrong with that,
given that an email "thread" is not a particularly well-defined
concept (though most users think they know what it means).  The very
names associated with the algorithms ("poor man's threading" and "used
in 'Netscape Mail and News' versions 2.0 through 3.0") seem (perhaps
only to me) to identify them as a particular class of algorithm.

> Within the IMAP context, only the sender of a message may alter the 
> Subject.  They typically do this for a reason; a reason that is generally 
> not capricious,...

I think you must be working with a different set of users than I'm
working with.  I find seemingly arbitrary and capricious changes to
the "Subject" header over and over again in my data sets.  Usually
with a small edit distance from the original subject, but with odd
patterns -- usually capitalization or punctuation changes.  That's why
I'm looking for a backup to the "Subject"-based backup.

I guess I should also go back and review my code that calculates the
"base subject".  Perhaps I've fat-fingered something.

> ...and with full knowledge of the impact of the change upon 
> the recipient's MUA.

Hmmmm.  I doubt that most users of email have an accurate model of the
effect of various changes on the recipient's MUA, particularly the
effect on something as loosey-goosey as message threading.  I'd be
happy to be pointed to the results of user studies that show I'm
wrong, though.

> The THREAD=REFERENCES mechanism uses Subjects only as a backup, to join 
> threads that are likely (albeit not certainly) to be related but had their 
> linkages broken.
> 
> What you are suggesting is that this is to be abandoned in favor of some 
> arbitrary set of headers defined by Microsoft, but neither documented nor 
> submitted to the open standards community for consideration.

I wasn't suggesting it be abandoned.  I was suggesting that it be
augmented by introducing additional "backup" techniques that could
further, and perhaps more accurately, discover threads that have had
their linkages broken.

> Non-standard proprietary headers do not belong in standards documents. 

I have a great deal of sympathy for this viewpoint (though I'm not
sure about the proprietary part; there seem to be various IANA
registries that keep track of proprietary mechanisms).

> If there is data in such headers that can not be found in the equivalent 
> standard headers, the fix for the problem is to induce the vendor to use 
> the standard headers.

I think that's a fine theoretical solution; I'm less sure anyone knows how
to "induce" this particular vendor to do that.

> The fix is NOT to "reverse engineer" Microsoft's, or any other 
> vendor's, undocumented proprietary headers.  That is asking for trouble.

Sure, OK.  I'm just trying to make my IMAP server provide threading
that users will perceive as being "as good" as that provided by the
Exchange server.  I guess the right thing to do would be to figure out
the algorithm and submit it to the threading algorithms registry.

Bill
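
[Editor's sketch, not part of the original message.] The kind of Subject drift Bill reports (capitalization and punctuation changes with a small edit distance) could be smoothed over by comparing a looser key than the literal string. The following is a hypothetical normalization written for illustration. It is deliberately not the "base subject" extraction defined in the draft, and the name loose_subject and the prefix list (re, fw, fwd) are assumptions.

```python
import re
import string

# Leading reply/forward markers such as "Re:", "RE[2]:", "Fwd:", "FW:",
# possibly repeated. The list of markers is an assumption for this sketch.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*(?:\[\d+\])?\s*:\s*)+",
                     re.IGNORECASE)
_PUNCT = str.maketrans("", "", string.punctuation)


def loose_subject(subject):
    """Return a comparison key that ignores case, ASCII punctuation,
    extra whitespace, and leading reply/forward markers."""
    s = _PREFIX.sub("", subject)
    s = s.translate(_PUNCT)
    # casefold() for case-insensitive matching; split/join collapses runs
    # of whitespace and trims the ends.
    return " ".join(s.casefold().split())
```

Two messages whose keys match would then be candidates for joining, in the same "backup" role the thread assigns to Subject comparison. A key this loose will also merge some unrelated subjects, so it would only make sense after header-based linking has failed.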