mailing list archives

meli community discussions

E-mail headers
From: Ashley Clark <aclark@ghoti.org>
To: imap-protocol@u.washington.edu
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: 8F0DA5FA-FB07-4AFA-9C58-8F0927998343@ghoti.org
For a message with this body structure:

3 uid fetch 1304 bodystructure
* 23 FETCH (UID 1304 BODYSTRUCTURE (("TEXT" "PLAIN" ("CHARSET" "UTF-8") NIL NIL "QUOTED-PRINTABLE" 1527 42 NIL NIL NIL)("TEXT" "HTML" ("CHARSET" "UTF-8") NIL NIL "QUOTED-PRINTABLE" 5510 85 NIL NIL NIL) "ALTERNATIVE" ("BOUNDARY" "372894113577999") NIL NIL))
3 OK UID FETCH Completed


One server returns this for a UID FETCH BODY.PEEK[]<> 

4 uid fetch 1304 body.peek[2]<0.128>
* 23 FETCH (UID 1304 BODY[2]<3456> {128}

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  "http://w=
)w.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4 OK UID FETCH Completed


Note, the requested origin was 0, yet the returned origin was 3456 (which is the origin of the part in the whole message).


This same message when requested from another server implementation returns this:

4 uid fetch 13281 body.peek[2]<0.128>
* 693 FETCH (UID 13281 BODY[2]<0> {128}

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  "http://w=
)w.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4 OK Fetch completed.


Here the origin in the server response matches the client's request.
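
The comparison between the two responses can be checked mechanically. The following is a minimal Python sketch, not a full IMAP response parser: the regular expression and the helper names `response_origin` and `origin_matches` are assumptions made for illustration, and only the two response lines quoted above are taken from the thread.

```python
import re

# Simplified pattern for a partial BODY item such as "BODY[2]<0> {128}".
# It is an illustrative assumption, not a complete IMAP grammar.
BODY_ITEM = re.compile(
    r"BODY\[(?P<section>[^\]]*)\]<(?P<origin>\d+)> \{(?P<size>\d+)\}"
)

def response_origin(line):
    """Return (section, origin, literal_size) from a FETCH response line."""
    match = BODY_ITEM.search(line)
    if match is None:
        raise ValueError("no partial BODY item found")
    return (
        match.group("section"),
        int(match.group("origin")),
        int(match.group("size")),
    )

def origin_matches(requested_origin, line):
    """True when the response reports the origin the client asked for."""
    _, origin, _ = response_origin(line)
    return origin == requested_origin

# The two response lines quoted in this message; both requests used <0.128>.
first_server = "* 23 FETCH (UID 1304 BODY[2]<3456> {128}"
second_server = "* 693 FETCH (UID 13281 BODY[2]<0> {128}"

print(origin_matches(0, first_server))   # False
print(origin_matches(0, second_server))  # True
```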


Are one, none or both of these server implementations correct?


Thanks,
Ashley
E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: alpine.OSX.2.00.1110301823210.9034@hsinghsing.panda.com
My recommendation is as follows:

See what happens in one or more of the three open-source IMAP servers of
recognized quality: UW IMAP (or Panda IMAP if you have access to it),
Cyrus, and Dovecot. These servers almost always agree with each other on
matters of IMAP protocol (and if they disagree their developers will want
to know; they go to great efforts on this).

It is a safe assumption that what they do is correct; and that if some
other server does something differently, that other server is incorrect.

It is also a safe assumption that Gmail, Yahoo, Exchange, Apple iCloud
(iSCREAM), etc. implement IMAP incorrectly. Those companies don't care
about implementing IMAP correctly; they only care about having something
that talks to Outlook. If they cared, they would implement it correctly
and hire someone competent to make it work. They don't.

If, having done this, you still have questions as to why it is one way and
not the other, then ask this list.

I think that, in the case of your question, you already know the answer
and merely seek confirmation. I'm just telling you how you can do so on
your own.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: aclark@ghoti.org
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: 1D4C80A2-BD6A-4888-9A21-29D746896D97@ghoti.org
The server responding with the matching 0 origin was running Dovecot. I don't know the implementation of the other server and don't have ready access to UW or Cyrus.

It seems that the origin must be relative to the start of the part that is being requested in both the request and response but I didn't see it clearly stated in the RFC. Is this a safe assumption?


Ashley
E-mail headers
From: David.Harris@pmail.gen.nz
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: 4EAE0BBB.19406.5D99DC56@David.Harris.pmail.gen.nz
On 30 Oct 2011 at 18:33, Mark Crispin wrote:

> See what happens in one or more of the three open-source IMAP servers
> of recognized quality: UW IMAP (or Panda IMAP if you have access to
> it), Cyrus, and Dovecot. These servers almost always agree with
> each other on matters of IMAP protocol (and if they disagree their
> developers will want to know; they go to great efforts on this). 

Just FWIW... Although my Mercury is a very minor player in this 
market, I am also *very* keen to hear of any situations where I am not 
compliant. My IMAP code has been around for a long time, but has 
tended to be hampered by being tied to an ancient and restrictive 
message store, a problem I have just about resolved with a new, 
industrial-strength back end.

With that in place, I am keen to improve my general compliance: I 
know for sure there are one or two places where I'm not (in particular 
in some of the more exotic areas of SEARCH), but will actively take 
on board any politely-worded criticism or feedback.

Cheers!

-- David --

------------------ David Harris -+- Pegasus Mail ----------------------
Box 5451, Dunedin, New Zealand | e-mail: David.Harris@pmail.gen.nz
           Phone: +64 3 453-6880 | Fax: +64 3 453-6612

On the menu of a Belgian cafe:
   "Cream dognuts."
E-mail headers
From: blong@google.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: CABa8R6uyrHJ6AqQoGfMcLztG-zovaVK4FEFep8ibpYc2-6b7ig@mail.gmail.com
On Sun, Oct 30, 2011 at 6:33 PM, Mark Crispin <mrc+imap@panda.com> wrote:
> It is also a safe assumption that Gmail, Yahoo, Exchange, Apple iCloud
> (iSCREAM), etc. implement IMAP incorrectly. Those companies don't care
> about implementing IMAP correctly; they only care about having something
> that talks to Outlook. If they cared, they would implement it correctly
> and hire someone competent to make it work. They don't.

tsk tsk, haven't I said this before?  Outlook isn't our primary at
all.  For desktops, Apple Mail and Thunderbird are far more common,
but all desktop clients are far outstripped by mobile clients.  I
imagine this is the case for most of the web mail providers, and
probably not the case for other servers.

Though, I kind of doubt that Outlook is the primary client for
Exchange IMAP either, they probably use MAPI.

(and we answer this one correctly too, fyi)

Brandon
-- 
Brandon Long <blong@google.com>
Staff Engineer
Gmail Delivery TLM
E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: alpine.OSX.2.00.1110302053130.9034@hsinghsing.panda.com
On Sun, 30 Oct 2011, Ashley Clark wrote:
> The server responding with the matching 0 origin was running Dovecot. I
> don't know the implementation of the other server and don't have ready
> access to UW or Cyrus.

Well, I think that you have gotten your answer. I can tell you that Panda
(nee UW) does the same, and I am certain that Cyrus does.

> It seems that the origin must be relative to the start of the part that
> is being requested in both the request and response but I didn't see it
> clearly stated in the RFC. Is this a safe assumption?

Yes. And if you think about it, no other interpretation makes any sense at
all. The IMAP specification was not developed capriciously; everything in
IMAP has a reason why it is that way. In a few matters of syntax, it was
because of compatibility with the past. But in the case of a partial
specifier, the only thing that could possibly be useful is the offset from
the start of the part - particularly when you consider asynchronous
multiple partial fetches and subsequent reassembly.

In general, you will notice in IMAP that the types in fetch responses
mirror what was requested. That is by design.
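
The reassembly argument can be illustrated with a short Python sketch. It assumes each partial fetch response has already been reduced to an `(origin, data)` pair, with the origin counted from the start of the requested part; the function name `reassemble` and the sample data are hypothetical.

```python
def reassemble(chunks, total_size):
    """Rebuild one body part from (origin, data) pairs received in any order.

    Each origin is an offset from the start of the part, which is what makes
    it possible to place out-of-order responses without extra bookkeeping.
    """
    buf = bytearray(total_size)
    covered = 0
    for origin, data in chunks:
        end = origin + len(data)
        if origin < 0 or end > total_size:
            raise ValueError("chunk falls outside the part")
        buf[origin:end] = data
        covered += len(data)
    # Crude completeness check: it catches a missing chunk, but it would not
    # notice an overlap that happens to cancel out a gap of the same length.
    if covered != total_size:
        raise ValueError("chunks do not cover the part exactly once")
    return bytes(buf)

# Hypothetical part content, fetched 128 octets at a time and delivered
# in reverse order to mimic asynchronous responses.
part = bytes(range(256)) + b"tail"
chunks = [(o, part[o:o + 128]) for o in range(0, len(part), 128)]
chunks.reverse()

assert reassemble(chunks, len(part)) == part
```

If a server reported origins relative to the whole message instead, the client would first have to learn the part's offset within the message before it could place any chunk.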

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: alpine.OSX.2.00.1110311352000.9034@hsinghsing.panda.com
On Mon, 31 Oct 2011, Brandon Long wrote:
> tsk tsk, haven't I said this before?  Outlook isn't our primary at
> all.

I was told specifically, by Google, that Google does not care at all about
compliance with the IMAP RFCs, but rather that Outlook using IMAP behaved
like Gmail. I was also told specifically that compliance and/or
interoperability is NOT a priority at Google.

If you think that Gmail's IMAP server is compliant, you are mistaken.

Should Google's corporate goals ever change to incorporate compliance and
interoperability, I wish you luck on making it compliant. You have a lot
of work, and a great deal of testing, ahead of you. You also face some
very difficult design decisions.

FWIW, "it works with Apple Mail and Thunderbird" does not constitute
testing. Not by a long shot.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: 1320096826.9022.140660992904397@webmail.messagingengine.com
On Monday, October 31, 2011 2:12 PM, "Mark Crispin" <mrc+imap@panda.com> wrote:
> I was told specifically, by Google, that Google does not care at all about
> compliance with the IMAP RFCs, but rather that Outlook using IMAP behaved
> like Gmail. I was also told specifically that compliance and/or
> interoperability is NOT a priority at Google.

I can understand their perspective too - for all that IMAP itself may have
been designed for good reasons, and made good compromises for what was needed,
the sum total of every RFC that extends IMAP is a pretty crufty awful mishmash.

I'm actually surprised that no serious attempt has been made to create a
competing protocol that's simpler to implement, at both ends.  I suspect
Google could do it themselves with a bit of cooperation from Thunderbird.

The really filthy way to do it, of course, would be to just straight out
subvert IMAP4 by offering a "capability", which - if present - allowed the
client to send "ENABLE PROTO-X" and then switch totally to talking the
other protocol - allowing a "legacy fallback" mode for clients and servers
that didn't offer it - then add new features just with the new protocol
as it slowly took over.

It would need some killer feature of course, to make it worth the effort
of supporting it.  And something not patented - totally free to implement
or it would have no advantage over just licensing ActiveSync.

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm
E-mail headers
From: blong@google.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: CABa8R6ugaAADsrvaS1Nnau5bi7k9728b+x00NcG1Nbxfk9xwsA@mail.gmail.com
On Mon, Oct 31, 2011 at 2:12 PM, Mark Crispin <mrc+imap@panda.com> wrote:
> On Mon, 31 Oct 2011, Brandon Long wrote:
>>
>> tsk tsk, haven't I said this before?  Outlook isn't our primary at
>> all.
>
> I was told specifically, by Google, that Google does not care at all about
> compliance with the IMAP RFCs, but rather that Outlook using IMAP behaved
> like Gmail. I was also told specifically that compliance and/or
> interoperability is NOT a priority at Google.

I have no idea who you talked to, but my team owns the Gmail IMAP server.

As for Outlook, we have a separate piece for that, since we had issues
with their IMAP implementation:

http://www.google.com/apps/intl/en/business/outlook_sync.htm

Plus, that has the benefit of giving Outlook contacts & calendar sync as well.

As for "behaving like Gmail", we have that issue most definitely.
Almost all of our IMAP users mostly use the web interface, so yes,
they want the IMAP experience to mimic the Gmail interface.  There are
settings that can be set to make the experience more "IMAP normal"
than "Gmail normal", but the defaults favor the more common use case.

> If you think that Gmail's IMAP server is compliant, you are mistaken.

I know it's not compliant, we even have a support page where we list
the cases where we explicitly decided against compliance.
Interoperability is a goal, however.  That doesn't mean we want to
force Gmail users to use the IMAP mailbox model, however.

> Should Google's corporate goals ever change to incorporate compliance and
> interoperability, I wish you luck on making it compliant. You have a lot
> of work, and a great deal of testing, ahead of you. You also face some
> very difficult design decisions.

If compliance was important, one would think there would be a fairly
comprehensive suite of tests that one could use to measure that.  The
existing test suite does not meet that goal.

And I argue that we made the design decisions we did with good
reasons.  I also argue that there were no perfect decisions to be
made, and usability was a more important goal than correctness.

> FWIW, "it works with Apple Mail and Thunderbird" does not constitute
> testing. Not by a long shot.

The real world is a harsh place.

Brandon
-- 
Brandon Long <blong@google.com>
Staff Engineer
Gmail Delivery TLM
E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: alpine.OSX.2.00.1110311436170.9034@hsinghsing.panda.com
On Mon, 31 Oct 2011, Bron Gondwana wrote:
> I can understand their perspective too - for all that IMAP itself may
> have been designed for good reasons, and made good compromises for what
> was needed, the sum total of every RFC that extends IMAP is a pretty
> crufty awful mishmash.

So implement the base specification, which is not at all crufty and is for
the most part all that is actually needed. Just implement it correctly.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: blong@google.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: CABa8R6uckusexAOtq-1-PHd58Sk7Xi0GOwLtwn1HMfiAA52C9w@mail.gmail.com
On Mon, Oct 31, 2011 at 2:33 PM, Bron Gondwana <brong@fastmail.fm> wrote:
> On Monday, October 31, 2011 2:12 PM, "Mark Crispin" <mrc+imap@panda.com> wrote:
>> I was told specifically, by Google, that Google does not care at all about
>> compliance with the IMAP RFCs, but rather that Outlook using IMAP behaved
>> like Gmail. I was also told specifically that compliance and/or
>> interoperability is NOT a priority at Google.
>
> I can understand their perspective too - for all that IMAP itself may have
> been designed for good reasons, and made good compromises for what was needed,
> the sum total of every RFC that extends IMAP is a pretty crufty awful mishmash.

I'd argue the larger problem with the extensions is that who knows who
implements them.  For clients, the issue is supporting both the old
way and the new way without knowing whether or not it's worth
supporting the new way.  For servers, it's impossible to know what the
clients even support.  One can argue that large implementations like
Gmail have some benefit here, as folks on both the clients and servers
directly collaborate to implement the extensions on both sides (I'm
talking existing extensions, not making up new ones).   Ie, we
implemented COMPRESS when we saw that Thunderbird supported it (and
used their implementation to fix bugs in ours) and then when Apple saw
we supported it, they added it to iOS (I'm guessing, but that's what
it looked like).

For us, it would have been very beneficial if the CAPABILITY command
had the client telling us which extensions they supported, so we could
better focus our efforts on the extensions most likely to be useful.

> I'm actually surprised that no serious attempt has been made to create a
> competing protocol that's simpler to implement, at both ends.  I suspect
> Google could do it themselves with a bit of cooperation from Thunderbird.

I wouldn't claim to be able to create something better.  I could make
something simpler with a simpler use case.  The argument may be that
IMAP tries to do too much, that a two-way syncing protocol would be
simpler and what most clients these days would use in preference to
the "on-line, off load work to server" model that say pine wants.  Or,
that could just be my biases coloring my perception.

Brandon
-- 
Brandon Long <blong@google.com>
Staff Engineer
Gmail Delivery TLM
E-mail headers
From: tss@iki.fi
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: 75E51A44-91E4-48D1-BA82-789C82583ABC@iki.fi
On 31.10.2011, at 23.53, Brandon Long wrote:

>> If you think that Gmail's IMAP server is compliant, you are mistaken.
> 
> I know it's not compliant, we even have a support page where we list
> the cases where we explicitly decided against compliance.
> Interoperability is a goal, however.  That doesn't mean we want to
> force Gmail users to use the IMAP mailbox model, however.

I haven't tested GMail with imaptest for a while now (few years?), but I don't think the bugs it reports are about the mailbox model, or anything else that can't be fixed somewhat easily. I guess the one test I could remove is the search for substrings, since even Mark doesn't think it's worth the trouble nowadays.
E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: alpine.OSX.2.00.1110311455120.9034@hsinghsing.panda.com
On Mon, 31 Oct 2011, Brandon Long wrote:
> I have no idea who you talked to, but my team owns the Gmail IMAP server.

In that case, at the very least, there is a lack of a consistent and
coherent policy regarding standard compliance.

> Almost all of our IMAP users mostly use the web interface, so yes,
> they want the IMAP experience to mimic the Gmail interface.  There are
> settings that can be set to make the experience more "IMAP normal"
> than "Gmail normal", but the defaults favor the more common use case.

Do you actually have research and firm numbers to back up the contention
that customers want their IMAP clients to behave incorrectly, including
malfunctioning, so that some IMAP clients seem to mimic Gmail?

Or is this just a matter of religion that has never been challenged, much
less bolstered with research?

> I know it's not compliant, we even have a support page where we list
> the cases where we explicitly decided against compliance.
> Interoperability is a goal, however.  That doesn't mean we want to
> force Gmail users to use the IMAP mailbox model, however.

So, instead, you force Gmail users to use a non-compliant model that
violates guarantees in IMAP and causes some IMAP clients to malfunction.

> And I argue that we made the design decisions we did with good
> reasons.  I also argue that there were no perfect decisions to be
> made, and usability was a more important goal than correctness.

Thank you for confirming, in public, that the Gmail IMAP server is
non-compliant and that Google has no intention to make it compliant.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: 1320099853.21209.140660992923317@webmail.messagingengine.com
On Monday, October 31, 2011 3:06 PM, "Brandon Long" <blong@google.com> wrote:
> Ie, we
> implemented COMPRESS when we saw that Thunderbird supported it (and
> used their implementation to fix bugs in ours) and then when Apple saw
> we supported it, they added it to iOS (I'm guessing, but that's what
> it looked like).

Wow - I _can_ change the world!

(that's the only part of Thunderbird I've written)

> For us, it would have been very beneficial if the CAPABILITY command
> had the client telling us which extensions they supported, so we could
> better focus our efforts on the extensions most likely to be useful.
> 
> > I'm actually surprised that no serious attempt has been made to create a
> > competing protocol that's simpler to implement, at both ends.  I suspect
> > Google could do it themselves with a bit of cooperation from Thunderbird.
> 
> I wouldn't claim to be able to create something better.  I could make
> something simpler with a simpler use case.  The argument may be that
> IMAP tries to do too much, that a two-way syncing protocol would be
> simpler and what most clients these days would use in preference to
> the "on-line, off load work to server" model that say pine wants.  Or,
> that could just be my biases coloring my perception.

Yes, a two way syncing protocol would be very nice.  Also some checksum
stuff to allow you to quickly verify the correctness of your local
copies.  We expose the sha1 of the messages via IMAP, but of course it's
not standard, so nobody else will use it.

The Cyrus replication protocol contains everything required to regenerate
an exactly matching IMAP image on the remote end, and calculates a
checksum over the entire unexpunged state (we store expunged index records
for a while to support QRESYNC) which can be compared between the two ends.
This allows bandwidth efficient replication - along with the sha1 for
duplicate and copy/rename detection.  In the trivial case we just compare
modseq between master and replica, replicate the records with a higher
modseq plus any messages with UID higher than the remote LAST_UID, and
then compare the SYNC_CRC to make sure the mailbox matches afterwards.
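
As a rough illustration of that trivial case, here is a Python sketch. It is not the Cyrus implementation: the `Record` type, the function names, and in particular `state_checksum` (a CRC32 over unexpunged UID and modseq pairs) are hypothetical stand-ins, and the real SYNC_CRC calculation is not described in this message.

```python
from dataclasses import dataclass
import zlib

@dataclass(frozen=True)
class Record:
    """Hypothetical index record; real Cyrus records carry far more state."""
    uid: int
    modseq: int
    expunged: bool = False

def records_to_send(master, replica_highestmodseq, replica_last_uid):
    """Pick records with a higher modseq, plus any UID above the replica's last."""
    return [
        r for r in master
        if r.modseq > replica_highestmodseq or r.uid > replica_last_uid
    ]

def state_checksum(records):
    """Stand-in for SYNC_CRC: a CRC32 over the unexpunged records, by UID."""
    crc = 0
    for r in sorted(records, key=lambda rec: rec.uid):
        if not r.expunged:
            crc = zlib.crc32(f"{r.uid}:{r.modseq}".encode("ascii"), crc)
    return crc

# Hypothetical mailbox state on each side.
master = [Record(1, 5), Record(2, 9), Record(3, 12)]
replica = {1: Record(1, 5), 2: Record(2, 7)}

to_send = records_to_send(master, replica_highestmodseq=7, replica_last_uid=2)
for record in to_send:
    replica[record.uid] = record  # apply the replicated records

# Final step: both ends should now agree on the checksum.
assert state_checksum(master) == state_checksum(replica.values())
```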

Of course this has been complicated by annotations and conversation ID
(cross folder threading) and there's stuff for supporting sieve, non-owner
seen, subscriptions, etc.

But the basic protocol is fairly "imaplike".

The hardest bit was safely syncing files - I wound up creating an extra
literal syntax that looks like this:

%{partition sha1 size}\r\n
BYTES

The 'partition' is a Cyrus abomination, but is required to efficiently
stream the file to a temporary location on the correct filesystem with
a multi partition installation.

The 'sha1' is mainly used to create the spool filename and to do post-sync
integrity checks.  In theory the sender already re-calculated the sha1
and compared it to the index record before sending, but it doesn't hurt
to re-check.  Our goal is to never replicate corrupted data and break the
other end.
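
A reader for that literal might look like the Python sketch below. The wire details are assumptions, since the message gives only the general shape: the three fields are taken to be space-separated, the sha1 to be hex-encoded, and the size to be a decimal octet count. The function name `read_file_literal` and the partition name `default` are hypothetical.

```python
import hashlib
import io

def read_file_literal(stream):
    """Read one "%{partition sha1 size}\\r\\n" + BYTES literal from a stream.

    Returns (partition, data) after re-checking the sha1 on the receiving end.
    """
    header = stream.readline()
    if not header.startswith(b"%{") or not header.endswith(b"}\r\n"):
        raise ValueError("not a file literal header")
    # Assumed layout: space-separated fields, hex sha1, decimal size.
    partition, sha1, size_text = header[2:-3].decode("ascii").split(" ")
    size = int(size_text)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("stream ended before the literal was complete")
    if hashlib.sha1(data).hexdigest() != sha1:
        raise ValueError("sha1 mismatch, refusing to store corrupted data")
    return partition, data

# Build a sample literal for a hypothetical partition named "default".
payload = b"Subject: test\r\n\r\nhello\r\n"
digest = hashlib.sha1(payload).hexdigest().encode("ascii")
wire = b"%{default " + digest + b" " + str(len(payload)).encode("ascii") + b"}\r\n" + payload

partition, data = read_file_literal(io.BytesIO(wire))
assert (partition, data) == ("default", payload)
```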

You could avoid all this and have the syntax purely IMAP compatible by
having a more stateful parser of course.  At the moment the replication
engine can parse a full statement off the wire and into memory, knowing
it won't be too giant - and then the files will already be on the right
partition to just hardlink into place.

But I'd be quite interested in having a local client which supported that
protocol and could use it to talk to a remote Cyrus instance.  The mailbox
layer code already asserts all the invariants, so you can never inject
a message with a lower UID than the last one used, or a MODSEQ that isn't
greater than the current HIGHESTMODSEQ on the server.  But allowing you
to explicitly set the target values for them would mean you could keep
two ends exactly in sync with IMAP values.

Whereas at the moment, check what offlineimap does.  It can't even rely
on UIDPLUS, so it has to add a special header to every message it
uploads just so it can tell for sure what UID it got without downloading
the whole bloody thing again.  Or at least it did last time I read the
multi-threaded python horror that it was.
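
The header trick can be sketched in Python with the standard `email` package. This is an illustration of the general idea, not offlineimap's actual code: the header name `X-Sync-Tag` and both function names are hypothetical.

```python
import email
import uuid

TAG_HEADER = "X-Sync-Tag"  # hypothetical header name, chosen for this sketch

def tag_message(raw, tag=None):
    """Add a unique header to a message before upload; return (tag, bytes)."""
    tag = tag or uuid.uuid4().hex
    msg = email.message_from_bytes(raw)
    msg[TAG_HEADER] = tag
    return tag, msg.as_bytes()

def search_criteria(tag):
    """Search key to pass to UID SEARCH to find the uploaded message's UID."""
    return f'HEADER {TAG_HEADER} "{tag}"'

raw = b"Subject: hi\r\n\r\nbody\r\n"
tag, tagged = tag_message(raw, tag="abc123")

# The tag survives a round trip through the parser, so a later
# UID SEARCH on that header can identify the message without a download.
assert email.message_from_bytes(tagged)[TAG_HEADER] == "abc123"
print(search_criteria(tag))  # HEADER X-Sync-Tag "abc123"
```

With UIDPLUS the server would report the assigned UID directly in the APPEND response, which is why the header is only needed when that extension cannot be relied on.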

Bron ( I have nothing against multi-threading or python... at least
       not until signal handling gets involved )
-- 
  Bron Gondwana
  brong@fastmail.fm
E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: alpine.OSX.2.00.1110311528590.9034@hsinghsing.panda.com
On Mon, 31 Oct 2011, Brandon Long wrote:
> I could make
> something simpler with a simpler use case.  The argument may be that
> IMAP tries to do too much, that a two-way syncing protocol would be
> simpler and what most clients these days would use in preference to
> the "on-line, off load work to server" model that say pine wants.  Or,
> that could just be my biases coloring my perception.

Port 220 is allocated for people who think that they are smarter than me
in the design of IMAP.

Implement your new Gmap on port 220, and make port 143 comply with my
specification.

You will then have your answer as to how many people prefer your design
over mine.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: dave@cridland.net
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 3124.1320100892.530453@puncture
On Mon Oct 31 22:06:47 2011, Brandon Long wrote:
> talking existing extensions, not making up new ones).   Ie, we
> implemented COMPRESS when we saw that Thunderbird supported it (and
> used their implementation to fix bugs in ours) and then when Apple  
> saw
> we supported it, they added it to iOS (I'm guessing, but that's what
> it looked like).

So Thunderbird implemented it because Cyrus did.

Cyrus implemented it because Polymer and M-Box did. (Actually I can't  
recall if M-Box or Cyrus got there first).

And Polymer did because Arnt did in Archiveopteryx.

So actually everyone's just slavishly following Arnt.

So on that principle, if I can get Arnt to implement ACAP, then  
Google will surely follow.

Dave.
-- 
Dave Cridland - mailto:dave@cridland.net - xmpp:dwd@dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:46 -0000
Message-ID: 1320100729.24671.140660992928641@webmail.messagingengine.com
On Tuesday, November 01, 2011 12:18 AM, "Timo Sirainen" <tss@iki.fi> wrote:
> On 31.10.2011, at 23.53, Brandon Long wrote:
> 
> >> If you think that Gmail's IMAP server is compliant, you are mistaken.
> > 
> > I know it's not compliant, we even have a support page where we list
> > the cases where we explicitly decided against compliance.
> > Interoperability is a goal, however.  That doesn't mean we want to
> > force Gmail users to use the IMAP mailbox model, however.
> 
> I haven't tested GMail with imaptest for a while now (few years?), but I don't think the bugs it reports are about the mailbox model, or anything else that can't be fixed somewhat easily. I guess the one test I could remove is the search for substrings, since even Mark doesn't think it's worth the trouble nowadays.

I've nearly got 'fetchnext' working on all the database engines I care about,
so I can efficiently implement a proper handling of the LIST-EXT edge cases.

------------

Not that anyone much cares about those.  Most clients just want LIST "" * to work
fast rather than building up a tree with a zillion round trips - and for it to 
return the XLIST/SPECIAL-USE and subscription status along with the result,
they would be ecstatic.  If you could rely on that, then most clients would
just call the one call and be happy.

Oh, and I'm sure for bandwidth usage reasons, those same clients would also
love it to return a MODSEQ, which they could offer along with an IFCHANGEDSINCE
which returned the moral equivalent of a 304.

Or even better, a response to login which included a separate MODSEQ value
for those rarely changed things, so every client login included a "your
cached values for the following things are already fine, you don't need to
ask again".

Man, wouldn't that be nice.

For that matter, you could go the step further we're heading to with our
in-house Cyrus builds now, where there's a global HIGHESTMODSEQ and a global
UIDVALIDITY for all one user's folders.  Along with UIDVALIDITY change on
rename, you only need to read that ONE value to be sure that no folders
have been changed.

(yes, we do bump it on DELETE as well, a nice side effect of DELETE just
being a rename into a hidden namespace, where it lounges for a week before
being properly cleaned up)

And you can just check the one HIGHESTMODSEQ value, to be sure that nothing
has changed in ANY folder.  That's a great first-level check.  After that
you need to also check the modseqs on any folders to drill down to where the
change may have occurred - but they're in the statuscache, so you can fetch
that information for the cost of a single linear read.  Only the actually
changed folder need be opened.
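
From the client's point of view, that two-level check could be sketched in Python as follows. The function name, the dictionary layout, and the sample values are hypothetical; the sketch only models the comparison logic, not how the values are fetched.

```python
def folders_to_open(cached_global, cached_folders, server_global, server_folders):
    """Return the names of folders whose modseq differs from the cached value.

    First level: one comparison of the global HIGHESTMODSEQ.
    Second level: a single pass over the per-folder modseqs.
    """
    if server_global == cached_global:
        return []  # nothing has changed in ANY folder
    return sorted(
        name
        for name, modseq in server_folders.items()
        if cached_folders.get(name) != modseq
    )

# Hypothetical cached and current state for one user.
cached = {"INBOX": 40, "Sent": 31, "Archive": 12}
current = {"INBOX": 45, "Sent": 31, "Archive": 12}

assert folders_to_open(40, cached, 40, cached) == []
assert folders_to_open(40, cached, 45, current) == ["INBOX"]
```

This sketch does not detect folders that have disappeared from the server; that case would need a separate comparison of the two sets of names.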

Along with event push to clients (out of band) just saying "something's
changed, do a poll" we can get fast, efficient updates for all folders.
The intermediate layer is JSON, but hey - that keeps the browser client
happy, and everything else can talk it too.  Someone did a benchmark of
a pile of Perl serialisation options, and the JSON::XS won - even over
the native Storable, so it's no slouch either.

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm
E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: alpine.OSX.2.00.1110311532190.9034@hsinghsing.panda.com
On Tue, 1 Nov 2011, Timo Sirainen wrote:
> I haven't tested GMail with imaptest for a while now (few years?), but I
> don't think the bugs it reports are about the mailbox model, or anything
> else that can't be fixed somewhat easily.

Sigh, Timo, you had to give away the secret. I was hoping to bait this
trap a bit longer.

Since the cat is out of the bag: there is nothing in the mailbox model
which does not comply with IMAP, or which can not be presented in an
IMAP-compliant manner. IMAP is silent on the mailbox model for a reason.

The problems with Gmail's server are, by and large, flat out bugs. There
are a few design faults; but they are completely unnecessary and could be
amended while keeping the Gmail model.

> I guess the one test I could
> remove is the search for substrings, since even Mark doesn't think it's
> worth the trouble nowadays.

FWIW, I always considered the implementation of string searching to be
implementation dependent. The only requirement was that simple
case-independent substring is recognized as compliant. I never intended
that a server be forbidden from implementing a smarter search (e.g.,
fuzzy) that returns more matching messages.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.
E-mail headers
From: blong@google.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: CABa8R6vJGTa+VH-TOgXhCSpX5N1HZTBBqisDhPY9Metz5864JQ@mail.gmail.com
On Mon, Oct 31, 2011 at 3:28 PM, Mark Crispin <mrc+imap@panda.com> wrote:
> On Mon, 31 Oct 2011, Brandon Long wrote:
>>
>> I have no idea who you talked to, but my team owns the Gmail IMAP server.
>
> In that case, at the very least, there is a lack of a consistent and
> coherent policy regarding standard compliance.
>
>> Almost all of our IMAP users mostly use the web interface, so yes,
>> they want the IMAP experience to mimic the Gmail interface.  There are
>> settings that can be set to make the experience more "IMAP normal"
>> than "Gmail normal", but the defaults favor the more common use case.
>
> Do you actually have research and firm numbers to back up the contention
> that customers want their IMAP clients to behave incorrectly, including
> malfunctioning, so that some IMAP clients seem to mimic Gmail?

When we first released IMAP internally, the overwhelming use case was
as an adjunct to Gmail Web interface, not as a replacement to the
client.  That is expected, of course.  But our public usage mimicked
this to nearly the extreme.  Now, I'm willing to believe that our
choices informed the usage in some part, but the overwhelming use case
of Gmail's IMAP interface is access on mobile devices by people who
use the Gmail Web interface on the desktop.

> Or is this just a matter of religion that has never been challenged, much
> less bolstered with research?

I know the clients which use my service, since that knowledge is
important to me, even if the ID command is considered pariah to the
perfectness of the protocol.

Also see the changes that Thunderbird has made to make their client
more "Gmail" like when talking to us.  Or see the Blackberry Enhanced
Gmail plug-in which tries to make the Blackberry have a more Gmail
feel.  Or arguments I've gotten into on Google+ with our adoring public
who wanted us to make the usage even more Gmail like, impossible
though that was to express in the IMAP protocol.

>> I know it's not compliant, we even have a support page where we list
>> the cases where we explicitly decided against compliance.
>> Interoperability is a goal, however.  That doesn't mean we want to
>> force Gmail users to use the IMAP mailbox model, however.
>
> So, instead, you force Gmail users to use a non-compliant model that
> violates guarantees in IMAP and causes some IMAP clients to malfunction.

If we are violating guarantees that cause malfunctions, it would be
good to know that.  The only one I know of is that under some
circumstances, alpine wants some combination of the header + body to
equal something, and due to our LF->CRLF shenanigans, that can be
violated.  I'm not happy with it, but I have no good way to fix it,
either.  Whether it's an important guarantee, well, it hasn't seemed to
have affected anyone who removed the check.

I know also that some of our deletion behavior follows some of the
"non-suggested" ones from the IMAP best practices RFC, but I didn't
think doing those was actually non-compliant.

>> And I argue that we made the design decisions we did with good
>> reasons.  I also argue that there were no perfect decisions to be
>> made, and usability was a more important goal than correctness.
>
> Thank you for confirming, in public, that the Gmail IMAP server is
> non-compliant and that Google has no intention to make it compliant.

I feel pretty safe in saying that Gmail's IMAP doesn't support
substring search of bodies and never will, and is therefore
non-compliant.  At least until we can do it for a single user without
consuming the equivalent resources of a couple million users who don't
use it.

Brandon
-- 
 Brandon Long <blong@google.com>
 Staff Engineer
 Gmail Delivery TLM

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 1320102205.30039.140660992940805@webmail.messagingengine.com
On Monday, October 31, 2011 10:41 PM, "Dave Cridland" <dave@cridland.net> wrote:
> On Mon Oct 31 22:06:47 2011, Brandon Long wrote:
> > talking existing extensions, not making up new ones).   Ie, we
> > implemented COMPRESS when we saw that Thunderbird supported it (and
> > used their implementation to fix bugs in ours) and then when Apple  
> > saw
> > we supported it, they added it to iOS (I'm guessing, but that's what
> > it looked like).
> 
> So Thunderbird implemented it because Cyrus did.

Basically, yes.  And I think Cyrus implemented it mainly for inter-Cyrus
links to make murder more efficient between campuses - and then I wanted
it for replication as well.

> Cyrus implemented it because Polymer and M-Box did. (Actually I can't  
> recall if M-Box or Cyrus got there first).
> 
> And Polymer did because Arnt did in Archiveopteryx.
> 
> So actually everyone's just slavishly following Arnt.
> 
> So on that principle, if I can get Arnt to implement ACAP, then  
> Google will surely follow.

Go for it!

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 1320102709.31835.140660992941501@webmail.messagingengine.com
On Monday, October 31, 2011 3:49 PM, "Mark Crispin" <mrc+imap@panda.com> wrote:
> On Tue, 1 Nov 2011, Timo Sirainen wrote:
> > I haven't tested GMail with imaptest for a while now (few years?), but I
> > don't think the bugs it reports are about the mailbox model, or anything
> > else that can't be fixed somewhat easily.
> 
> Sigh, Timo, you had to give away the secret. I was hoping to bait this
> trap a bit longer.

And that's why I think you're a wanker - because I don't want to bait anybody.
I want a better mail experience for users, and I suspect Brendan does too,
within the constraints he's working with.

> Since the cat is out of the bag: there is nothing in the mailbox model
> which does not comply with IMAP, or which can not be presented in an
> IMAP-compliant manner. IMAP is silent on the mailbox model for a reason.

It's silent in the way that XSLT is silent on the implementation language.
Not explicitly mentioned, but with heaps of design decisions which show
a clear background with 

> The problems with Gmail's server are, by and large, flat out bugs. There
> are a few design faults; but they are completely unnecessary and could be
> amended while keeping the Gmail model.

I suspect we could, with a bit of productive discussion, get Brendan to
fix some of these.  It's gone midnight here, but I'm going to see if I
can dig out some details on how Cyrus handles literal coding and talk to
Brendan directly about how hard that would be to add to gmail and fix
the 8bit quoted values issue.  Particularly if we can do it without
melting any more polar caps than required.

> > I guess the one test I could
> > remove is the search for substrings, since even Mark doesn't think it's
> > worth the trouble nowadays.
> 
> FWIW, I always considered the implementation of string searching to be
> implementation dependent. The only requirement was that simple
> case-independent substring is recognized as compliant. I never intended
> that a server be forbidden from implementing a smarter search (e.g.,
> fuzzy) that returns more matching messages.

Interesting.  That makes me feel a bit more comfortable about ignoring the
full RFC 5051 i;unicode-casemap on searches, and only using it for sort.

Thanks.

I was leaning that way anyway, but it means I can make the Cyrus
implementation fully 5051 compliant without user visible changes.

I've already switched to storing the parsed UTF-8 values into the cache file
rather than search-normalised forms, so I can choose the sort algorithm to
apply later.  This will come in handy if we try to put other collation
algorithm support in later.

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm

E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: alpine.OSX.2.00.1110311641160.9034@hsinghsing.panda.com
On Mon, 31 Oct 2011, Brandon Long wrote:
> When we first released IMAP internally, the overwhelming use case was
> as an adjunct to Gmail Web interface, not as a replacement to the
> client.  That is expected, of course.  But our public usage mimicked
> this to nearly the extreme.  Now, I'm willing to believe that our
> choices informed the usage in some part, but the overwhelming use case
> of Gmail's IMAP interface is access on mobile devices by people who
> use the Gmail Web interface on the desktop.

That does NOT bolster the claim that the public wants a non-compliant
IMAP implementation that may break their IMAP client.

My experience is that the overwhelming majority of users want their tools
to work. When an IMAP client does not work with Gmail, they will complain
to the IMAP client vendor, not to Google.

> Or arguments I've gotten into on Google+ with our adoring public
> who wanted us to make the usage even more Gmail like, impossible
> though that was to express in the IMAP protocol.

I doubt that Google+ is representative of the community. Yes, I have a
Google+ account, but after a few times I found it to be pointless (mostly
filled with people desperately trying to convince themselves and others
that it is better than FB) and stopped using it. It's not an FB-killer.

> If we are violating guarantees that cause malfunctions, it would be
> good to know that.

Gmail's IMAP server does so.

These complaints are ongoing and easy to find. The usual answer is always
"Gmail doesn't work well as an IMAP server, don't try to use it as such;
instead use fetchmail to download to a local file and process it locally."

I suggest that you collect these complaints, and one by one fix them.

I did the exercise once upon a time, and came up with a few dozen discrete
issues that were delivered to Google. I don't have the list any more. Given my
experience the last time, I will not repeat the effort. You had the
opportunity, you squandered it, not my problem.

It would also be a good idea to contact each of the complainers and ask
for confirmation that the problem is abolished. That's what Google would
do if it cared. I remain unconvinced.

The only use that I have for Gmail is receipts from the Android market.

Speaking of Android, I suppose you know that the Android Mail app is
almost completely useless (embarrassingly bad), and that Android users who
do anything but the most casual email must download a third party client
(none of which are particularly good, but at least are usable). Perhaps
you don't care because Android has a separate app for Gmail.

It sure does look like the Microsoft monopolistic behavior; pretend to use
open standards but do things so that other vendors' products do not work
well. So much for "don't be evil".

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

E-mail headers
From: tss@iki.fi
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: D96D7BAA-894A-41DC-8349-752E737C7A3B@iki.fi
On 1.11.2011, at 1.14, Brandon Long wrote:

> If we are violating guarantees that cause malfunctions, it would be
> good to know that.  The only one I know of is that under some
> circumstances, alpine wants some combination of the header + body to
> equal something, and due to our LF->CRLF shenanigans, that can be
> violated.  I'm not happy with it, but I have no good way to fix it,
> either.  Whether it's an important guarantee, well, it hasn't seemed to
> have affected anyone who removed the check.

Dovecot also stores messages with LFs and has no trouble exporting them as if they were CRLFs. I think the only actual (performance) problem with it is .. well, actually the topic of this thread :) A partial fetch from a non-zero offset requires some scanning to find out the LF-only-offset. But luckily all clients just fetch the blocks in increasing order from zero offset, so this isn't such an important problem.
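
The scanning cost mentioned above can be sketched in Python as follows. This is not Dovecot's code; it is a minimal illustration, assuming the stored bytes use bare LF newlines only and the client-visible ("virtual") view turns every LF into CRLF. The function names are hypothetical.

```python
def virtual_to_physical(stored: bytes, virtual_offset: int) -> tuple[int, bool]:
    """Map an offset in the CRLF view to an index in the LF-only store.

    Returns (physical_index, skip_cr). skip_cr is True when the virtual
    offset lands between the synthetic CR and the stored LF.
    """
    v = 0
    for p, byte in enumerate(stored):  # linear scan: this is the cost
        if v == virtual_offset:
            return p, False
        if byte == 0x0A:
            if v + 1 == virtual_offset:
                return p, True
            v += 2  # one stored LF is two virtual bytes
        else:
            v += 1
    return len(stored), False


def partial_fetch(stored: bytes, offset: int, count: int) -> bytes:
    """Return count bytes of the CRLF view starting at offset."""
    start, skip_cr = virtual_to_physical(stored, offset)
    out = bytearray()
    for byte in stored[start:]:
        if len(out) >= count:
            break
        if byte == 0x0A:
            out += b"\n" if skip_cr else b"\r\n"
        else:
            out.append(byte)
        skip_cr = False
    return bytes(out[:count])
```

A fetch from offset 0 needs no scan at all, which is why sequential block fetching makes the problem mostly theoretical.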

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 1320135545.26476.140660993079121@webmail.messagingengine.com
On Tuesday, November 01, 2011 12:11 AM, "Bron Gondwana" <brong@fastmail.fm> wrote:
> and I suspect Brendan does too,
> within the constraints he's working with.

Apologies Brandon - I meant to double check that I'd remembered the name
correctly before I sent the email - then I forgot to go back.

I'll try not to mess it up again!

Bron ( besides, I don't want open season on MY name ;)
-- 
  Bron Gondwana
  brong@fastmail.fm

E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: alpine.OSX.2.00.1110311633050.9034@hsinghsing.panda.com
On Tue, 1 Nov 2011, Bron Gondwana wrote:
>> FWIW, I always considered the implementation of string searching to be
>> implementation dependent. The only requirement was that simple
>> case-independent substring is recognized as compliant. I never intended
>> that a server be forbidden from implementing a smarter search (e.g.,
>> fuzzy) that returns more matching messages.
> Interesting.  That makes me feel a bit more comfortable about ignoring the
> full RFC 5051 i;unicode-casemap on searches, and only using it for sort.

You can not do that and claim compliance with RFC 5255.

RFC 5255 explicitly requires that you apply i;unicode-casemap in searches
as part of level 1 compliance.

If you do not claim RFC 5255 compliance then there is no particular reason
to implement i;unicode-casemap at all.

However, as far as I know, Cyrus always implemented a variant of
it from its inception; so you would actually be removing an aspect of it
that was in there from the onset. That is probably not a good idea.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: alpine.OSX.2.00.1110312320370.9034@hsinghsing.panda.com
On Tue, 1 Nov 2011, Timo Sirainen wrote:
> A partial fetch
> from a non-zero offset requires some scanning to find out the
> LF-only-offset. But luckily all clients just fetch the blocks in
> increasing order from zero offset, so this isn't such an important
> problem.

In legacy stores that use LF-only newlines, I convert a body part to be
fetched to its CRLF form prior to output, and keep it in a buffer that is
reused when the fetched body part changes. When a message is parsed, both
the physical offset/size on disk and the IMAP count is calculated and
preserved for the duration of the session.

In the case of partial fetching, the conversion only happens once, since
as Timo notes all clients just fetch the blocks in increasing order from 0
offset. Each subsequent block fetch notices that the buffer already has
the desired body part, and thus no new mail store read/conversion is
needed.
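
The buffering scheme described in the two paragraphs above might look roughly like the following Python sketch. It is an illustration under assumptions, not the UW/Panda implementation: `PartCache` and the `load_part` callable are hypothetical names, and the store is assumed to hand back a body part with LF (or mixed) newlines.

```python
class PartCache:
    """Keep the CRLF form of the most recently fetched body part."""

    def __init__(self, load_part):
        # load_part(uid, section) -> bytes is a hypothetical store accessor.
        self._load_part = load_part
        self._key = None
        self._buffer = b""
        self.conversions = 0  # counts how often the store was re-read

    def fetch(self, uid: int, section: str, offset: int, count: int) -> bytes:
        key = (uid, section)
        if key != self._key:
            raw = self._load_part(uid, section)
            # Collapse to LF first so existing CRLF pairs are not doubled.
            self._buffer = raw.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
            self._key = key
            self.conversions += 1
        return self._buffer[offset:offset + count]
```

Fetching the same part in successive blocks converts once; only a change of UID or section triggers a new read of the store.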

Modern (post-1995) stores use CRLF newlines, or more accurately do no
newline conversion; they store the message exactly as it was transmitted
in SMTP; thus the physical size and IMAP count are the same. The CPU
saving in abolishing newline conversion is easily worth the storage cost.

Even more modern stores store all parse state, including offsets and
counts, in metadata so that once calculated it is never calculated again
(another great CPU savings).

Bad counts are a problem in partial fetching. The client needs correct
counts in order to form requests to fetch the right amount of data.
Otherwise, the client may fail to fetch the entire data (short count), or
get hung trying to fetch unavailable data (long count).
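
To make the short-count and long-count cases concrete, here is a small Python simulation of a client that fetches a body part in blocks until it has the advertised size. The names are hypothetical and the "server" is just a callable over an in-memory byte string; a real client would issue partial FETCH commands instead.

```python
def fetch_in_blocks(fetch, advertised_size: int, block_size: int = 4):
    """Fetch advertised_size bytes via fetch(offset, count) calls.

    Returns (data, status). status is "long-count" when the server runs
    out of data before the advertised size is reached; a real client
    might hang or retry at that point instead of returning.
    """
    data = bytearray()
    while len(data) < advertised_size:
        want = min(block_size, advertised_size - len(data))
        block = fetch(len(data), want)
        if not block:
            return bytes(data), "long-count"
        data += block
    # A short count ends up here too: the loop stops at the advertised
    # size and the client has no way to know that more data exists.
    return bytes(data), "ok"
```

With a 10-byte part, an advertised size of 12 is reported as a long count, while an advertised size of 8 silently yields a truncated part with status "ok".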

The check in Alpine that you refer to reports a long count. There's no way
for any IMAP client to detect a short count; it will just fail to fetch
the entire segment. So by advocating removing the check, you are asserting
that Gmail only does long counts, never short counts; and that all other
broken servers will be the same. Thus, you imply that users should not be
warned about servers whose developers are lazy and incompetent. And, when
the body part is truncated, you claim "it must be a client bug".

Most other server developers get it right. Why can't Google?

Perhaps we need product liability for software vendors after all.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 20111101063140.GA5353@brong.net
On Tue, Nov 01, 2011 at 08:06:29AM +0200, Timo Sirainen wrote:
> Dovecot also stores messages with LFs and has no trouble exporting them as if they were CRLFs. I think the only actual (performance) problem with it is .. well, actually the topic of this thread :) A partial fetch from a non-zero offset requires some scanning to find out the LF-only-offset. But luckily all clients just fetch the blocks in increasing order from zero offset, so this isn't such an important problem.

How do you handle a message with a mix of LF and CRLF in the original?

Bron.

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 1320148200.20407.140660993144221@webmail.messagingengine.com
On Monday, October 31, 2011 4:37 PM, "Mark Crispin" <mrc+imap@panda.com> wrote:
> On Tue, 1 Nov 2011, Bron Gondwana wrote:
> >> FWIW, I always considered the implementation of string searching to be
> >> implementation dependent. The only requirement was that simple
> >> case-independent substring is recognized as compliant. I never intended
> >> that a server be forbidden from implementing a smarter search (e.g.,
> >> fuzzy) that returns more matching messages.
> > Interesting.  That makes me feel a bit more comfortable about ignoring the
> > full RFC 5051 i;unicode-casemap on searches, and only using it for sort.
> 
> You can not do that and claim compliance with RFC 5255.

I'll probably make it a skanky toggle then.

> RFC 5255 explicitly requires that you apply i;unicode-casemap in searches
> as part of level 1 compliance.

The response when I mentioned it to our project manager was "it's often nice
not to worry about a vs ? when searching - and have it find both".

But that's fuzzy-matching for you.  Next thing it will be soundex searches,
and then we wind up with google style "magic".

> If you do not claim RFC 5255 compliance then there is no particular reason
> to implement i;unicode-casemap at all.
> 
> However, as far as I know, Cyrus always implemented a variant of
> it from its inception; so you would actually be removing an aspect of it
> that was in there from the onset. That is probably not a good idea.

I'd rather not break our users' expectations too fast.  People don't like
change much.  Hence the toggle.  Probably with the bogus default too.  Maybe
"rfc5255_strict_search: no" or something.  If you don't turn it on, then an
extra diacritical stripping pass and lowercasing pass gets run on the final
rfc5255 compatible data before searching it.
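
A toggle of that kind could be sketched in Python as below. This is not Cyrus code: the `strict` flag merely stands in for the proposed "rfc5255_strict_search" option, the helper names are hypothetical, and the i;unicode-casemap normalisation itself is assumed to have been applied already and is not shown.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose, then drop combining marks (accents and the like)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text: str, strict: bool) -> str:
    """Strict mode leaves the already-normalised data untouched; the
    legacy mode adds the extra stripping and lowercasing passes."""
    if strict:
        return text
    return strip_diacritics(text).lower()


def matches(needle: str, haystack: str, strict: bool = False) -> bool:
    return search_key(needle, strict) in search_key(haystack, strict)
```

In legacy mode a search for "a" finds "\u00e1" (a with acute accent); in strict mode it does not.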

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm

E-mail headers
From: tss@iki.fi
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 19901681-2C1A-410F-93EE-51F2AF707382@iki.fi
On 1.11.2011, at 8.31, Bron Gondwana wrote:

> On Tue, Nov 01, 2011 at 08:06:29AM +0200, Timo Sirainen wrote:
>> Dovecot also stores messages with LFs and has no trouble exporting them as if they were CRLFs. I think the only actual (performance) problem with it is .. well, actually the topic of this thread :) A partial fetch from a non-zero offset requires some scanning to find out the LF-only-offset. But luckily all clients just fetch the blocks in increasing order from zero offset, so this isn't such an important problem.
> 
> How do you handle a message with a mix of LF and CRLF in the original?

"Correctly." :)

Basically everywhere there are message (part) sizes, I store the "physical size" (exactly as it is stored on disk, with or without CRs) and the "virtual size" (all LFs converted to CRLFs). If the physical size equals the virtual size, I'll do some extra optimizations like being able to seek to the wanted offset immediately or use sendfile() to send the message.
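
A minimal Python sketch of the two sizes, under the assumption that the virtual size simply adds one CR for every LF that is not already preceded by one. The function names are hypothetical and this is not Dovecot's code.

```python
def sizes(stored: bytes) -> tuple[int, int]:
    """Return (physical_size, virtual_size) for a stored message part."""
    physical = len(stored)
    # Every LF that is not already part of a CRLF pair gains a CR.
    bare_lf = stored.count(b"\n") - stored.count(b"\r\n")
    return physical, physical + bare_lf


def can_seek_directly(stored: bytes) -> bool:
    """If both sizes agree, virtual offsets equal physical offsets, so
    the server can seek (or hand the file to sendfile()) without scanning."""
    physical, virtual = sizes(stored)
    return physical == virtual
```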

Although a mix of LFs and CRLFs in the same message shouldn't normally appear in mail files. Whenever saving messages via Dovecot all of the newlines are changed to either LFs or CRLFs (mail_save_crlf setting, or forced sometimes by a storage backend).
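
The save-time normalisation can be illustrated with a short Python sketch. The `crlf` parameter is only loosely modelled on the mail_save_crlf setting named above, the function name is hypothetical, and bare CR bytes are deliberately left alone because the message does not say how they are treated.

```python
import re

_NEWLINE = re.compile(rb"\r\n|\n")  # CRLF first, so the pair is one match


def normalize_newlines(data: bytes, crlf: bool) -> bytes:
    """Rewrite every CRLF or bare LF as a single chosen newline style."""
    return _NEWLINE.sub(b"\r\n" if crlf else b"\n", data)
```

After this pass a message contains only one newline style, so the mixed case no longer appears in the store.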

E-mail headers
From: mrc+imap@panda.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: alpine.OSX.2.00.1111010850250.9034@hsinghsing.panda.com
On Tue, 1 Nov 2011, Bron Gondwana wrote:
>> RFC 5255 explicitly requires that you apply i;unicode-casemap in searches
>> as part of level 1 compliance.
> The response when I mentioned it to our project manager was "it's often nice
> not to worry about a vs ? when searching - and have it find both".

Careful.

i;unicode-casemap is designed to be a simple collator/comparator that even
a baby programmer can implement correctly. It is not intended to be
something that people can fork off all sorts of random non-interoperable
variants.

It also formalized, and moderately amended, what Cyrus has done from its
inception in searching Unicode strings.

You will probably need to define a different comparator for that purpose
(e.g., i;unicode-casemap-ignore-diacriticals). Beyond that, you will
quickly find yourself in a swamp filled with alligators (or crocodiles if
you prefer). Even the modest step of an "ignore-diacriticals" comparator
will get you wet above the knee.

If you want to get into the type of matching you are talking about, you
will wind up needing to do a full-fledged implementation of i18n collation
and comparison, which more likely than not includes locale sensitivity.
This is not something to be half-assed or hackish on. There are standards
and rules; and in some cases these are enforced in national laws.

I strongly urge you, BEFORE embarking upon such a project, to get involved
with the various groups involved with i18n collation and comparison and
seek their advice.

I did not do i;unicode-casemap in a vacuum; I sought their advice and
after their screams of anguished horror, these guys gave good advice which
I took seriously and acted upon. One of the things that was important to
them was that, while (reluctantly) accepting the "we need something that
even a baby programmer can implement", they wanted to draw the line and
say "do this, or do it right."

With this said, I don't particularly object to ignore-diacriticals
searching; but I also note that the concept is locale-dependent. In some
languages, the diacritical form indicates accent or sound; in others it is
a completely unrelated character (and the latter group already is
infuriated by i;unicode-casemap).

CJK is another part of the swamp.  For example, U+5FB0 and U+5FB7
are fundamentally the same character; they have the same meaning and
differ only by an added stroke in the Chinese/Korean form that the
Japanese form lacks. Yet at least one Chinese character set has both
forms. Adult CJK native speakers would say that the two should match in
search; and many would have to have that one stroke difference pointed out
to them before they'd notice it.

But that's just a simple case. CJK is full of these, and most are far more
complicated. There are lots of cases where the equivalency is one way;
that is, A is equivalent to B, but B is NOT equivalent to A (or worse is
SOMETIMES equivalent to A). At this point, the swamp reptiles are over
your head.

The bottom line is that, whatever you do, seek the advice of the language
folks. Your implementation will have to be tempered by realism; but at
least you can avoid a mistake. Undoing a mistake is far more costly.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 20111101070031.GA6591@brong.net
On Tue, Nov 01, 2011 at 08:40:57AM +0200, Timo Sirainen wrote:
> On 1.11.2011, at 8.31, Bron Gondwana wrote:
> 
> > On Tue, Nov 01, 2011 at 08:06:29AM +0200, Timo Sirainen wrote:
> >> Dovecot also stores messages with LFs and has no trouble exporting them as if they were CRLFs. I think the only actual (performance) problem with it is .. well, actually the topic of this thread :) A partial fetch from a non-zero offset requires some scanning to find out the LF-only-offset. But luckily all clients just fetch the blocks in increasing order from zero offset, so this isn't such an important problem.
> > 
> > How do you handle a message with a mix of LF and CRLF in the original?
> 
> "Correctly." :)

Er - by which you mean that you always return the exact bytes you were given?

> Basically everywhere there are message (part) sizes, I store the "physical size" (exactly as it is stored in disk, with or without CRs) and the "virtual size" (all LFs converted to CRLFs). If physical size equals to virtual size, I'll do some extra optimizations like being able to seek to wanted offset immediately or use sendfile() to send the message.

Sounds to me like that's enough benefit to store it all CRLFs in itself.
1/65 of storage space vs seek and sendfile.

> Although a mix of LFs and CRLFs in the same message shouldn't normally appear in mail files.

Most often seen with headers, or between parts.  The most ugly cases
being differences between the mime-headers of a part, and the content
of said part.

Bron.

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 20111101165159.GA22321@brong.net
On Tue, Nov 01, 2011 at 09:33:40AM -0700, Mark Crispin wrote:
> On Tue, 1 Nov 2011, Bron Gondwana wrote:
> >>RFC 5255 explicitly requires that you apply i;unicode-casemap in searches
> >>as part of level 1 compliance.
> >The response when I mentioned it to our project manager was "it's often nice
> >not to worry about a vs ? when searching - and have it find both".
> 
> Careful.

yeah, whatever.  Basically, this is an "implement something that behaves
the same as what was there before".  I'll throw a bunch of test cases at
the two sets of code and make sure they work the same.

> It also formalized, and moderately amended, what Cyrus has done from its
> inception in searching Unicode strings.

No - Cyrus had this:

https://github.com/brong/cyrus-imapd/commit/d988fd9f1fc9a3ca4a0f453e188f2821e050af33#diff-1

(still does actually, that's my rfc5051 branch, which isn't integrated
to mainline)

That's what I'm trying to maintain compatibility with, so users don't
see any difference on existing installs.

> If you want to get into the type of matching you are talking about, you
> will wind up needing to do a full-fledged implementation of i18n collation
> and comparison, which more likely than not includes locale sensitivity.
> This is not something to be half-assed or hackish on. There are standards
> and rules; and in some cases these are enforced in national laws.

I would love to do that - but one thing at a time.

> I strongly urge you, BEFORE embarking upon such a project, to get involved
> with the various groups involved with i18n collation and comparison and
> seek their advice.

Definitely.  It might not even be me doing this.  I just want to make
sure that the design leaves the possibility open.

Bron.

E-mail headers
From: tss@iki.fi
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 6F93E5CD-2792-4D05-BEEF-F1682CA88C89@iki.fi
On 1.11.2011, at 9.00, Bron Gondwana wrote:

> On Tue, Nov 01, 2011 at 08:40:57AM +0200, Timo Sirainen wrote:
>> On 1.11.2011, at 8.31, Bron Gondwana wrote:
>> 
>>> On Tue, Nov 01, 2011 at 08:06:29AM +0200, Timo Sirainen wrote:
>>>> Dovecot also stores messages with LFs and has no trouble exporting them as if they were CRLFs. I think the only actual (performance) problem with it is .. well, actually the topic of this thread :) A partial fetch from a non-zero offset requires some scanning to find out the LF-only-offset. But luckily all clients just fetch the blocks in increasing order from zero offset, so this isn't such an important problem.
>>> 
>>> How do you handle a message with a mix of LF and CRLF in the original?
>> 
>> "Correctly." :)
> 
> Er - by which you mean that you always return the exact bytes you were given?

I don't think LF vs. CRLF have any special meaning in email data, they're both simply newlines. So Dovecot doesn't try to preserve them. They're both converted to newlines anyway (LFs or CRLFs depending on context). Although I did initially wonder about supporting binary message bodies, but never bothered with it.

>> Basically everywhere there are message (part) sizes, I store the "physical size" (exactly as it is stored in disk, with or without CRs) and the "virtual size" (all LFs converted to CRLFs). If physical size equals to virtual size, I'll do some extra optimizations like being able to seek to wanted offset immediately or use sendfile() to send the message.
> 
> Sounds to me like that's enough benefit to store it all CRLFs in itself.
> 1/65 of storage space vs seek and sendfile.

Well, that's why it's an option :) But typically I've noticed that I/O is the problem, not CPU, so sendfile isn't all that useful. The seeking is more of a theoretical problem. Normally when clients fetch partial data they start from offset 0, so no seeking needed. The next block starts from where the previous block ended, which Dovecot remembers and continues again without seeking. And so on. So even if LFs save only a little disk space and disk I/O, I figured it's better than nothing.

>> Although a mix of LFs and CRLFs in the same message shouldn't normally appear in mail files.
> 
> Most often seen with headers, or between parts.  The most ugly cases
> being differences between the mime-headers of a part, and the content
> of said part.

Coming from where? SMTP? IMAP APPENDs? I've never noticed, because Dovecot handles them silently.

E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:47 -0000
Message-ID: 1320134176.22414.140660993072841@webmail.messagingengine.com
On Tuesday, November 01, 2011 9:26 AM, "Timo Sirainen" <tss@iki.fi> wrote:
> On 1.11.2011, at 9.00, Bron Gondwana wrote:
> 
> > On Tue, Nov 01, 2011 at 08:40:57AM +0200, Timo Sirainen wrote:
> >> On 1.11.2011, at 8.31, Bron Gondwana wrote:
> >> 
> >>> On Tue, Nov 01, 2011 at 08:06:29AM +0200, Timo Sirainen wrote:
> >>>> Dovecot also stores messages with LFs and has no trouble exporting them as if they were CRLFs. I think the only actual (performance) problem with it is .. well, actually the topic of this thread :) A partial fetch from a non-zero offset requires some scanning to find out the LF-only-offset. But luckily all clients just fetch the blocks in increasing order from zero offset, so this isn't such an important problem.
> >>> 
> >>> How do you handle a message with a mix of LF and CRLF in the original?
> >> 
> >> "Correctly." :)
> > 
> > Er - by which you mean that you always return the exact bytes you were given?
> 
> I don't think LF vs. CRLF have any special meaning in email data, they're both simply newlines. So Dovecot doesn't try to preserve them. They're both converted to newlines anyway (LFs or CRLFs depending on context). Although I did initially wonder about supporting binary message bodies, but never bothered with it.

The real issue is things which do checksums on the email contents.
Digital signatures mainly.  Luckily mostly the clients that
actually bother with digital signatures also bother with getting
the other parts right.
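
A quick Python illustration of why newline rewriting matters to anything that checksums raw bytes. It hashes the bytes directly with SHA-256; real signature formats define their own canonicalisation rules, which are not modelled here, and the helper names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of the exact bytes given."""
    return hashlib.sha256(data).hexdigest()


def to_crlf(data: bytes) -> bytes:
    """Rewrite bare LFs as CRLF, leaving existing CRLF pairs alone."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

A message with mixed newlines hashes differently before and after `to_crlf()`, so a checker that sees the rewritten bytes cannot reproduce a checksum computed over the original ones.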

> >> Basically everywhere there are message (part) sizes, I store the "physical size" (exactly as it is stored in disk, with or without CRs) and the "virtual size" (all LFs converted to CRLFs). If physical size equals to virtual size, I'll do some extra optimizations like being able to seek to wanted offset immediately or use sendfile() to send the message.
> > 
> > Sounds to me like that's enough benefit to store it all CRLFs in itself.
> > 1/65 of storage space vs seek and sendfile.
> 
> Well, that's why it's an option :) But typically I've noticed that I/O is the problem, not CPU, so sendfile isn't all that useful. The seeking is more of a theoretical problem. Normally when clients fetch partial data they start from offset 0, so no seeking needed. The next block starts from where the previous block ended, which Dovecot remembers and continues again without seeking. And so on. So even if LFs save only a little disk space and disk I/O, I figured it's better than nothing.

Agreed - IO is our biggest issue by far.  Of course, we're not Google.  We throw dual CPUs
in our IMAP boxes just because you need two CPUs to drive 48GB of RAM happily, and that's
our current sweet spot: US$13k machines with two SSDs, twelve 2TB SATA hard disks and 48GB
of RAM with a pair of low-end CPUs.  Along with battery-backed RAID controllers to take the
edge off the slow disks, it works pretty well.
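
For reference, the physical-size versus virtual-size bookkeeping Timo describes
above could look roughly like the following. This is only a sketch in Python, not
Dovecot's code: the names virtual_bytes, sizes and fetch_partial are made up for
illustration, and a real server would presumably cache the sizes in its index
rather than recompute them per fetch.

```python
from itertools import islice


def virtual_bytes(stored: bytes):
    """Yield the message as sent on the wire: every bare LF becomes CRLF."""
    prev = None
    for b in stored:
        if b == 0x0A and prev != 0x0D:
            yield 0x0D  # supply the CR that was not stored on disk
        yield b
        prev = b


def sizes(stored: bytes):
    """Return (physical_size, virtual_size) for a stored message."""
    physical = len(stored)
    virtual = sum(1 for _ in virtual_bytes(stored))
    return physical, virtual


def fetch_partial(stored: bytes, start: int, length: int) -> bytes:
    """Return `length` wire bytes starting at wire offset `start`."""
    physical, virtual = sizes(stored)
    if physical == virtual:
        # No bare LFs: wire offsets equal disk offsets, so seek directly.
        return stored[start:start + length]
    # Otherwise scan from the beginning, expanding bare LFs on the way.
    return bytes(islice(virtual_bytes(stored), start, start + length))
```

A message that mixes LF and CRLF is handled too: existing CRLF pairs are passed
through untouched and only bare LFs grow by one byte.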

Certainly anything which only hits the index files is blindingly fast!  But if I were going
to store messages more cleverly to save disk space, it would be zlib with a dose of
pre-optimised dictionary.  We already do that for our backups - they're tar.gz files.  I wrote
a talk a little while back about how we use a pure-Perl library which can repack tar files in
a single streaming read/write to get good compression over time.
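
To illustrate the "zlib with a pre-optimised dictionary" idea, here is a minimal
sketch using the preset-dictionary support in Python's zlib module. The dictionary
contents and the function names are assumptions picked for the example; a real
dictionary would be built from strings that are actually common in the mail store,
and the same dictionary has to be available again at decompression time.

```python
import zlib

# Hypothetical preset dictionary: byte strings expected to recur in most messages.
PRESET_DICT = (
    b"Received: from \r\nReturn-Path: \r\nMessage-ID: \r\nMIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"From: \r\nTo: \r\nSubject: \r\nDate: \r\n"
)


def compress_message(raw: bytes, zdict: bytes = PRESET_DICT) -> bytes:
    """Deflate one message, priming the compressor with the preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(raw) + comp.flush()


def decompress_message(blob: bytes, zdict: bytes = PRESET_DICT) -> bytes:
    """Inflate a message that was compressed with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()
```

The dictionary mostly helps with small messages, where the compressor would
otherwise have seen too little data to find repeats.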

> >> Although a mix of LFs and CRLFs in the same message shouldn't normally appear in mail files.
> > 
> > Most often seen with headers, or between parts.  The most ugly cases
> > being differences between the mime-headers of a part, and the content
> > of said part.
> 
> Coming from where? SMTP? IMAP APPENDs? I've never noticed, because Dovecot handles them silently.

Most rubbish comes in via SMTP - we handle it with a bunch of cleanups in our LMTP proxy
before Cyrus sees it - but it's amazing what IMAP clients will try to give you too!

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm