mailing list archives

meli community discussions


E-mail headers
From: David Harris <David.Harris@pmail.gen.nz>
To: imap-protocol@u.washington.edu
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: 55246A71.27553.323D151D@David.Harris.pmail.gen.nz

I'm in the process of completely rewriting the SEARCH logic in my IMAP server - 
the old code was done in a hurry and was, quite frankly, ridiculously bad, but that's 
another story.

As I get into testing cases, I've come across a number of areas where RFC3501 
and the various sub-documents that I know about are... uh, "vague". I'd like to get a 
take on how other implementors view them.

1: BODY:  When a SEARCH BODY expression is issued, how should "BODY" be 
interpreted? Is there an assumption that the server should choose the best 
candidate for a displayable message body, parse and normalize it, then search 
that? Or should it simply be taken as a raw scan of the message? How much 
unarmouring and character set normalization is assumed?

2: Headers: when any of the header search expressions is issued, is the 
assumption that the raw header should be searched, or should RFC2047 
encoded-words be reduced and normalized before attempting the comparison?

3: The following search is valid, according to the syntax in RFC3501:

   xx SEARCH OR OR <exp1> <exp2> <exp3>

and allows an OR expression to cover three terms instead of just two. As such, it 
seems quite useful, but it would certainly have mystified my old search code (it was 
rubbish, as I've pointed out), and I was wondering how generally safe it would be to 
use this type of expression?

4: I'm pretty sure I'm right on this one, but the following expression:

   xx SEARCH OR (<exp1> <exp2> <exp3>) exp4

will only result in a match if either <exp4> is a match, or ALL of <exp1>, <exp2> 
and <exp3> are a match. Could someone wiser than me confirm this? I'm 
assuming there is no way to perform a search with a long list of OR conditions 
without doing a lot of calisthenics on the search string (multiple OR conditions 
strung together).

I apologize if any of these are dealt with in RFCs outside RFC3501 - I struggle to 
keep track of all the various sub-documents relating to the protocol these days.

Thanks in advance for any advice.

Cheers!

-- David --

------------------ David Harris -+- Pegasus Mail ----------------------
Box 5451, Dunedin, New Zealand | e-mail: David.Harris@pmail.gen.nz
           Phone: +64 3 453-6880 | Fax: +64 3 453-6612

Schoolboy howler for the day:
   "A census taker is the man who goes from home to home
    increasing the population."
E-mail headers
From: David.Harris@pmail.gen.nz
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: 55246EE9.10677.324E8888@David.Harris.pmail.gen.nz

On 8 Apr 2015 at 11:38, David Harris wrote:

> 3: The following search is valid, according to the syntax in RFC3501:
> 
>    xx SEARCH OR OR <exp1> <exp2> <exp3>

Just to clarify: I know that this could have been written as

   xx SEARCH OR <exp1> OR <exp2> <exp3>

and that that would look more reasonable, but the first syntax is also correct 
according to the RFC, and I was wondering if anyone had any sense of whether 
there were servers out there (like my old code) that would choke on it.

Cheers!

-- David --

Thought for the day:
    Intuition (n): an uncanny sixth sense which tells people 
    that they are right, whether they are or not.
E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: 1428453569.748713.250513633.3671D3CF@webmail.messagingengine.com

On Wed, Apr 8, 2015, at 09:38 AM, David Harris wrote:
> I'm in the process of completely rewriting the SEARCH logic in my IMAP server - 
> the old code was done in a hurry and was, quite frankly, ridiculously bad, but that's 
> another story.
> 
> As I get into testing cases, I've come across a number of areas where RFC3501 
> and  the various sub-documents that I know about are... uh, "vague". I'd like to get a 
> take on how other implementors view them.
> 
> 1: BODY:  When a SEARCH BODY expression is issued, how should "BODY" be 
> interpreted? Is there an assumption that the server should choose the best 
> candidate for a displayable message body, parse and normalize it, then search 
> that? Or should it simply be taken as a raw scan of the message? How much 
> unarmouring and character set normalization is assumed?

Cyrus streams each body part through decoding (qp/base64) and charset handling
(generating a stream of int32 Unicode codepoints), which then feeds into the search
engine to look for matches.  If any part matches, then the message matches.
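A minimal Python sketch of the per-part approach described above, using only the standard library `email` package rather than anything from Cyrus: each leaf part is transfer-decoded (quoted-printable/base64), converted from its declared charset to Unicode, and case-folded, and the message matches if any part contains the needle. The names `part_text` and `body_matches` are hypothetical, and casefold-plus-substring is an assumption standing in for whatever comparator a real server uses.

```python
from email import message_from_bytes
from email.message import Message


def part_text(part: Message) -> str:
    """Decode one leaf MIME part: undo the transfer encoding, then the charset."""
    raw = part.get_payload(decode=True) or b""  # handles quoted-printable and base64
    charset = part.get_content_charset() or "us-ascii"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # unknown charset label: fall back to a lossless byte mapping
        return raw.decode("latin-1")


def body_matches(msg_bytes: bytes, needle: str) -> bool:
    """Sketch of SEARCH BODY: case-insensitive substring match over every leaf part."""
    msg = message_from_bytes(msg_bytes)
    target = needle.casefold()
    for part in msg.walk():
        if part.is_multipart():  # containers (incl. message/rfc822) have no text of their own
            continue
        if target in part_text(part).casefold():
            return True  # any matching part means the message matches
    return False
```

Because the comparison happens after decoding, a search for "grüße" finds a quoted-printable body containing `Gr=C3=BC=C3=9Fe`, while a raw byte scan would not.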

> 2: Headers: when any of the header search expressions is issued, is the 
> assumption that the raw header should be searched, or should RFC2047 
> encoded-words be reduced and normalized before attempting the comparison?

Likewise - there's a header parser which generates the Unicode code points for search.
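The header side can be sketched the same way with the standard library's `email.header` helpers: unfold the raw value, decode RFC 2047 encoded-words into one Unicode string, then compare case-insensitively. `unfold`, `header_text` and `header_matches` are illustrative names, not Cyrus code.

```python
import re
from email.header import decode_header, make_header


def unfold(raw_value: str) -> str:
    """RFC 5322 unfolding: drop a line break that is followed by whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw_value)


def header_text(raw_value: str) -> str:
    """Decode RFC 2047 encoded-words into a single Unicode string."""
    return str(make_header(decode_header(unfold(raw_value))))


def header_matches(raw_value: str, needle: str) -> bool:
    """Case-insensitive substring match against the decoded header value."""
    return needle.casefold() in header_text(raw_value).casefold()
```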

> 3: The following search is valid, according to the syntax in RFC3501:
> 
>    xx SEARCH OR OR <exp1> <exp2> <exp3>
> 
> and allows an OR expression to cover three terms instead of just two. As such, it 
> seems quite useful, but it would certainly have mystified my old search code (it was 
> rubbish, as I've pointed out), and I was wondering how generally safe it would be to 
> use this type of expression?

Very. That's totally standard, and anything which doesn't support it is totally bogus.

> 4: I'm pretty sure I'm right on this one, but the following expression:
> 
>    xx SEARCH OR (<exp1> <exp2> <exp3>) exp4
> 
> will only result in a match if either <exp4> is a match, or ALL of <exp1>, <exp2> 
> and <exp3> are a match. Could someone wiser than me confirm this? I'm 
> assuming there is no way to perform a search with a long list of OR conditions 
> without doing a lot of calisthenics on the search string (multiple OR conditions 
> strung together).

It's hardly calisthenics, it's just prefix notation.

You can just as well do

OR A OR B OR C D

depending on whether you want the tree to bias right or bias left.  Even this is valid

OR OR A OR B C D

As is obvious when you write it out as a tree.

OR
- OR
  - A
  - OR
    - B
    - C
- D

> I apologize if any of these are dealt with in RFCs outside RFC3501 - I struggle to 
> keep track of all the various sub-documents relating to the protocol these days.
> 
> Thanks in advance for any advice.

Cheers,

Bron.

-- 
  Bron Gondwana
  brong@fastmail.fm
E-mail headers
From: imap@maclean.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: mailman.21.1528486494.22076.imap-protocol@mailman13.u.washington.edu

David, I went through what you are going through a couple of years 
ago.  My original SEARCH implementation was also a shambles and it 
was only the coming of a major new customer that prompted me to 
rework it.  I never had a single complaint about the original code 
though which I put down to the lack of support for SEARCH in 
clients.  Today I think it is much more important to have good SEARCH 
especially with IMAP servers more and more fronting email archives in 
addition to conventional email servers.

I am well aware that the specification is fuzzy and I know that some 
implementations take great liberties.  Some servers, for example, 
treat text searches as word-based while the RFC demands that they be 
string-based.  An excuse for making them word-based is that the data 
just happens to be word-indexed.  While I cannot support this, I 
suppose it is not too terrible because users these days are so 
accustomed to word-based searches (because that is what Web search 
engines do) that they might be surprised at the results of a 
string-based search.
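To make the difference concrete, here is a small sketch contrasting the two behaviours. `string_match` is the case-insensitive substring test that the RFC 3501 wording implies; `word_match` approximates what a word-indexed server effectively does, under the assumption (labelled in the code) that a multi-word needle must appear as consecutive whole words. Both function names are hypothetical.

```python
import re


def string_match(text: str, needle: str) -> bool:
    """RFC 3501 reading: case-insensitive substring match."""
    return needle.casefold() in text.casefold()


def word_match(text: str, needle: str) -> bool:
    """Approximation of a word-indexed server: whole words only."""
    words = re.findall(r"\w+", text.casefold())
    wanted = re.findall(r"\w+", needle.casefold())
    n = len(wanted)
    # Assumption: the needle's words must appear consecutively, like a phrase query.
    return n > 0 and any(words[i:i + n] == wanted for i in range(len(words) - n + 1))
```

A search for "cat" finds "Concatenate the logs" under string matching but not under word matching, which is exactly the kind of result that surprises users in one direction or the other.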

Let me now tell you how I implement things:

1. BODY.  I thread through all the MIME parts in the message and 
select only those that have a Content-Type of "text" or "message".  I 
convert each such part to Unicode and then apply the search 
criteria.  I make no attempt to search parts that would typically be 
considered attachments.  If, in an HTML part, a phrase being searched 
for is broken up by tags, it will not be found.  Likewise if it 
contains entities.  I could do better in this regard and your 
bringing up the subject may prompt me to review a number of my own choices.

2. Headers.  I unfold headers and normalize everything to Unicode 
before searching.

3. xx SEARCH OR OR <exp1> <exp2> <exp3>.  I have no idea how safe it 
is to use such an expression but my server handles it beautifully.

4. xx SEARCH OR (<exp1> <exp2> <exp3>) exp4.  I share your 
understanding of this expression.
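The HTML limitation mentioned under point 1 (a phrase split by tags or written with entities is not found) can be addressed by searching the character data rather than the markup. A minimal sketch using the standard library's `html.parser`, with hypothetical names; it does not insert spaces at block-level tags such as `<p>` or skip `<script>`/`<style>` content, which a fuller version would need to handle.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect the character data of an HTML part, discarding the tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; and &#233; arrive as characters
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


def html_part_matches(html: str, needle: str) -> bool:
    """Case-insensitive substring search over the de-tagged text."""
    return needle.casefold() in html_to_text(html).casefold()
```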

I also added support for ESEARCH when I did my revamp but have little 
idea of how much it gets used.

Pete

At 07:38 PM 4/7/2015, David Harris wrote:
>I'm in the process of completely rewriting the SEARCH logic in my IMAP server -
>the old code was done in a hurry and was, quite frankly, ridiculously bad, but that's
>another story.
>
>As I get into testing cases, I've come across a number of areas where RFC3501
>and the various sub-documents that I know about are... uh, "vague". I'd like to get a
>take on how other implementors view them.
>
>1: BODY:  When a SEARCH BODY expression is issued, how should "BODY" be
>interpreted? Is there an assumption that the server should choose the best
>candidate for a displayable message body, parse and normalize it, then search
>that? Or should it simply be taken as a raw scan of the message? How much
>unarmouring and character set normalization is assumed?
>
>2: Headers: when any of the header search expressions is issued, is the
>assumption that the raw header should be searched, or should RFC2047
>encoded-words be reduced and normalized before attempting the comparison?
>
>3: The following search is valid, according to the syntax in RFC3501:
>
>    xx SEARCH OR OR <exp1> <exp2> <exp3>
>
>and allows an OR expression to cover three terms instead of just two. As such, it
>seems quite useful, but it would certainly have mystified my old search code (it was
>rubbish, as I've pointed out), and I was wondering how generally safe it would be to
>use this type of expression?
>
>4: I'm pretty sure I'm right on this one, but the following expression:
>
>    xx SEARCH OR (<exp1> <exp2> <exp3>) exp4
>
>will only result in a match if either <exp4> is a match, or ALL of <exp1>, <exp2>
>and <exp3> are a match. Could someone wiser than me confirm this? I'm
>assuming there is no way to perform a search with a long list of OR conditions
>without doing a lot of calisthenics on the search string (multiple OR conditions
>strung together).
>
>I apologize if any of these are dealt with in RFCs outside RFC3501 - I struggle to
>keep track of all the various sub-documents relating to the protocol these days.
>
>Thanks in advance for any advice.
>
>Cheers!
>
>-- David --
E-mail headers
From: arnt@gulbrandsen.priv.no
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: c94a9300-489f-4fca-87e6-992ccc1221e8@gulbrandsen.priv.no

David Harris writes:
> 3: The following search is valid, according to the syntax in RFC3501:
>
>    xx SEARCH OR OR <exp1> <exp2> <exp3>
>
> and allows an OR expression to cover three terms instead of just two. As such, it
> seems quite useful, but it would certainly have mystified my old search code (it was
> rubbish, as I've pointed out), and I was wondering how generally safe it would be to
> use this type of expression?

I've seen this kind of thing many times, e.g. OR OR FROM x TO x CC x, and I 
think it's fairly widely used. IIRC the Symantec IMAP proxy uses nested ORs 
en masse.
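Generating such expressions from a list of keys is mechanical, since each OR takes exactly two operands. A minimal sketch with a hypothetical helper name; quoting of string arguments is left to the caller.

```python
from typing import List


def or_chain(keys: List[str], bias: str = "right") -> str:
    """Combine n search-keys into one IMAP OR expression using binary prefix ORs."""
    if not keys:
        raise ValueError("need at least one search key")
    if len(keys) == 1:
        return keys[0]
    if bias == "left":
        # Left-leaning: OR OR k1 k2 k3 == OR (OR k1 k2) k3
        return "OR " * (len(keys) - 1) + " ".join(keys)
    # Right-leaning: OR k1 OR k2 k3 == OR k1 (OR k2 k3)
    return " ".join(f"OR {k}" for k in keys[:-1]) + " " + keys[-1]
```

For `["FROM x", "TO x", "CC x"]` the left-leaning form reproduces `OR OR FROM x TO x CC x`, and the right-leaning form gives `OR FROM x OR TO x CC x`; both match the same messages.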

I agree about the vagueness with regard to searching. My best advice is to 
do what seems useful to users, and make searching inclusive rather than 
exact.

Arnt
E-mail headers
From: brong@fastmail.fm
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: 1428496136.932088.250634641.1291461C@webmail.messagingengine.com

On Wed, Apr 8, 2015, at 06:05 PM, Arnt Gulbrandsen wrote:
> David Harris writes:
> > 3: The following search is valid, according to the syntax in RFC3501:
> >
> >    xx SEARCH OR OR <exp1> <exp2> <exp3>
> >
> > and allows an OR expression to cover three terms instead of just two. As such, it
> > seems quite useful, but it would certainly have mystified my old search code (it was
> > rubbish, as I've pointed out), and I was wondering how generally safe it would be to
> > use this type of expression?
> 
> I've seen this kind of thing many times, e.g. OR OR FROM x TO x CC x, and I 
> think it's fairly widely used. IIRC the Symantec IMAP proxy uses nested ORs 
> en masse.
> 
> I agree about the vagueness with regard to searching. My best advice is to 
> do what seems useful to users, and make searching inclusive rather than 
> exact.

An important thing to be aware of - if you have iPhone users. iOS since version 7
has done a BODY search on every folder if you do a search.  That's prohibitively
expensive if you're scanning emails every time.

We implemented fuzzy matching support, and we just do a client quirk (that's right,
we use ID for evil) to turn a regular BODY search into a FUZZY body search,
because the alternative is a shitty experience for iPhone users.
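A rough sketch of the kind of client quirk described above, assuming the server has already parsed the search program into nested tuples and has stored the client's ID (RFC 2971) parameters as a dict. The `name`/`os` values tested in `wants_fuzzy_body` are assumptions about what iOS Mail sends, and the tuple representation and all function names are hypothetical; the only fixed point is that RFC 6203 defines FUZZY as a prefix modifier on a search key.

```python
from typing import Dict, Optional, Tuple

# Hypothetical parsed search program: ("BODY", "text"), ("SUBJECT", "text"),
# ("OR", key, key), ("NOT", key), ("AND", key, key, ...)
Key = Tuple


def wants_fuzzy_body(client_id: Dict[str, Optional[str]]) -> bool:
    """Assumed heuristic on the client's ID parameters (values may be NIL/None)."""
    name = (client_id.get("name") or "").lower()
    os_name = (client_id.get("os") or "").lower()
    return "iphone" in name or os_name == "ios"


def fuzzify(key: Key) -> Key:
    """Wrap every BODY key in the FUZZY modifier, recursing into operators."""
    op = key[0]
    if op == "BODY":
        return ("FUZZY", key)
    if op in ("OR", "NOT", "AND"):
        return (op,) + tuple(fuzzify(child) for child in key[1:])
    return key  # other keys, including ones already marked FUZZY, pass through


def prepare_search(key: Key, client_id: Dict[str, Optional[str]]) -> Key:
    return fuzzify(key) if wants_fuzzy_body(client_id) else key
```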

Bron.

-- 
  Bron Gondwana
  brong@fastmail.fm
E-mail headers
From: Neil_Hunsperger@symantec.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: 14D026C7F297AD44AC82578DD818CDD038F0C37385@TUS1XCHEVSPIN35.SYMC.SYMANTEC.COM

> From: Imap-protocol [mailto:imap-protocol-bounces@mailman13.u.washington.edu] On Behalf Of Arnt Gulbrandsen
> 
> David Harris writes:
> > 3: The following search is valid, according to the syntax in RFC3501:
> >
> >    xx SEARCH OR OR <exp1> <exp2> <exp3>
> >
> > and allows an OR expression to cover three terms instead of just two. As such, it
> > seems quite useful, but it would certainly have mystified my old search code (it was
> > rubbish, as I've pointed out), and I was wondering how generally safe it would be to
> > use this type of expression?
> 
> I've seen this kind of thing many times, e.g. OR OR FROM x TO x CC x, and I
> think it's fairly widely used. IIRC the Symantec IMAP proxy uses nested ORs
> en masse.

Arnt, that's correct.

To answer David's question: we had to balance the tree very carefully to avoid false negatives on some Microsoft Exchange Server versions.
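The message does not say what balancing rule was used or which Exchange versions were affected. As a minimal sketch, assuming "balanced" means splitting the key list in half recursively so the OR tree has depth of about log2(n) instead of the n-1 levels of a simple chain (the helper name is hypothetical):

```python
from typing import List


def balanced_or(keys: List[str]) -> str:
    """Build an OR expression whose binary tree has roughly log2(n) depth."""
    if not keys:
        raise ValueError("need at least one search key")
    if len(keys) == 1:
        return keys[0]
    mid = len(keys) // 2
    # Each half becomes one operand of the top-level OR.
    return f"OR {balanced_or(keys[:mid])} {balanced_or(keys[mid:])}"
```

With four keys this produces `OR OR A B OR C D`, two levels deep, where a one-sided chain such as `OR A OR B OR C D` is three levels deep.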

-Neil
E-mail headers
From: dinh.viet.hoa@gmail.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:54 -0000
Message-ID: 7B7A934C6DB848D6B4ADE799C2F6DC78@gmail.com

You probably want to use a full-text indexer such as Lucene / Elasticsearch in this case. 
It will prevent the server from iterating on each email.
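The core idea behind such indexers is an inverted index: a map from each word to the messages containing it, so a query intersects a few posting sets instead of scanning every message. A toy sketch of that idea in Python; it is not Lucene's or Elasticsearch's API, and the class name is hypothetical. Being word-based, it also inherits the substring-versus-word caveat raised earlier in the thread.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class InvertedIndex:
    """Toy word -> UID index illustrating the technique, not a real engine's API."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, uid: int, text: str) -> None:
        for word in re.findall(r"\w+", text.casefold()):
            self.postings[word].add(uid)

    def search(self, query: str) -> Set[int]:
        """UIDs containing every word of the query, found without a per-message scan."""
        words = re.findall(r"\w+", query.casefold())
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```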

-- 
Hoa V. Dinh


On Wednesday, April 8, 2015 at 5:28 AM, Bron Gondwana wrote:

> An important thing to be aware of - if you have iPhone users. iOS since version 7
> has done a BODY search on every folder if you do a search. That's prohibitively
> expensive if you're scanning emails every time.

