E-mail headers

From:	Timo Sirainen <tss@iki.fi>
To:	imap-protocol@u.washington.edu
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	1145029156.10727.81.camel@hurina

E-mail headers

From:	mrc@CAC.Washington.EDU
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	Pine.OSX.4.64.0604141039280.15258@pangtzu.panda.com

On Fri, 14 Apr 2006, Timo Sirainen wrote:
> If I read this correctly, searching logic works like this:
> 
> If search charset is US-ASCII, server either
> a) simply does a substring match for the entire message, or
> b) decodes MIME parts based on Content-Transfer-Encoding header and
> decodes the MIME headers themselves, and then does substring matching
> 
> If search charset is not US-ASCII, only b) is allowed.

My personal opinion (not "Mr. IMAP Protocol"):

(a) is the IMAP2 interpretation and (b) is the IMAP4 interpretation; a 
full IMAP4 server always does (b).

However, since US-ASCII is the only mandatory-to-implement search charset, 
an IMAP4 server which only implements US-ASCII and only does (a) remains 
compliant.  This is a compatibility-with-the-past wart, and new server 
implementations should not do this (except as a development step).

At a minimum, UTF-8 SHOULD be supported as a search charset.

(a) has known problems.  It has false positives and false negatives 
because it does not decode the content.  This occurs even with ASCII.
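
[Editorial sketch, not part of the original message.] The difference between (a) and (b) can be illustrated with Python's standard `email` package. The sample message, the function names `raw_search` and `decoded_search`, and the choice to lowercase both sides for case-insensitive matching are all assumptions made for this illustration; no real server is claimed to work this way.

```python
import base64
from email import message_from_bytes
from email.policy import default

# A hypothetical message whose ASCII body is base64-encoded.
RAW = (
    b"From: a@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.encodebytes(b"Meet me in Paris on Friday.\r\n").replace(b"\n", b"\r\n")
)


def raw_search(raw: bytes, key: str) -> bool:
    """Interpretation (a): substring match on the undecoded octets."""
    return key.lower().encode("ascii") in raw.lower()


def decoded_search(raw: bytes, key: str) -> bool:
    """Interpretation (b): decode headers and transfer encodings, then match."""
    msg = message_from_bytes(raw, policy=default)
    haystack = [f"{name}: {value}" for name, value in msg.items()]
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content() undoes the Content-Transfer-Encoding and charset.
            haystack.append(part.get_content())
    return key.lower() in "\n".join(haystack).lower()


# False negative for (a): the word is present only after decoding.
print(raw_search(RAW, "Paris"), decoded_search(RAW, "Paris"))
# False positive for (a): the key matches base64 text, not message content.
print(raw_search(RAW, "TWVldC"), decoded_search(RAW, "TWVldC"))
```

Both failure modes show up even though the message is pure ASCII, which is the point made above.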

> If the search key is invalid for the given character set, should server
> return BAD error to client? Are non-ASCII characters in search key
> invalid for US-ASCII charset?

I'm not certain what you mean by "invalid".

Do you mean "contain a codepoint that is not in that charset"?  If so, I 
think a failed match is better than a BAD, since it may be that the server 
has an obsolete version of that charset's definition.

> What about if search key contains non-ASCII characters but no charset
> parameter is given? Currently I assume this means just doing a substring
> search from messages without doing any charset conversions (i;octet
> comparator).

It can mean whatever you want, although perhaps a failed match is best. 
Or maybe a BAD in this case, because the specification does denounce use 
of 8-bit strings without a charset identification in section 4.3.1.

It's not defined, and in such cases servers can do as they want.  As with 
other undefined situations, it may be defined in the future for servers 
that advertise an extension.

Clients which do undefined things are broken.

> I don't see it clearly mentioned how searching MIME parts should work,
> but since it only talks about substring matching, I assume that it
> shouldn't really care about MIME parts that much.

I don't understand this comment.  You should apply content transfer 
decoding.  Most people canonicalize search keys and MIME parts into UTF-8 
prior to matching.  Also look into case coercion and decomposition, 
although the IMAP i18n/stringprep specification will specify this (and 
more) in detail so maybe you want to hold off.
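
[Editorial sketch, not part of the original message.] A minimal illustration of canonicalizing the search key and the MIME part to a common form before matching. The use of NFKD plus `str.casefold()` is an assumption chosen for the example; it is not the IMAP i18n/stringprep procedure referred to above, and the names `canonicalize` and `matches` are hypothetical.

```python
import unicodedata


def canonicalize(text: str) -> str:
    """Decompose, case-fold, then decompose again (folding can recompose)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return unicodedata.normalize("NFKD", decomposed.casefold())


def matches(key_octets: bytes, key_charset: str,
            part_octets: bytes, part_charset: str) -> bool:
    """Convert both sides to Unicode text, canonicalize, substring-match."""
    key = canonicalize(key_octets.decode(key_charset))
    text = canonicalize(part_octets.decode(part_charset))
    return key in text


# Key sent as ISO-8859-1 "CAFÉ"; part stored as UTF-8 with a combining accent.
key = "CAF\u00c9".encode("iso-8859-1")
part = "Le cafe\u0301 est ouvert.".encode("utf-8")
print(matches(key, "iso-8859-1", part, "utf-8"))
```

Without the decomposition step the precomposed "É" in the key and the "e" plus combining accent in the part would not compare equal, even after case folding.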

> Especially BODY searching talks about searching from message bodies. Are
> MIME part headers part of a message body? I guess not, because UW-IMAP
> skips them.

TEXT searches search the entire message including RFC 2822 and MIME 
metadata.

BODY searches omit RFC 2822 and MIME metadata.

> More interesting are MIME footer and trailer sections. Should they be
> searched? UW-IMAP skips them.

I consider these not to be part of a message at all for any MIME-savvy 
application.

> What about MIME boundary lines? UW-IMAP
> searches these, but not if you include its "--" prefix in search key.

Are you certain that you aren't confusing BODY and TEXT searches?  A TEXT 
search would find them, because they appear in the MIME header.
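
[Editorial sketch, not part of the original message.] The TEXT/BODY distinction described above, and why a boundary string is visible to TEXT but not to BODY, can be shown with Python's `email` package. The sample message and the helpers `body_haystack`, `text_haystack` and `search` are assumptions for illustration only. In this sketch the text outside the MIME parts (preamble and epilogue) and the delimiter lines themselves are never searched.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical multipart message with a preamble and an epilogue.
RAW = (
    b"From: a@example.com\r\n"
    b"Subject: Itinerary\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZZY"\r\n'
    b"\r\n"
    b"This preamble is not part of any MIME part.\r\n"
    b"--XYZZY\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"See you in Paris.\r\n"
    b"--XYZZY--\r\n"
    b"This epilogue is not part of any MIME part.\r\n"
)


def body_haystack(msg) -> str:
    """BODY: decoded text content only, no RFC 2822 or MIME headers."""
    return "\n".join(
        part.get_content()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    )


def text_haystack(msg) -> str:
    """TEXT: message headers and MIME part headers plus the decoded content."""
    headers = [f"{n}: {v}" for part in msg.walk() for n, v in part.items()]
    return "\n".join(headers) + "\n" + body_haystack(msg)


def search(haystack: str, key: str) -> bool:
    return key.lower() in haystack.lower()


msg = message_from_bytes(RAW, policy=default)
print(search(text_haystack(msg), "XYZZY"))    # found via the Content-Type header
print(search(body_haystack(msg), "XYZZY"))    # not part of any body
print(search(text_haystack(msg), "--XYZZY"))  # delimiter lines are not searched
```

Under these assumptions the boundary value matches in a TEXT search only because it appears in the Content-Type header, so a key that includes the "--" prefix finds nothing.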

> Is "Header: value" searching required to work? I think it is, and works
> with UW-IMAP.

What do you mean by this?  If you're talking about a TEXT search, then it 
may or may not work depending upon the octets in a message.  You should be 
using a "HEADER Header: value" search instead.

> Is "line\r\nline2" (as literal of course with real CR+LF)
> searching required to work in message body? Again, I think so and works
> with UW-IMAP.

Yes, it should in a TEXT search.  But see below.

> But then is "Header: value\r\nHeader2: value2" searching
> required to work? I don't see why not, but this doesn't work anymore
> with UW-IMAP.

Once again, I'd like to understand what you mean by this.

If you're talking about a TEXT search, I don't see why it shouldn't work, 
although it might be that you have a mailbox format that uses UNIX-style 
newlines and the data was not CRLF-converted.

 	HEADER Header: {....}
 	value\r\nHeader2: value2
will certainly not work.

I don't think that it is useful for a client to have newlines in a search 
key.  Some servers try to do fuzzy matching, so for example if you search 
for "Joe's trip to Paris" there will be a match even if it was broken by a 
newline.
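
[Editorial sketch, not part of the original message.] One simple form of the fuzzy matching mentioned above is to collapse every run of whitespace, including line breaks, to a single space on both sides before comparing. The helper names `fold_ws` and `fuzzy_contains` are hypothetical, and real servers may do something quite different.

```python
import re

_WS = re.compile(r"\s+")


def fold_ws(text: str) -> str:
    """Collapse runs of spaces, tabs, CR and LF into one space."""
    return _WS.sub(" ", text).strip()


def fuzzy_contains(haystack: str, key: str) -> bool:
    """Case-insensitive substring match that ignores line wrapping."""
    return fold_ws(key).lower() in fold_ws(haystack).lower()


body = "I heard that Joe's trip to\r\nParis was a great success.\r\n"
print("Joe's trip to Paris" in body)                 # strict substring match
print(fuzzy_contains(body, "Joe's trip to Paris"))   # whitespace-folded match
```

The strict match fails because of the line break after "to", while the folded match succeeds.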

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

E-mail headers

From:	tss@iki.fi
To:	imap-protocol@localhost
Date:	Fri, 08 Jun 2018 12:34:37 -0000
Message-ID:	1147250390.17524.39.camel@localhost.localdomain

Sorry for the late reply, I just seem to get more and more lagged nowadays
with replying to emails.

On Fri, 2006-04-14 at 11:30 -0700, Mark Crispin wrote:
> > If the search key is invalid for the given character set, should server
> > return BAD error to client? Are non-ASCII characters in search key
> > invalid for US-ASCII charset?
> 
> I'm not certain what you mean by "invalid".
> 
> Do you mean "contain a codepoint that is not in that charset"?  If so, I 
> think a failed match is better than a BAD, since it may be that the server 
> has an obsolete version of that charset's definition.

I think most character sets don't change anymore (ASCII and ISO-8859-*
especially), but I guess it's nicer for clients to not get BAD replies.

> > What about if search key contains non-ASCII characters but no charset
> > parameter is given? Currently I assume this means just doing a substring
> > search from messages without doing any charset conversions (i;octet
> > comparator).
> 
> It can mean whatever you want, although perhaps a failed match is best. 
> Or maybe a BAD in this case, because the specification does denounce use 
> of 8-bit strings without a charset identification in section 4.3.1

I understood that section to only mean message bodies sent as a reply to
FETCH.

> > More interesting are MIME footer and trailer sections. Should they be
> > searched? UW-IMAP skips them.
> 
> I consider these not to be part of a message at all for any MIME-savvy 
> application.

OK, this is mostly what I was concerned about. The RFC doesn't say
anything about whether they should or shouldn't be searched.

> > What about MIME boundary lines? UW-IMAP
> > searches these, but not if you include its "--" prefix in search key.
> 
> Are you certain that you aren't confusing BODY and TEXT searches?  A TEXT 
> search would find them, because they appear in the MIME header.

Right, sorry, that must be it.

> > Is "Header: value" searching required to work? I think it is, and works
> > with UW-IMAP.
> 
> What do you mean by this?  If you're talking about a TEXT search, then it 
> may or may not work depending upon the octets in a message.  You should be 
> using a "HEADER Header: value" search instead.

Yes, I mean TEXT search. I know HEADER is the correct way, but since I
was going to fix my SEARCH code, I thought I'd make it work correctly in
all cases (if there were correct ways for cases like this).

> > Is "line\r\nline2" (as literal of course with real CR+LF)
> > searching required to work in message body? Again, I think so and works
> > with UW-IMAP.
> 
> Yes, it should in a TEXT search.  But see below.
> 
> > But then is "Header: value\r\nHeader2: value2" searching
> > required to work? I don't see why not, but this doesn't work anymore
> > with UW-IMAP.
> 
> Once again, I'd like to understand what you mean by this.
> 
> If you're talking about a TEXT search, I don't see why it shouldn't work, 
> although it might be that you have a mailbox format that uses UNIX-style 
> newlines and the data was not CRLF-converted.

Yes, TEXT search. If it's supposed (required) to work, then I think it
shouldn't matter if the data is in LF or CRLF format in the mailbox,
because the client always sees the mails CRLF-terminated.
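
[Editorial sketch, not part of the original message.] One way to make a search key containing CRLF behave the same regardless of the on-disk newline style is to convert bare LF to CRLF before matching, so the search sees the same octets the client would receive. The names `to_crlf` and `text_search` are hypothetical, and this is only an illustration of the idea.

```python
import re

# A line feed that is not already preceded by a carriage return.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def to_crlf(stored: bytes) -> bytes:
    """Return the message with every bare LF converted to CRLF."""
    return _BARE_LF.sub(b"\r\n", stored)


def text_search(stored: bytes, key: bytes) -> bool:
    """Substring match against the CRLF form of the stored message."""
    return key in to_crlf(stored)


stored = b"Header: value\nHeader2: value2\n\nbody\n"   # UNIX-style mailbox data
key = b"Header: value\r\nHeader2: value2"              # key as sent in a literal
print(key in stored)             # matching the stored octets directly
print(text_search(stored, key))  # matching after CRLF conversion
```

The conversion is idempotent, so data that is already CRLF-terminated passes through unchanged.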

> I don't think that it is useful for a client to have newlines in a search 
> key.  Some servers try to do fuzzy matching, so for example if you search 
> for "Joe's trip to Paris" there will be a match even if it was broken by a 
> newline.

And do you think this is still allowed by the RFC?

I was thinking about allowing some text search engines to be used with
my server, but I thought about creating some new extension for it, since
I thought using them with SEARCH would break the RFC (because e.g. they
couldn't find "imo" in the string "timo" and in general the matches
wouldn't be exact).
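
[Editorial sketch, not part of the original message.] The "imo" versus "timo" concern can be reproduced with a toy word index. The tokenizer, the sample documents and the function names below are assumptions for illustration; actual text search engines differ in many ways, but a purely word-based index shares this limitation.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased word to the set of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def index_search(index: dict[str, set[int]], key: str) -> set[int]:
    """Whole-word lookup, as a token-based engine would do."""
    return set(index.get(key.lower(), set()))


def substring_search(docs: dict[int, str], key: str) -> set[int]:
    """Exact substring semantics, as SEARCH describes."""
    return {d for d, text in docs.items() if key.lower() in text.lower()}


docs = {1: "Regards, Timo", 2: "imo this is fine"}
index = build_index(docs)
print(index_search(index, "imo"))     # only the message with the word "imo"
print(substring_search(docs, "imo"))  # also the message containing "Timo"
```

The indexed lookup misses message 1 because "imo" is not a token there, which is exactly the inexact-match behaviour described above.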

[Imap-protocol] SEARCH