On Tue, 11 Apr 2006 18:37:39 -0700 (Pacific Daylight Time)
Mark Crispin <MRC@CAC.Washington.EDU> wrote:
> On Tue, 11 Apr 2006, Vladimir A. Butenko wrote:
>> It would be beneficial if a client can learn AT LEAST if the server
>> namespace is case-sensitive or not. Because any client does have to switch
>> its internal routines to the case-[in]sensitive mode.
>
> Well, the attitude of the past was that the client shouldn't do any such
>thing; that is, that it could not (and should not) make any assumptions
>about how the server works.
A user asked the client to show all mailboxes with names starting with
"Mark" and all those ending with "CRISPIN":
There is a mailbox called "Mark Crispin". The system is case-insensitive.
Now (removing all unrelated parts of the protocol):
a LIST "" "Mark%"
* LIST "Mark Crispin"
a OK
b LIST "" "%CRISPIN"
* LIST "MARK CRISPIN"
The latter is a questionable practice, but it definitely has more merits than
returning unmodified "Mark Crispin" as our server (and, I guess, yours) is
doing now. The "Mark Crispin" string would confuse a client that does not
expect that string to match the "%CRISPIN" pattern.
If there is no "fixed" name for INBOX, and it's just a "case-insensitive" name,
why would "Mark Crispin" be the "real" name for that mailbox on a
case-insensitive system? So, if the server is free to return "Inbox" on LIST
"Inbox", it's free to return "MARK CRISPIN" on "%CRISPIN".
And then the client will, obviously, display these mailboxes as different
ones, and all the troubles will trouble us there..
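To make the mismatch concrete, here is a minimal sketch of LIST wildcard matching, assuming "/" as the hierarchy delimiter. The function names (list_pattern_to_regex, matches) are made up for this illustration and are not taken from any server or client:

```python
import re

def list_pattern_to_regex(pattern, delimiter="/", ignore_case=False):
    """Translate a LIST pattern into a compiled regular expression.

    '*' matches any run of characters; '%' matches any run of characters
    that does not contain the hierarchy delimiter.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append("[^" + re.escape(delimiter) + "]*")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile("".join(parts), flags)

def matches(pattern, name, ignore_case=False):
    """Return True if the whole mailbox name matches the LIST pattern."""
    regex = list_pattern_to_regex(pattern, ignore_case=ignore_case)
    return regex.fullmatch(name) is not None

# A client comparing plain ASCII strings rejects the name ...
client_view = matches("%CRISPIN", "Mark Crispin")
# ... while a case-insensitive server considers it a match.
server_view = matches("%CRISPIN", "Mark Crispin", ignore_case=True)
```

With these definitions client_view is False and server_view is True, which is exactly the situation where the client gets back a name it does not believe matches its own pattern.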
> I wouldn't be opposed to an extension of this nature, but it may be
>difficult to implement. A UNIX based server could not assume that it is
>case-sensitive; it would have to determine this on a filesystem basis and
>it may not have a good way of knowing (other than empirical testing) if a
>remote filesystem is case-sensitive or not.
Delete the .mark file, create a .Mark file, try to read ".mark" - if you do
that once, during start-up of your server, that would be enough. It's more
difficult when you have to serve several millions of users, where the
storage can be distributed to many NFS and CFS file systems - but believe me
- I have never seen any real-life installation that is at least 100,000
users strong (let alone 5,000,000 users strong) that used different file
systems for different users.
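A rough sketch of such a start-up probe, under my own assumptions: it uses a randomly named temporary file rather than the fixed ".mark"/".Mark" pair, and the function name directory_is_case_sensitive is invented for this example:

```python
import os
import tempfile

def directory_is_case_sensitive(directory):
    """Probe one directory: create a mixed-case file, look it up with swapped case."""
    # The "CaseProbe" prefix guarantees the name contains letters, so the
    # swapped-case name is always a different string.
    fd, path = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        swapped = os.path.join(head, tail.swapcase())
        # On a case-insensitive file system the swapped name finds the same file.
        return not os.path.exists(swapped)
    finally:
        os.remove(path)
```

A server could call this once per mail store root at start-up. The result says nothing about other mounts, which is the remote-filesystem concern raised above.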
>> But, as usual, Pandora's box is happy to be opened again:
>
> You may be interested to know that your Cyrillic example actually
>displayed correctly in my Japanese environment! :-)
All Cyrillic characters are included into the ISO-2022-JP charset.
I'm pretty sure this is a preparation for occupation (uups, liberation), but
I'm not that sure about the expected direction of that occupation... :-)
But - there is no lower-case/upper-case problem with the Japanese charsets, so
you may want to play with mailboxes in Roman, but non-Latin, alphabets -
French, Spanish, German etc. Umlauts are tough boys to fight with in the
upper-lowercase battles...
>> with non-Latin names: UTF-7 comes in and destroys everything ;-(
>
>For what it's worth, I just implemented support for Unicode
>case-insensitivity in the development sources of UW imapd. Actually, I
>cheated and canonicalized everything to titlecase.
>
> Decomposition is next.
>
>> Decoding UTF-7 back into UTF-8 and using it internally is ugly, as you can
>> get a completely different name in LIST results.
>
> Don't the rules for modified UTF-7 define a completely reversible
>transform?
I'm afraid there is a misunderstanding here. How should I specify the ??%?
pattern in the LIST command? "&hjhj-%&jkjk-" where "&hjhj-" is "??", and
"&jk-" is "?"?
I should, because - "&hjhj%jkjk-" would be illegal, right?
But I have no mailboxes with names starting with "&hjhj-", all of them start
with "&hjhjkkk". So, the server already needs to be "smart" and not use UTF-7
in patterns, but convert the pattern to the UTF-8/Unicode form and
compare it not with mailbox names, but with mailbox names converted from
UTF-7 into Unicode.
And this will result in LIST responses that have almost NOTHING in common
with the supplied pattern, if both are treated as ASCII strings.
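Here is a small sketch of that effect with real modified UTF-7 strings instead of my "&hjhj-" placeholders. The decoder follows the rules in RFC 3501, section 5.1.3; the name decode_mutf7 and the sample mailbox (two Cyrillic "Ya" letters, U+042F) are only my illustration:

```python
import base64

def decode_mutf7(name):
    """Decode a modified UTF-7 mailbox name into a Unicode string."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            out.append(ch)
            i += 1
            continue
        end = name.index("-", i)   # ValueError if the shift sequence is unterminated
        chunk = name[i + 1:end]
        if chunk == "":
            out.append("&")        # "&-" stands for a literal ampersand
        else:
            # Modified base64 uses "," instead of "/" and carries no padding.
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)

pattern = "&BC8-%"      # one U+042F followed by the '%' wildcard
mailbox = "&BC8ELw-"    # a mailbox named with two U+042F characters

# Treated as ASCII, the mailbox does not start with the pattern's prefix ...
ascii_prefix_ok = mailbox.startswith("&BC8-")
# ... but after decoding, it clearly does.
unicode_prefix_ok = decode_mutf7(mailbox).startswith(decode_mutf7(pattern)[:-1])
```

Here ascii_prefix_ok is False and unicode_prefix_ok is True: the server has to match in Unicode, and the client then sees a response that shares almost nothing with its pattern at the ASCII level.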
So, it's the same question as above: should the client expect that if it
wants "Inbo%", it will get "Inbogus town", but not INBOX? Or should the
client NOT assume anything and treat ANY response from the server as
correctly matching the pattern the client has provided - matching according
to the rules known to the server only, not the client?
If someone suggests the latter answer, then it's asking for trouble:
then if a client makes a call
A LIST %
and gets
* LIST ZZZ
and then it makes a call
A LIST ZZZ/%
(as many clients do to deal with mailbox hierarchies)
then the client HAS ABSOLUTELY NO RIGHT to expect that all returned names
will start with "ZZZ/", and thus - can be displayed as "ZZZ" "subtree".
Sounds strange, right? But we are saying that when a client sends
"&hj-%&HJ-" in the LIST command, it should not expect to see any name that
ends with "HJ-", right?
So, we conclude that a client "sometimes" can assume that the LIST results
will match the LIST pattern when both strings are interpreted as ASCII
strings, and sometimes the client cannot assume that? That's a mess, isn't
it?
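As an illustration of what a client would have to do if it may assume nothing, here is a sketch that splits the results of a "ZZZ/%" request into names that really sit under "ZZZ/" (as ASCII strings) and names that do not. The helper split_children and the sample names are hypothetical:

```python
def split_children(parent, delimiter, listed_names):
    """Separate LIST results into names under parent+delimiter and all others."""
    prefix = parent + delimiter
    children, unexpected = [], []
    for name in listed_names:
        if name.startswith(prefix):
            children.append(name)
        else:
            unexpected.append(name)
    return children, unexpected

# What a case-insensitive server might hand back for: A LIST ZZZ/%
children, unexpected = split_children("ZZZ", "/", ["ZZZ/Sent", "zzz/Drafts"])
```

Only "ZZZ/Sent" ends up in children. The client has no protocol-level rule telling it what to do with "zzz/Drafts", so it cannot safely draw it inside the "ZZZ" subtree.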
>> That's why "strict case-sensitivity" is a GOOD thing from the protocol point
>> of view, and all IMAP servers MUST be case-sensitive to avoid confusion in
>> clients (but not in users). Unfortunately, many IMAP servers do map mailbox
>> names into OS file names, and we have all these "semi-case-insensitivity"
>> problems today.
>
> I don't follow you as to why all IMAP servers "MUST be case-sensitive",
>since clearly there are examples of servers which are case-insensitive,
Because case-insensitive servers create a mess. The protocol becomes
ill-defined (read: broken, incomplete, unusable - select your own favourite,
the most offending word :-).
If the servers imposed strict case-sensitivity (including that for
INBOX), the IMAP protocol (at least within its mailbox-name related part)
would become a well-defined, "real", "professional" (select the most pleasing
word here) protocol. But it would be harder to implement on systems with
case-insensitive file systems (OS/2 and OS/3 aka Windows, MacOSX, VMS) - if
the server chooses to implement mailboxes as file system objects.
> and
>I don't see why a server could not decide to map Unicode case for M-UTF7
>names.
Because then a client may assume virtually nothing about the server
responses returned for its wild-carded requests. And without being able to
assume these things, the client cannot do anything with mailboxes other than
showing them as a linear list.
>> The solution would be to get rid of UTF-7 and switch to the plain UNENCODED
>> UTF-8 for mailbox names.
>
> I agree. The whole point of modified UTF-7 in 1996 was to put a halt to
>the (then-common) practice of "just send 8-bits" for mailbox names in local
>character sets.
>
> The people who did (do?) so have had 9 1/2 years warning. That should be
>enough. We should progress to UTF-8 mailbox names.
But before you do, please investigate all these case-sensitivity problems
with non-LATIN (and non-Japanese ;-) alphabets. And, while doing this,
please remember that the number of people who were shocked to learn that
they could not create a folder named "12/12/2005" is much higher than the
number of people knowing what a "hierarchy separator" is. But that number is
still smaller than the number of people who did succeed in creating such a
mailbox and then asked their ISP/IT support about "that funny mailbox '12'
and some strange symbols around it". I.e. escape symbols would be good, or,
if you choose to use UTF-8/Unicode, you may want to use some "unprintable"
character as the path separator.
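A tiny sketch of the difference, assuming "/" as today's separator and picking U+001F (the ASCII "unit separator") purely as an example of an unprintable one. Neither the helper name path_components nor that choice of character comes from any specification:

```python
def path_components(name, delimiter):
    """Split a mailbox name into hierarchy levels at the given delimiter."""
    return name.split(delimiter)

# With "/" as the separator, the date becomes three nested levels.
with_slash = path_components("12/12/2005", "/")

# With an unprintable separator, the same name stays one level, and it can
# still be placed under a parent such as "Archive".
with_unit_sep = path_components("Archive\x1f12/12/2005", "\x1f")
```

The first call yields "12", "12" and "2005" as separate levels; the second yields "Archive" and "12/12/2005".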
>>> Science does not emerge from voting, party politics, or public debate.
>> Political Science does :-). There is a lot one can learn by watching
>> primates voting. Or debating. Especially publicly :-)
>
> As I say on my web page, any field of study which has "science" in its
>name is not a science... ;-)
Sure. I totally agree, and that's why I was hated by all Computer Science
departments :-) Their only competitors were teachers of Scientific
Communism - but they were weaker, as they did not believe in their books
themselves. CS departments were tougher... :-)
>>> Si vis pacem, para bellum.
>> and when you wish for war, prepare for a long boring peace? :-)
>
> I don't know, as only fascists wish for war;
Mark, I just wanted to make a quite innocent joke. Looks like I've failed,
and let's not turn it into a Political Science exercise. Let's continue to
play within the Computer Science barrack :-)
> -- Mark --
>
> http://staff.washington.edu/mrc
> Science does not emerge from voting, party politics, or public debate.
> Si vis pacem, para bellum.
Sincerely,
Vladimir