mailing list archives

meli community discussions

E-mail headers
From: Timo Sirainen <tss@iki.fi>
To: imap-protocol@u.washington.edu
Date: Fri, 08 Jun 2018 12:34:48 -0000
Message-ID: 25F5C770-A983-40E7-A075-87A27449240F@iki.fi
http://www.iana.org/assignments/collation/collation-index.html

Looks like there aren't many comparators. I guess there aren't any other such lists either by other organizations? I'm mainly wondering how to add support for languages where the acutes/graves in letters should be ignored by searches. I guess I could simply add some kludgy setting for that, but then I shouldn't be advertising I18NLEVEL=1 anymore, because i;unicode-casemap is no longer being used as the default comparator.

Also full-text search indexes make changing comparators, or using multiple comparators, more or less impossible. Of course I18NLEVEL=2 requires i;unicode-casemap to be one of the available comparators, so I couldn't advertise that either.
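To make the full-text search point concrete, here is a minimal Python sketch (not Dovecot code) of an inverted index whose tokens are stored under a single normalization key. `casemap_key` only approximates i;unicode-casemap (per-character titlecasing via `str.title()`, then NFKD), and `accent_folded_key`, `build_index` and `search` are hypothetical helpers. An accent-insensitive query cannot be answered correctly from an index built with the casemap key; the index has to be rebuilt with the new key.

```python
import unicodedata
from collections import defaultdict


def casemap_key(word: str) -> str:
    """Rough approximation of i;unicode-casemap (RFC 5051): titlecase each
    character, then apply NFKD. Combining marks are kept."""
    titled = "".join(ch.title() for ch in word)
    return unicodedata.normalize("NFKD", titled)


def accent_folded_key(word: str) -> str:
    """Hypothetical accent-insensitive key: casemap, then drop combining marks."""
    return "".join(ch for ch in casemap_key(word) if not unicodedata.combining(ch))


def build_index(messages: dict, key) -> dict:
    """Inverted index mapping each normalized token to the UIDs containing it."""
    index = defaultdict(set)
    for uid, text in messages.items():
        for word in text.split():
            index[key(word)].add(uid)
    return dict(index)


def search(index: dict, key, term: str) -> set:
    """Look up a term with the given key; only correct if the index used the same key."""
    return set(index.get(key(term), set()))


messages = {1: "R\u00e9sum\u00e9 attached", 2: "please resume the meeting"}

casemap_index = build_index(messages, casemap_key)
print(search(casemap_index, casemap_key, "r\u00e9sum\u00e9"))        # {1}
# Accent-insensitive query against the casemap-keyed index: message 1 is missed.
print(search(casemap_index, accent_folded_key, "r\u00e9sum\u00e9"))  # {2}

# Only rebuilding the index with the new key finds both messages.
folded_index = build_index(messages, accent_folded_key)
print(search(folded_index, accent_folded_key, "resume"))             # {1, 2}
```

Supporting several comparators at once would therefore mean maintaining one index per key.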
E-mail headers
From: alexey.melnikov@isode.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:48 -0000
Message-ID: 504E328F.7030000@isode.com
Hi Timo,

On 10/09/2012 17:33, Timo Sirainen wrote:
> http://www.iana.org/assignments/collation/collation-index.html
>
> Looks like there aren't many comparators. I guess there aren't any other such lists either by other organizations?
I am not aware of any.

The only other unregistered comparator that was discussed recently is a 
numeric one that can also handle negative integers.
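For illustration only, a signed numeric comparator might look like the following Python sketch. It is modeled on i;ascii-numeric from RFC 4790 (only the leading number counts, and strings without one are treated as positive infinity); the function names and the handling of the leading `-` are assumptions, since no such comparator is registered.

```python
import re
from typing import Union

# Optional '-' followed by one or more ASCII digits, matched at the start.
_SIGNED_PREFIX = re.compile(r"-?[0-9]+")


def signed_numeric_value(s: str) -> Union[int, float]:
    """Hypothetical signed variant of i;ascii-numeric.

    As in i;ascii-numeric, only the leading number is used and a string that
    does not start with a number is treated as positive infinity. Unlike it,
    an optional leading '-' is accepted.
    """
    m = _SIGNED_PREFIX.match(s)
    return int(m.group()) if m else float("inf")


def compare(a: str, b: str) -> int:
    """Return -1, 0 or 1, like a comparator's ordering function."""
    va, vb = signed_numeric_value(a), signed_numeric_value(b)
    return (va > vb) - (va < vb)


values = ["10", "-3", "abc", "2 apples", "-20"]
print(sorted(values, key=signed_numeric_value))
# ['-20', '-3', '2 apples', '10', 'abc']
print(compare("-0", "0"))  # 0: equal under this comparator
```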
> I'm mainly wondering how to add support for languages where the acutes/graves in letters should be ignored by searches.
If you write it down, I will review it and can find other reviewers. I 
don't think this is the first time this has been proposed, so maybe there is 
some utility in it.
> I guess I could simply add some kludgy setting for that, but then I shouldn't be advertising I18NLEVEL=1 anymore, because i;unicode-casemap is no longer being used as the default comparator.
>
> Also full-text search indexes make changing comparators, or using multiple comparators, more or less impossible. Of course I18NLEVEL=2 requires i;unicode-casemap to be one of the available comparators, so I couldn't advertise that either.
E-mail headers
From: tss@iki.fi
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:48 -0000
Message-ID: D20A4AB1-42CA-4AB4-B481-FEFBA5D4AC9D@iki.fi
On 10.9.2012, at 21.33, Alexey Melnikov wrote:

>> I'm mainly wondering how to add support for languages where the acutes/graves in letters should be ignored by searches.
> If you write it down, I will review it and can find other reviewers. I don't think this is the first time this has been proposed, so maybe there is some utility in it.

Looks like i;unicode-casemap could already optionally do that, according to its security section:

>    2) Step (2)(b) defines a subset of Normalization Form KD (NFKD) that
>       does not require normalization of out-of-order diacriticals.
>       However, an implementation MAY use an NFKD library routine that
>       does such normalization.  This impacts step (2)(b) and possibly
>       also step (1)(a), and is an issue only with ill-formed UTF-8
>       input.
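The "out-of-order diacriticals" caveat concerns the canonical ordering of combining marks. The Python sketch below uses only the standard `unicodedata` module to show why two strings carrying the same marks in different orders stay distinct unless a full NFKD routine (which reorders marks by canonical combining class) is applied. It illustrates the caveat and is not an implementation of RFC 5051's step (2)(b).

```python
import unicodedata

# "a" followed by the same two combining marks in two different orders:
#   U+0323 COMBINING DOT BELOW    (canonical combining class 220)
#   U+0301 COMBINING ACUTE ACCENT (canonical combining class 230)
canonical = "a\u0323\u0301"
out_of_order = "a\u0301\u0323"

print(unicodedata.combining("\u0323"), unicodedata.combining("\u0301"))  # 220 230

# Without canonical reordering, the two code point sequences differ...
print(canonical == out_of_order)  # False

# ...but full NFKD also sorts marks by ascending combining class,
# so both strings normalize to the same sequence.
print(unicodedata.normalize("NFKD", canonical)
      == unicodedata.normalize("NFKD", out_of_order))  # True
```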

I'm thinking about just using Lucene's ICUFoldingFilterFactory for this.
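Lucene's ICUFoldingFilter, according to its documentation, applies the UTR #30 character foldings (including case and accent folding) through ICU. The Python function below is only a rough, hypothetical approximation of that idea using the standard library: case-fold, decompose with NFKD, drop combining marks, then recompose with NFKC. Real ICU folding covers many more cases.

```python
import unicodedata


def fold_for_search(s: str) -> str:
    """Rough approximation of accent- and case-insensitive folding.

    Not Lucene's ICUFoldingFilter: it only case-folds, decomposes with NFKD,
    drops characters with a nonzero canonical combining class (most
    diacritics), and recomposes with NFKC.
    """
    decomposed = unicodedata.normalize("NFKD", s.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFKC", stripped)


print(fold_for_search("Cr\u00e8me Br\u00fbl\u00e9e"))  # creme brulee
print(fold_for_search("Stra\u00dfe"))                  # strasse
print(fold_for_search("CREME BRULEE") == fold_for_search("cr\u00e8me br\u00fbl\u00e9e"))  # True
```

Unlike i;unicode-casemap, this folding deliberately discards diacritics, so "crème" and "creme" compare equal, which is the behavior asked for in this thread.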
E-mail headers
From: alexey.melnikov@isode.com
To: imap-protocol@localhost
Date: Fri, 08 Jun 2018 12:34:48 -0000
Message-ID: 5055879D.9020106@isode.com
On 14/09/2012 20:52, Timo Sirainen wrote:
> On 10.9.2012, at 21.33, Alexey Melnikov wrote:
>
>>> I'm mainly wondering how to add support for languages where the acutes/graves in letters should be ignored by searches.
>> If you write it down, I will review it and can find other reviewers. I don't think this is the first time this has been proposed, so maybe there is some utility in it.
> Looks like i;unicode-casemap could already optionally do that, according to its security section:

It depends on whether you want sorting or comparison for equality 
(during searching). I think i;unicode-casemap provides the former, but 
wouldn't give you the latter.

>>     2) Step (2)(b) defines a subset of Normalization Form KD (NFKD) that
>>        does not require normalization of out-of-order diacriticals.
>>        However, an implementation MAY use an NFKD library routine that
>>        does such normalization.  This impacts step (2)(b) and possibly
>>        also step (1)(a), and is an issue only with ill-formed UTF-8
>>        input.
> I'm thinking about just using Lucene's ICUFoldingFilterFactory for this.