On 14/09/2012 20:52, Timo Sirainen wrote:
> On 10.9.2012, at 21.33, Alexey Melnikov wrote:
>
>>> I'm mainly wondering how to add support for languages where the acutes/graves in letters should be ignored by searches.
>> If you write it down, I will review and can find other reviewers. I don't think this is the first time this was proposed, so maybe there is some utility in it.
> Looks like i;unicode-casemap could already optionally do that, according to its security section:
It depends on whether you want sorting or comparison for equality
(during searching). I think i;unicode-casemap provides the former, but
wouldn't give you the latter.
>> 2) Step (2)(b) defines a subset of Normalization Form KD (NFKD) that
>> does not require normalization of out-of-order diacriticals.
>> However, an implementation MAY use an NFKD library routine that
>> does such normalization. This impacts step (2)(b) and possibly
>> also step (1)(a), and is an issue only with ill-formed UTF-8
>> input.
> I'm thinking about just using Lucene's ICUFoldingFilterFactory for this.
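
For reference, the kind of accent-insensitive matching discussed above can be roughly sketched with a stock NFKD routine. This is only an illustration, not the i;unicode-casemap algorithm itself and not what ICUFoldingFilterFactory does internally: it decomposes with NFKD, drops combining marks, then case-folds. The names fold_for_search and search_equal are hypothetical.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    """Return a search key that ignores case and combining diacritics.

    Assumption: approximates the desired behaviour with NFKD plus
    removal of combining marks; it is not the RFC's exact procedure.
    """
    # Decompose, so that e.g. "é" becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every character with a non-zero canonical combining class
    # (acutes, graves, and other combining marks).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold for caseless comparison.
    return stripped.casefold()


def search_equal(a: str, b: str) -> bool:
    """Compare two strings for equality under the folding above."""
    return fold_for_search(a) == fold_for_search(b)
```

With this, search_equal("résumé", "RESUME") is true. Note that this only covers comparison for equality; it says nothing about ordering.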