How to build an Elasticsearch phrase query that matches text with special characters?
During the last few days I've been playing around with Elasticsearch indexing and searching, and I've built different queries for what I intended to do. My problem right now is being able to build a query that can match text with special characters even if I don't type them in the "search bar". I'll give an example to explain what I mean.
Imagine I have an indexed document that contains a field called page content. Inside this field, I can have a part of text such as:
"o carro joão é preto." (which means "João's car is black" in Portuguese)
What I want is to be able to type something like:
o carro joao e preto
and still be able to get a proper match.
What I've tried so far:
I've been using the match_phrase query described in the Elasticsearch documentation, such as the example below:
GET _search
{
  "query": {
    "match_phrase": {
      "page content": {
        "query": "o carro joao e preto"
      }
    }
  }
}
This query gives me 0 hits, which is understandable given that the content of the query is different from what has been stored in the document.
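The zero hits can be understood in terms of tokens: a phrase query only matches when the analyzed query terms equal the indexed terms, in the same order. Below is a rough Python sketch, assuming a simplified analyzer that only lowercases and splits on non-word characters. The helper names `simple_tokens` and `phrase_matches` are made up for illustration and are not Elasticsearch APIs.

```python
import re

def simple_tokens(text):
    """Lowercase and split on non-word characters (a stand-in for an analyzer without folding)."""
    return re.findall(r"\w+", text.lower())

def phrase_matches(doc_text, query_text):
    """Return True if the query tokens occur as a contiguous run in the document tokens."""
    doc = simple_tokens(doc_text)
    query = simple_tokens(query_text)
    n = len(query)
    return n > 0 and any(doc[i:i + n] == query for i in range(len(doc) - n + 1))

indexed = "o carro joão é preto."
print(phrase_matches(indexed, "o carro joão é preto"))  # accented query: True
print(phrase_matches(indexed, "o carro joao e preto"))  # unaccented query: False
```

Without any folding step, "joão" and "joao" are simply different terms, so the unaccented phrase cannot line up with the indexed one.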
I've tried setting up the ASCII folding token filter, but I'm not sure how to use it. What I've done is create a new index with this request:
PUT /newindex
{
  "page content": "o carro joão é preto",
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": ["standard", "my_ascii_folding"]
        }
      },
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  }
}
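One thing worth noting about the request above is that it puts a document field ("page content") next to the index settings. Index settings and documents are normally sent in separate requests, so a possible cleanup is to split the two bodies. The sketch below only builds and prints the JSON bodies in Python; the variable names are hypothetical, and the exact endpoint for indexing the document depends on the Elasticsearch version, so it is left out.

```python
import json

# Index-creation body: analysis settings only (copied from the request above).
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "tokenizer": "standard",
                    "filter": ["standard", "my_ascii_folding"],
                }
            },
            "filter": {
                "my_ascii_folding": {
                    "type": "asciifolding",
                    "preserve_original": True,
                }
            },
        }
    }
}

# Document body: assumed to be sent in a separate indexing request
# once the index exists.
document_body = {"page content": "o carro joão é preto"}

print(json.dumps(index_body, indent=2))
print(json.dumps(document_body, ensure_ascii=False))
```

Whether this split alone is enough to get the match is not something the post confirms; it only keeps the settings request free of document data.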
Then, if I run the match_phrase query shown above with this text:
o carro joao e preto
it should give me the correct result, as I wanted. The thing is that it isn't working for me. Am I forgetting something? I've been stuck on this for the last 2 days without success, and I feel there is something I'm missing.
So my question is: what do I have to do to get the desired matching?
I managed to find the answer to my own question. I had to change the analyzer a little bit when I created the index. Further details are in a previous answer.
My code now:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "asciifolding"]
        },
        "text": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase"],
          "char_filter": "html_strip"
        },
        "sortable": {
          "tokenizer": "keyword",
          "filter": ["lowercase"],
          "char_filter": "html_strip"
        }
      }
    }
  }
}
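To see why the "default" analyzer above makes the unaccented phrase match, here is a small Python approximation of its filter chain (tokenize, lowercase, fold to ASCII). It is only a sketch: `fold_ascii` and `analyze` are hypothetical helpers, and stripping combining marks after NFKD decomposition covers accented Latin letters like those in the example but is not a full reimplementation of Elasticsearch's asciifolding filter.

```python
import re
import unicodedata

def fold_ascii(token):
    """Strip combining accents after NFKD decomposition (an approximation of asciifolding)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text):
    """Tokenize, lowercase, then fold, mirroring the filter order of the 'default' analyzer."""
    return [fold_ascii(tok) for tok in re.findall(r"\w+", text.lower())]

indexed = analyze("o carro joão é preto.")
query = analyze("o carro joao e preto")
print(indexed)           # ['o', 'carro', 'joao', 'e', 'preto']
print(indexed == query)  # True
```

Because the same analyzer runs at index time and at search time, both sides end up with identical folded terms, which is what a phrase query needs.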