This post describes a few things I tried after running ES and Kibana locally.
(Nothing special, but it seemed better to keep a record.)
POST test_index/_analyze
{
  "analyzer": "standard",
  "text": "월, 화, 수, 목, 금, 토, 일"
}
// Result
{
  "tokens" : [
    {
      "token" : "월",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<HANGUL>",
      "position" : 0
    },
    {
      "token" : "화",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<HANGUL>",
      "position" : 1
    },
    {
      "token" : "수",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<HANGUL>",
      "position" : 2
    },
    {
      "token" : "목",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<HANGUL>",
      "position" : 3
    },
    {
      "token" : "금",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "<HANGUL>",
      "position" : 4
    },
    {
      "token" : "토",
      "start_offset" : 15,
      "end_offset" : 16,
      "type" : "<HANGUL>",
      "position" : 5
    },
    {
      "token" : "일",
      "start_offset" : 18,
      "end_offset" : 19,
      "type" : "<HANGUL>",
      "position" : 6
    }
  ]
}
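As expected, the standard analyzer simply splits on whitespace and punctuation and tags the Hangul tokens as <HANGUL>. If you also want to see what each stage of analysis produces, the _analyze API accepts an explain flag. A minimal sketch (response omitted here):
POST test_index/_analyze
{
  // "explain": true returns per-stage detail (tokenizer output, each token filter, attributes)
  "analyzer": "standard",
  "explain": true,
  "text": "월, 화, 수"
}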
POST _analyze
{
  // if no analyzer is specified, the "standard" analyzer is applied
  "text": "자, 과연 어떻게 나올까? 에라 모르겠다... I don't know!"
}
// Result
{
  "tokens" : [
    {
      "token" : "자",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<HANGUL>",
      "position" : 0
    },
    {
      "token" : "과연",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "<HANGUL>",
      "position" : 1
    },
    {
      "token" : "어떻게",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "<HANGUL>",
      "position" : 2
    },
    {
      "token" : "나올까",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "<HANGUL>",
      "position" : 3
    },
    {
      "token" : "에라",
      "start_offset" : 15,
      "end_offset" : 17,
      "type" : "<HANGUL>",
      "position" : 4
    },
    {
      "token" : "모르겠다",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "<HANGUL>",
      "position" : 5
    },
    {
      "token" : "i",
      "start_offset" : 26,
      "end_offset" : 27,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "don't",
      "start_offset" : 28,
      "end_offset" : 33,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "know",
      "start_offset" : 34,
      "end_offset" : 38,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
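The _analyze API also lets you pass a tokenizer and filter list directly instead of a named analyzer, which is handy for trying out a combination before defining it in an index. A quick sketch along the same lines (the lowercase filter is just an example choice):
POST _analyze
{
  // ad-hoc analysis chain: tokenizer + token filters, no predefined analyzer
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "자, 과연 어떻게 나올까? 에라 모르겠다... I don't know!"
}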
In Elasticsearch 7.7, it seems the analysis block has to be placed inside settings when creating an index.
(Example with the html_strip character filter)
PUT /test_index_for_html_strip
{
  "settings": {
    "index" : {
      "number_of_shards": 5,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "custom_html_strip_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}
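To double-check that the custom analyzer was actually registered under the index settings, they can simply be read back:
GET /test_index_for_html_strip/_settings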
POST /test_index_for_html_strip/_analyze
{
  "analyzer": "custom_html_strip_analyzer",
  "text": "<B>Elasticsearch</B> is cool! <br>"
}
// Result
{
  "tokens" : [
    {
      "token" : "elasticsearch",
      "start_offset" : 3,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "is",
      "start_offset" : 21,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "cool",
      "start_offset" : 24,
      "end_offset" : 28,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
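The HTML tags are removed by the char_filter before tokenization, so only the plain-text tokens remain (lowercased by the filter). To actually use this analyzer at index/search time it would be attached to a field in the mapping; a sketch, where the "content" field name is just a placeholder:
PUT /test_index_for_html_strip/_mapping
{
  "properties": {
    "content": {
      // "content" is a hypothetical field name for illustration
      "type": "text",
      "analyzer": "custom_html_strip_analyzer"
    }
  }
}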