These are notes from running ES and Kibana locally and trying a few things out.

(Nothing special, but it seemed better to keep a record.)

POST test_index/_analyze
{
  "analyzer": "standard",
  "text": "월, 화, 수, 목, 금, 토, 일"
}

// Result
{
  "tokens" : [
    {
      "token" : "월",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<HANGUL>",
      "position" : 0
    },
    {
      "token" : "화",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<HANGUL>",
      "position" : 1
    },
    {
      "token" : "수",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<HANGUL>",
      "position" : 2
    },
    {
      "token" : "목",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<HANGUL>",
      "position" : 3
    },
    {
      "token" : "금",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "<HANGUL>",
      "position" : 4
    },
    {
      "token" : "토",
      "start_offset" : 15,
      "end_offset" : 16,
      "type" : "<HANGUL>",
      "position" : 5
    },
    {
      "token" : "일",
      "start_offset" : 18,
      "end_offset" : 19,
      "type" : "<HANGUL>",
      "position" : 6
    }
  ]
}
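
The standard analyzer splits on whitespace and punctuation, so each syllable between the commas comes out as its own <HANGUL> token. To see which char filters, tokenizer, and token filters produced a given token, the _analyze API also accepts an "explain" flag; a minimal sketch:

POST test_index/_analyze
{
  "analyzer": "standard",
  "text": "월, 화, 수, 목, 금, 토, 일",
  "explain": true
}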

POST _analyze
{
  // If no analyzer is specified, the "standard" analyzer is applied
  "text": "자, 과연 어떻게 나올까? 에라 모르겠다... I don't know!"
}

// Result
{
  "tokens" : [
    {
      "token" : "자",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<HANGUL>",
      "position" : 0
    },
    {
      "token" : "과연",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "<HANGUL>",
      "position" : 1
    },
    {
      "token" : "어떻게",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "<HANGUL>",
      "position" : 2
    },
    {
      "token" : "나올까",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "<HANGUL>",
      "position" : 3
    },
    {
      "token" : "에라",
      "start_offset" : 15,
      "end_offset" : 17,
      "type" : "<HANGUL>",
      "position" : 4
    },
    {
      "token" : "모르겠다",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "<HANGUL>",
      "position" : 5
    },
    {
      "token" : "i",
      "start_offset" : 26,
      "end_offset" : 27,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "don't",
      "start_offset" : 28,
      "end_offset" : 33,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "know",
      "start_offset" : 34,
      "end_offset" : 38,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
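
With the standard analyzer, the punctuation ("?" and "...") is dropped and "I" is lowercased, while "don't" stays a single token. For comparison, the same text can be run through the built-in whitespace analyzer, which only splits on whitespace and applies no lowercasing, so tokens such as "나올까?" and "know!" keep their punctuation; a minimal sketch:

POST _analyze
{
  "analyzer": "whitespace",
  "text": "자, 과연 어떻게 나올까? 에라 모르겠다... I don't know!"
}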

In Elasticsearch 7.7, it seems the analysis block has to be placed inside settings when creating an index.

(html_strip character filter example)

PUT /test_index_for_html_strip
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "custom_html_strip_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}
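
Whether the custom analyzer was actually registered under settings can be confirmed by reading the index settings back (a quick optional check):

GET /test_index_for_html_strip/_settings
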
POST /test_index_for_html_strip/_analyze
{
  "analyzer": "custom_html_strip_analyzer", 
  "text": "<B>Elasticsearch</B> is cool! <br>"
}

// Result
{
  "tokens" : [
    {
      "token" : "elasticsearch",
      "start_offset" : 3,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "is",
      "start_offset" : 21,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "cool",
      "start_offset" : 24,
      "end_offset" : 28,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
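
The <B> and <br> tags are stripped by the char filter before tokenizing (the offsets still point into the original text that contained the markup). To actually apply this analyzer to documents, it would typically be attached to a text field in the mapping; a sketch, where the "content" field name is only an assumption for illustration:

PUT /test_index_for_html_strip/_mapping
{
  "properties": {
    // "content" is just an example field name
    "content": {
      "type": "text",
      "analyzer": "custom_html_strip_analyzer"
    }
  }
}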
