[Elasticsearch] Elasticsearch Crash Course (2) - Data Processing

강씨아저씨 2020. 4. 25. 22:04

This post summarizes what I studied and understood from the book "시작하세요! 엘라스틱서치" while preparing an in-house presentation on Elasticsearch. Elasticsearch also covers a lot of ground, so I plan to split the material into a series. Everything here is based on Elasticsearch 7.6.

Today we will look at how to process data with Elasticsearch.

Setting up Elasticsearch

Before actually using Elasticsearch, let's first set up a simple environment with Docker.

Pull the Elasticsearch image with the following command.

docker pull elasticsearch:7.6.2

Then create and run a container from that image, and the single-node Elasticsearch setup is done.

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elastic_7_6 elasticsearch:7.6.2

You can now try Elasticsearch from your local machine as follows.

curl -X GET 'localhost:9200/_all?pretty'

Adding data

Now let's add data (a Document) to Elasticsearch.

An Elasticsearch Document can be added with the HTTP POST or PUT method.

The Document we are going to create consists of the following fields.

{
  "name": "hong-gil-dong",
  "birth": 1988,
  "gender": "male"
}

Let's add the data with the curl command.

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/user/_doc/1' -d '
{ "name":"hong-gil-dong", "birth": 1988, "gender": "male" }'
--- Response ---
{"_index":"user","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

The response shows that a Document with _index user, _type _doc, and _id 1 was created.

With POST, if you omit the id when creating a Document, a randomly generated id is assigned, as shown below.

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/user/_doc' -d '
{ "name":"kim-su-mi", "birth": 1949, "gender": "female" }'
--- Response ---
{"_index":"user","_type":"_doc","_id":"qlCt2XEB7jJnOiNtoCQV","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

With PUT, omitting the id returns the following error.

curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/user/_doc' -d '
{ "name":"hong-gil-dong", "birth": 1988, "gender": "male" }'
--- Response ---
{"error":"Incorrect HTTP method for uri [/user/_doc] and method [PUT], allowed: [POST]","status":405}

If you index a Document with an id that already exists, the existing data is overwritten with the new data. (This applies to both PUT and POST.)

curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/user/_doc/1' -d '
{ "name":"hong-gil-dong2", "birth": 1990, "gender": "male" }'
--- Response ---
{"_index":"user","_type":"_doc","_id":"1","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1}

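Because the data was overwritten, result is updated and _version has increased by 1 to 2.

More details on adding data can be found in the Elasticsearch Index API documentation.

You can also create a Document with _create, as follows. In this case the id is required, and once the Document exists it cannot be overwritten with new data: a second _create for the same id fails with a version conflict (HTTP 409).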

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/user/_create/2' -d '
{ "name":"hong-gil-dong2", "birth": 2000, "gender": "male" }'
--- or ---
curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/user/_create/2' -d '
{ "name":"hong-gil-dong2", "birth": 2000, "gender": "male" }'

Deleting data

Data can be deleted per Document or per Index, using the HTTP DELETE method.

Let's delete user/_doc/2, which we created earlier.

curl -X DELETE 'localhost:9200/user/_doc/2'
--- Response ---
{"_index":"user","_type":"_doc","_id":"2","_version":2,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":8,"_primary_term":1}

--- GET after deletion ---
curl -X GET 'localhost:9200/user/_doc/2'
{"_index":"user","_type":"_doc","_id":"2","found":false}

If you add data with the same id again after deleting the Document, you can see that the version has increased, as shown below.

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/user/_create/2' -d '
{ "name":"hong-gil-dong2", "birth": 2000, "gender": "male" }'
--- Response ---
{"_index":"user","_type":"_doc","_id":"2","_version":3,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":9,"_primary_term":1}

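This is because deleting a Document does not physically remove it right away. It is easier to think of it as the data in _source being updated to an empty value and the Document being switched to a state in which it can no longer be searched. (Elasticsearch keeps the version of a deleted Document for a short time, controlled by the index.gc_deletes setting, which defaults to 60 seconds.)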
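The version behaviour seen so far (overwriting bumps the version, _create refuses to overwrite, and a deleted id keeps counting) can be sketched with a small in-memory model. This is only a toy for building intuition, not how Elasticsearch is implemented; the class name ToyIndex and its methods are hypothetical, and the expiry of deleted versions is ignored.

```python
class ToyIndex:
    """Toy in-memory model of the versioning seen in the curl examples (not Elasticsearch)."""

    def __init__(self):
        # doc id -> {"version": int, "source": dict or None}; source None marks a deleted doc
        self._docs = {}

    def index(self, doc_id, source):
        """Like PUT/POST /<index>/_doc/<id>: create the document or overwrite it."""
        entry = self._docs.get(doc_id)
        existed = entry is not None and entry["source"] is not None
        version = entry["version"] + 1 if entry else 1
        self._docs[doc_id] = {"version": version, "source": dict(source)}
        return {"_id": doc_id, "_version": version,
                "result": "updated" if existed else "created"}

    def create(self, doc_id, source):
        """Like /<index>/_create/<id>: refuse to overwrite a live document."""
        entry = self._docs.get(doc_id)
        if entry is not None and entry["source"] is not None:
            raise ValueError(f"version conflict: document [{doc_id}] already exists")
        return self.index(doc_id, source)

    def delete(self, doc_id):
        """Mark the document as deleted but keep (and bump) its version."""
        entry = self._docs.get(doc_id)
        if entry is None or entry["source"] is None:
            return {"_id": doc_id, "result": "not_found"}
        entry["version"] += 1
        entry["source"] = None
        return {"_id": doc_id, "_version": entry["version"], "result": "deleted"}

    def get(self, doc_id):
        entry = self._docs.get(doc_id)
        if entry is None or entry["source"] is None:
            return {"_id": doc_id, "found": False}
        return {"_id": doc_id, "_version": entry["version"],
                "found": True, "_source": entry["source"]}


user = ToyIndex()
user.create("2", {"name": "hong-gil-dong2"})            # version 1
user.delete("2")                                         # version 2, no longer found
recreated = user.create("2", {"name": "hong-gil-dong2"})
print(recreated)                                         # version continues at 3
```

The model keeps a tombstone per id, which is why the re-created Document continues at version 3 instead of starting over at 1.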

Now let's delete an Index. As shown below, you can also delete at the Index level.

curl -X DELETE 'localhost:9200/user'
--- Response ---
{ "acknowledged" : true }

When an Index is deleted, all Documents in that Index are deleted together.

curl -X GET 'localhost:9200/user/_doc/1'
--- Response ---
{"error":{ ...
          "type":"index_not_found_exception",
          "reason":"no such index [user]",
          "resource.type":"index_expression",
          "resource.id":"user",
          "index_uuid":"_na_",
          "index":"user"},
 "status":404}

If you create a Document again afterwards, the version starts from 1 again. This shows that, unlike deleting a single Document, deleting an Index really removes every Document in it.

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/user/_create/1' -d '
{ "name":"hong-gil-dong", "birth": 1988, "gender": "male" }'
--- Response ---
{"_index":"user","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

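More details on deleting data can be found in the Elasticsearch Delete API documentation.

Updating data

Now let's modify a Document that has already been created. Updates use the HTTP POST method with a URI of the form POST /<index>/_update/<id>.
This is called the _update API, and more details can be found in the Elasticsearch Update API documentation.

Let's add a property named locale to the Document with id 1 that we created earlier, and change one of its existing properties.
To update, put the changes under the doc key in the request body.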

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/user/_update/1' -d '
{ "doc": { "birth": 2020, "locale":"ko-KR" } }'
--- Response ---
{"_index":"user","_type":"_doc","_id":"1","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

The version went from 1 to 2, a locale property was added to _source, and birth was changed to 2020, as the following GET confirms.

curl -X GET 'localhost:9200/user/_doc/1'
--- Response ---
{"_index":"user","_type":"_doc","_id":"1","_version":2,"_seq_no":1,"_primary_term":1,"found":true,"_source":{"name":"hong-gil-dong","birth":2020,"gender":"male","locale":"ko-KR"}}

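Unlike adding data, the _update API does not overwrite the Document with a completely new structure. It retrieves the stored Document internally, builds the changed Document content from the instructions you sent, and then writes the result over the existing Document.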
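The "fetch, merge, write back" behaviour described above can be sketched as a plain recursive merge. This is a minimal illustration, assuming the documented rule that nested objects are merged while scalar values and arrays are replaced; the function name merge_partial is my own and is not part of any Elasticsearch client.

```python
import copy


def merge_partial(source, partial):
    """Return a new dict with `partial` merged into `source`.

    Nested objects are merged key by key; scalars and lists are replaced.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The stored Document and the "doc" part of the _update request from the example.
stored = {"name": "hong-gil-dong", "birth": 1988, "gender": "male"}
updated = merge_partial(stored, {"birth": 2020, "locale": "ko-KR"})
print(updated)
```

Fields that are not mentioned in the partial document (name and gender here) are left as they were, which is the practical difference from re-indexing the whole Document.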

This time, let's use a script to modify the Document by applying an operation to its content in _source.

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/user/_update/1' -d '
{ "script": { "source": "ctx._source.birth += params.count", "lang":"painless", "params": { "count": 10 } } }'
--- Response ---
{"_index":"user","_type":"_doc","_id":"1","_version":3,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1}

As shown below, the value of the birth field changed from 2020 to 2030.

curl -X GET 'localhost:9200/user/_doc/1'
--- Response ---
{"_index":"user","_type":"_doc","_id":"1","_version":3,"_seq_no":2,"_primary_term":1,"found":true,"_source":{"name":"hong-gil-dong","birth":2030,"gender":"male","locale":"ko-KR"}}

Beyond simple arithmetic like this, scripts can be used in many other situations. If you are curious, see the Elasticsearch scripting documentation.

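Bulk data processing

Elasticsearch provides the bulk API, which can process a large amount of data in a single request.

Because the data is gathered and processed at once, it is faster than sending one request per item. See the Elasticsearch Bulk API documentation for details.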

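The bulk API can add, update, and delete Documents. Every action except delete consists of a pair of lines: a metadata line followed by the request data. The metadata has the form { action: { index, document id } }, for example { "index": { "_index": "user_bulk", "_id": "1" } }. For the index action the id can be omitted, in which case a randomly generated value is stored as the id.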
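The metadata/data pairing described above is easy to generate programmatically. Below is a minimal sketch that only builds the request body as a string and sends nothing; the helper name build_bulk_body is hypothetical, and it covers only the index and create actions, which both take a data line.

```python
import json


def build_bulk_body(index, docs, action="index"):
    """Build a newline-delimited JSON body for the _bulk API.

    `docs` is an iterable of (doc_id, source) pairs; a doc_id of None
    leaves "_id" out of the metadata line.
    """
    if action not in ("index", "create"):
        raise ValueError("this sketch only handles actions that carry a data line")
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index}
        if doc_id is not None:
            meta["_id"] = str(doc_id)
        lines.append(json.dumps({action: meta}))   # metadata line
        lines.append(json.dumps(source))           # request data line
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body("user_bulk", [
    ("1", {"first_name": "James", "last_name": "Butt", "birth": 1970}),
    (None, {"first_name": "Art", "last_name": "Venere", "birth": 1977}),
])
print(body, end="")
```

Each Document contributes exactly two lines, and the body ends with a newline, which the bulk API requires.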

Now let's add a batch of data with the bulk API.

curl -H 'Content-Type: application/json' -X POST 'localhost:9200/_bulk' -d'
{ "index" : { "_index" : "user_bulk", "_id" : "1" } }
{ "first_name" : "James", "last_name": "Butt", "birth": 1970, "country": "New Orleans", "zip": "70116" }
{ "index" : { "_index" : "user_bulk", "_id" : "2" } }
{ "first_name" : "Josephine", "last_name": "Darakjy", "birth": 1967 , "country": "Livingston", "zip": "48116" }
{ "index" : { "_index" : "user_bulk", "_id" : "3" } }
{ "first_name" : "Art", "last_name": "Venere", "birth": 1977, "country": "Gloucester", "zip": "8014" }
{ "index" : { "_index" : "user_bulk", "_id" : "4" } }
{ "first_name" : "Lenna", "last_name": "Paprocki", "birth": 1991, "country": "Anchorage", "zip": "99501" }
{ "index" : { "_index" : "user_bulk", "_id" : "5" } }
{ "first_name" : "Donette", "last_name": "Foller", "birth": 1997, "country": "Butler", "zip": "45011" }
{ "index" : { "_index" : "user_bulk", "_id" : "6" } }
{ "first_name" : "Simona", "last_name": "Morasca", "birth": 1989, "country": "Ashland", "zip": "44805" }
{ "index" : { "_index" : "user_bulk", "_id" : "7" } }
{ "first_name" : "Mitsue", "last_name": "Tollner", "birth": 1988, "country": "Cook", "zip": "60632" }
{ "index" : { "_index" : "user_bulk", "_id" : "8" } }
{ "first_name" : "Lenna", "last_name": "Newville", "birth": 2002, "country": "San Francisco", "zip": "27601" }
{ "index" : { "_index" : "user_bulk", "_id" : "9" } }
{ "first_name" : "Dean", "last_name": "Ketelsen", "birth": 1988, "country": "San Diego", "zip": "11801" }
{ "index" : { "_index" : "user_bulk", "_id" : "10" } }
{ "first_name" : "Eden", "last_name": "Jayson", "birth": 1994, "country": "Los Angeles", "zip": "91106" }
'

You get a result like the following.

{
  "took" : 132,
  "errors" : false,
  "items" : [ 
    {
      "index" : {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    ...
    {
      "index" : {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "10",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 9,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

The bulk API can also be used with a JSON file.

# user_bulk_file.json
{ "index" : { "_index" : "user_bulk", "_id" : "1" } }
{ "first_name" : "James", "last_name": "Butt", "birth": 1970, "country": "New Orleans", "zip": "70116" }
{ "index" : { "_index" : "user_bulk", "_id" : "2" } }
{ "first_name" : "Josephine", "last_name": "Darakjy", "birth": 1967 , "country": "Livingston", "zip": "48116" }
{ "index" : { "_index" : "user_bulk", "_id" : "3" } }
{ "first_name" : "Art", "last_name": "Venere", "birth": 1977, "country": "Gloucester", "zip": "8014" }
{ "index" : { "_index" : "user_bulk", "_id" : "4" } }
{ "first_name" : "Lenna", "last_name": "Paprocki", "birth": 1991, "country": "Anchorage", "zip": "99501" }
{ "index" : { "_index" : "user_bulk", "_id" : "5" } }
{ "first_name" : "Donette", "last_name": "Foller", "birth": 1997, "country": "Butler", "zip": "45011" }
{ "index" : { "_index" : "user_bulk", "_id" : "6" } }
{ "first_name" : "Simona", "last_name": "Morasca", "birth": 1989, "country": "Ashland", "zip": "44805" }
{ "index" : { "_index" : "user_bulk", "_id" : "7" } }
{ "first_name" : "Mitsue", "last_name": "Tollner", "birth": 1988, "country": "Cook", "zip": "60632" }
{ "index" : { "_index" : "user_bulk", "_id" : "8" } }
{ "first_name" : "Lenna", "last_name": "Newville", "birth": 2002, "country": "San Francisco", "zip": "27601" }
{ "index" : { "_index" : "user_bulk", "_id" : "9" } }
{ "first_name" : "Dean", "last_name": "Ketelsen", "birth": 1988, "country": "San Diego", "zip": "11801" }
{ "index" : { "_index" : "user_bulk", "_id" : "10" } }
{ "first_name" : "Eden", "last_name": "Jayson", "birth": 1994, "country": "Los Angeles", "zip": "91106" }

Given a file like the one above, use it as follows.

curl -H 'Content-Type: application/x-ndjson' -X POST 'localhost:9200/_bulk' --data-binary "@user_bulk_file.json"

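Searching data

There are two ways to search data: URI search and request body search.

A search runs against the Documents stored in an Index, and several Indices can be searched together.

See the Elasticsearch Search API documentation for details on searching.

1. URI search
URI search passes the search command as query parameters in the HTTP address. Compared with request body search it is simple to use, but it has the drawback that complex queries are hard to express.

1.1 q is the parameter that carries the search query. Enter it in the form <field name>:<query>.
The following command searches for Documents whose country property is Anchorage.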

curl -X GET 'localhost:9200/user_bulk/_search?pretty&q=country:Anchorage'
--- Response ---
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Lenna",
          "last_name" : "Paprocki",
          "birth" : 1991,
          "country" : "Anchorage",
          "zip" : "99501"
        }
      }
    ]
  }
}
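When the query text contains spaces or other special characters, it helps to let a library build the URL instead of encoding it by hand. The following is a minimal sketch using only the Python standard library; the helper name build_search_url and the default host are assumptions for illustration, and nothing is actually sent.

```python
from urllib.parse import quote, urlencode


def build_search_url(index, query, host="localhost:9200", **params):
    """Build a URI-search URL; `query` becomes the q parameter.

    Extra keyword arguments are appended as additional query parameters.
    """
    all_params = {"q": query, **params}
    # quote_via=quote encodes a space as %20 (not '+'); ':' is kept readable.
    query_string = urlencode(all_params, safe=":", quote_via=quote)
    return f"http://{host}/{index}/_search?{query_string}"


print(build_search_url("user_bulk", "country:Anchorage"))
print(build_search_url("user_bulk", "country:New Orleans"))
```

The second call shows the space in the query being percent-encoded automatically.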

Instead of putting the field name in q, you can specify the field to search with the df (default field) parameter.

curl -X GET 'localhost:9200/user_bulk/_search?pretty&q=San%20Diego&df=country'
--- Response ---
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.9558086,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 2.9558086,
        "_source" : {
          "first_name" : "Dean",
          "last_name" : "Ketelsen",
          "birth" : 1988,
          "country" : "San Diego",
          "zip" : "11801"
        }
      },
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 1.2605917,
        "_source" : {
          "first_name" : "Lenna",
          "last_name" : "Newville",
          "birth" : 2002,
          "country" : "San Francisco",
          "zip" : "27601"
        }
      }
    ]
  }
}

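However, the result is not what we expected: San Francisco, which we did not want, was returned as well.

The reason is that when default_operator is not specified, the terms separated by a space (%20) are combined with OR by default.
To search with AND instead, set default_operator=AND as follows.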

curl -X GET 'localhost:9200/user_bulk/_search?pretty&q=San%20Diego&df=country&default_operator=AND'
--- Response ---
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.9558086,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 2.9558086,
        "_source" : {
          "first_name" : "Dean",
          "last_name" : "Ketelsen",
          "birth" : 1988,
          "country" : "San Diego",
          "zip" : "11801"
        }
      }
    ]
  }
}
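The difference between the two operators can be reproduced with a toy matcher. This is only a rough model for intuition: it splits on whitespace and lowercases, whereas Elasticsearch applies the field's analyzer and also computes scores. The function name matches is hypothetical.

```python
def matches(field_value, query, default_operator="OR"):
    """Toy term matching: OR needs any query term, AND needs all of them."""
    doc_terms = set(field_value.lower().split())
    query_terms = query.lower().split()
    if default_operator.upper() == "AND":
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)


countries = ["New Orleans", "San Francisco", "San Diego", "Los Angeles"]
or_hits = [c for c in countries if matches(c, "San Diego")]
and_hits = [c for c in countries if matches(c, "San Diego", "AND")]
print(or_hits)   # both "San ..." values match because one shared term is enough
print(and_hits)  # only the value containing every query term matches
```

With OR, the single shared term "san" is enough for San Francisco to match, which is the surprise seen in the first response.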

1.2 When _source is set to false, the search results do not include the Document content and show only metadata such as the total hit count and scores.

curl -X GET 'localhost:9200/user_bulk/_search?pretty&q=country:Anchorage&_source=false'
--- Response ---
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.2561343,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 2.2561343
      }
    ]
  }
}

1.3 sort controls the order of the search results. By default, results are sorted by score.

To change the sort criterion, use it as follows. When sorting by a field, ascending order is the default.

curl -X GET 'localhost:9200/user_bulk/_search?pretty&sort=birth'
--- Response ---
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "first_name" : "Josephine",
          "last_name" : "Darakjy",
          "birth" : 1967,
          "country" : "Livingston",
          "zip" : "48116"
        },
        "sort" : [
          1967
        ]
      },
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "first_name" : "James",
          "last_name" : "Butt",
          "birth" : 1970,
          "country" : "New Orleans",
          "zip" : "70116"
        },
        "sort" : [
          1970
        ]
      },
      ...
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "first_name" : "Donette",
          "last_name" : "Foller",
          "birth" : 1997,
          "country" : "Butler",
          "zip" : "45011"
        },
        "sort" : [
          1997
        ]
      },
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : null,
        "_source" : {
          "first_name" : "Lenna",
          "last_name" : "Newville",
          "birth" : 2002,
          "country" : "San Francisco",
          "zip" : "27601"
        },
        "sort" : [
          2002
        ]
      }
    ]
  }
}

To reverse the order, use sort=birth:desc.

1.4 from and size determine which Document to start from and how many to return. from starts at 0.

curl -X GET 'localhost:9200/user_bulk/_search?pretty&sort=birth&from=1&size=2'
--- Response ---
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "first_name" : "James",
          "last_name" : "Butt",
          "birth" : 1970,
          "country" : "New Orleans",
          "zip" : "70116"
        },
        "sort" : [
          1970
        ]
      },
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "first_name" : "Art",
          "last_name" : "Venere",
          "birth" : 1977,
          "country" : "Gloucester",
          "zip" : "8014"
        },
        "sort" : [
          1977
        ]
      }
    ]
  }
}

With an ascending sort on birth, the Document at position 0 was skipped and the next 2 were returned.
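The effect of from and size is the same as slicing a sorted list. Here is a small sketch that applies it to the birth values from the bulk example; the function name paginate is hypothetical, and the defaults mirror the search defaults of from=0 and size=10.

```python
def paginate(values, from_=0, size=10, descending=False):
    """Sort the values, skip the first `from_`, and return at most `size` of them."""
    ordered = sorted(values, reverse=descending)
    return ordered[from_:from_ + size]


# birth values of the ten Documents in user_bulk
births = [1970, 1967, 1977, 1991, 1997, 1989, 1988, 2002, 1988, 1994]
page = paginate(births, from_=1, size=2)
print(page)  # [1970, 1977], the same two birth years as in the response above
```

For page-style navigation, from is usually computed as (page number - 1) * size.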

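2. Request body search

Request body search takes the search conditions as a query in JSON format. It uses Query DSL, which offers queries such as match, term, and range. These will be covered in detail in the next post.

More details on request body search can be found in the Elasticsearch documentation. The basic form of a request body search is as follows.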

GET /<index>/_search
{
  "query": {
    < parameters >
  }
}

You can search with a match query as follows.

curl -H 'Content-Type: application/json' -X GET 'localhost:9200/user_bulk/_search?pretty' -d '
{
  "query": {
    "match" : { "country": "Anchorage" }
  }
}'
--- Response ---
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.2561343,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 2.2561343,
        "_source" : {
          "first_name" : "Lenna",
          "last_name" : "Paprocki",
          "birth" : 1991,
          "country" : "Anchorage",
          "zip" : "99501"
        }
      }
    ]
  }
}

2.1 The sort, from, and size options used in URI search are all available in request body search as well.

curl -H 'Content-Type: application/json' -X GET 'localhost:9200/user_bulk/_search?pretty' -d '
{
  "from": 1,
  "size": 2,
  "sort":[{ "birth":"asc" }]
}'
--- Response ---
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "first_name" : "James",
          "last_name" : "Butt",
          "birth" : 1970,
          "country" : "New Orleans",
          "zip" : "70116"
        },
        "sort" : [
          1970
        ]
      },
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "first_name" : "Art",
          "last_name" : "Venere",
          "birth" : 1977,
          "country" : "Gloucester",
          "zip" : "8014"
        },
        "sort" : [
          1977
        ]
      }
    ]
  }
}
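Request bodies like the ones above are plain JSON, so they can be assembled from ordinary dictionaries. The sketch below only produces the JSON string that would go after -d in the curl command; the helper name build_search_body is an assumption for illustration.

```python
import json


def build_search_body(query=None, from_=None, size=None, sort=None):
    """Assemble a request-body search as a JSON string, skipping unset options."""
    body = {}
    if query is not None:
        body["query"] = query
    if from_ is not None:
        body["from"] = from_
    if size is not None:
        body["size"] = size
    if sort is not None:
        body["sort"] = sort
    return json.dumps(body)


# Same request as the from/size/sort example above.
paged = build_search_body(from_=1, size=2, sort=[{"birth": "asc"}])
# Same request as the match example.
matched = build_search_body(query={"match": {"country": "Anchorage"}})
print(paged)
print(matched)
```

Options that are left unset are simply omitted, so Elasticsearch falls back to its defaults for them.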

2.2 highlight marks the parts of the search results that match the search condition.

curl -H 'Content-Type: application/json' -X GET 'localhost:9200/user_bulk/_search?pretty' -d '
{
  "query": {
    "match" : { "country": "San" }
  },
  "highlight" : {
    "fields": { "country" : {} }
  }
}'
--- Response ---
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2605917,
    "hits" : [
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 1.2605917,
        "_source" : {
          "first_name" : "Lenna",
          "last_name" : "Newville",
          "birth" : 2002,
          "country" : "San Francisco",
          "zip" : "27601"
        },
        "highlight" : {
          "country" : [
            "<em>San</em> Francisco"
          ]
        }
      },
      {
        "_index" : "user_bulk",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.2605917,
        "_source" : {
          "first_name" : "Dean",
          "last_name" : "Ketelsen",
          "birth" : 1988,
          "country" : "San Diego",
          "zip" : "11801"
        },
        "highlight" : {
          "country" : [
            "<em>San</em> Diego"
          ]
        }
      }
    ]
  }
}
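What the highlight fragments look like can be imitated with a regular expression that wraps whole-word matches in tags. This is a toy model only: the real highlighter works on analyzed terms and offers several highlighter types. The function name highlight and its pre_tag/post_tag parameters are hypothetical names chosen for this sketch.

```python
import re


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap whole-word, case-insensitive matches of the query terms in tags."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


for country in ["San Francisco", "San Diego", "Los Angeles"]:
    print(highlight(country, "San"))
```

Only the matching word is wrapped, and values without a matching word are returned unchanged.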

That's all for today~

I hope this was helpful to someone. End of today's post~
