Categories
Big Data, Notes

Elasticsearch pagination

Elasticsearch provides two ways to paginate. The first is from-size pagination:

GET /_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match_all": {}
  }
}

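from: the offset, i.e. the number of hits to skip
size: the number of hits to return per page

By default, Elasticsearch does not allow from-size paging beyond 10,000 documents (from + size must not exceed 10,000). The limit is an index-level setting and can be changed through the index.max_result_window parameter.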
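The relationship between a page number, from, size and the result window can be sketched as follows. This is a minimal illustration, not part of any Elasticsearch client: the helper name page_to_from_size and its parameters are hypothetical, and the only fact taken from Elasticsearch is the default window of 10,000.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_to_from_size(page, page_size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Translate a 1-based page number into the from/size pair of a search body.

    Hypothetical helper: it rejects pages whose window (from + size) would
    exceed the configured limit, before any request is sent.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > max_result_window:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds "
            f"index.max_result_window = {max_result_window}"
        )
    return {"from": offset, "size": page_size}


# Page 500 with 20 hits per page ends exactly at the 10,000th hit, so it is allowed.
body = {**page_to_from_size(500, 20), "query": {"match_all": {}}}
print(body)
```

With 20 hits per page, page 501 would need from + size = 10,020 and is rejected by the helper.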

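Deep paging, or requesting many results at once, can make searches slow. Results are sorted before they are returned. Because a search request usually spans multiple shards, each shard must produce its own sorted results, and these separate results must then be merged and sorted again to guarantee the correct overall order.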
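Why the cost grows with the offset can be seen in a small simulation. This is a simplified model rather than Elasticsearch's actual implementation: each shard is just a list of random scores, and search_page is a hypothetical stand-in for the coordinating node.

```python
import heapq
import random


def search_page(shards, offset, size):
    """Simulate a coordinating node paging over per-shard score lists."""
    window = offset + size
    # Each shard sorts its own hits and returns only its top `window` entries.
    per_shard = [sorted(scores, reverse=True)[:window] for scores in shards]
    # The coordinator merges the already-sorted shard results into one ranking.
    merged = list(heapq.merge(*per_shard, reverse=True))
    # Only `size` hits are kept; everything before `offset` is thrown away.
    examined = sum(len(part) for part in per_shard)
    return merged[offset:offset + size], examined


random.seed(0)
shards = [[random.random() for _ in range(20_000)] for _ in range(5)]
page, examined = search_page(shards, offset=9_980, size=20)
print(len(page), examined)  # 20 hits returned, 50000 entries merged
```

In this model, returning 20 hits at offset 9,980 across 5 shards makes the coordinator handle 5 x 10,000 entries, which is the overhead the paragraph above describes.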

As an alternative to deep paging, the official documentation recommends scroll pagination.

Categories
Big Data

ES aggregations: Fielddata is disabled on text fields by default

Using an ES aggregation to count the distinct values of a field (projectCode):

GET bury-point-click/_search
{
  "size": 0,
  "aggs": {
    "group_by_project": {
      "cardinality": {
        "field": "projectCode"
      }
    }
  }
}

The request fails with: Fielddata is disabled on text fields by default. Set fielddata=true on [projectCode] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
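The error message itself points at one way out: aggregate on a keyword field instead of the text field. The sketch below builds such a request body. It assumes the index was created with Elasticsearch's default dynamic mapping for strings, which adds a keyword sub-field next to the text field; whether projectCode.keyword actually exists in bury-point-click is an assumption, and cardinality_body is a hypothetical helper.

```python
import json


def cardinality_body(field, agg_name="group_by_project", use_keyword_subfield=True):
    """Build a cardinality aggregation body for the given field.

    Assumption: the text field has a `keyword` sub-field, as created by the
    default dynamic mapping. If the field is mapped differently, pass
    use_keyword_subfield=False and point `field` at a keyword-typed field.
    """
    target = f"{field}.keyword" if use_keyword_subfield else field
    return {"size": 0, "aggs": {agg_name: {"cardinality": {"field": target}}}}


# Body to send with GET bury-point-click/_search
print(json.dumps(cardinality_body("projectCode"), indent=2))
```

The other option named in the message, setting fielddata=true on the text field, also works but loads the field into heap memory, as the message warns.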

Categories
Java

Flink: ElasticsearchSinkFunction is not serializable

When sinking Flink results to Elasticsearch from Java, the job throws an "ElasticsearchSinkFunction is not serializable" exception at execution time:

The implementation of the provided ElasticsearchSinkFunction is not serializable. The object probably contains or references non-serializable fields.

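Categories
Java, Linux

Errors starting Elasticsearch on Linux

By default Elasticsearch listens only on the local address 127.0.0.1. To expose the service so that other machines can reach it, edit the Elasticsearch configuration file (config/elasticsearch.yml):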

network.host: 0.0.0.0
discovery.seed_hosts: ["0.0.0.0"]

After startup, the following error is reported:

ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
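The bootstrap check compares the kernel setting vm.max_map_count with the minimum named in the message. A minimal sketch of that comparison is shown below; max_map_count_fix is a hypothetical helper, and it only prints the sysctl command instead of running it.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # minimum named in the bootstrap check message


def max_map_count_fix(current, required=REQUIRED_MAX_MAP_COUNT):
    """Return the sysctl command that raises the limit, or None if it is already high enough."""
    if current >= required:
        return None
    return f"sysctl -w vm.max_map_count={required}"


if __name__ == "__main__":
    proc_file = Path("/proc/sys/vm/max_map_count")  # present on Linux only
    if proc_file.exists():
        current = int(proc_file.read_text().strip())
        print(max_map_count_fix(current) or "vm.max_map_count is already sufficient")
```

The printed command has to be run as root and lasts until the next reboot. To make the change permanent, add vm.max_map_count=262144 to /etc/sysctl.conf.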