Support Elasticsearch v8 (and v7) & gem v1 release (#77)
We are still on v0, so this is a breaking change: we no longer care about
Elasticsearch < v7 or Ruby < 2.7.

It also works towards a v1 release, since the gem has been somewhat
neglected, with some extra changes that remove hacks in other services,
e.g. the deals service.

Added
- Support for Ruby 3 (and keep support for 2.7).
- Support for Elasticsearch v8 (and keep support for v7).
- Support setting a logger in `Config`.
- Support refresh on `IndexManager#populate_index`.
- Support Proc in `Config#data_source` so it can be lazily evaluated.

Removed
- Drop support for Ruby 2.6. (Previous PR.)
- Drop support for Elasticsearch v5 and v6.

The README has also been updated to reflect that the gem only supports
Elasticsearch and ActiveRecord with PG since we rely on txid which is PG
specific and we don't have a way around it.
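
The lazily evaluated `data_source` listed under Added can be sketched as follows. This is a minimal illustration, not the gem's code: `LazyConfig` and the `:users_scope` placeholder are hypothetical names, and the only behaviour assumed is that a callable source is invoked on access while any other object is returned unchanged.

```ruby
# Hypothetical stand-in for a config object; only the data_source logic is modelled.
class LazyConfig
  def initialize(data_source:)
    @data_source = data_source
  end

  # Invoke the source when it is callable (Proc or lambda); otherwise return it unchanged.
  def data_source
    @data_source.respond_to?(:call) ? @data_source.call : @data_source
  end
end

calls = 0
lazy = LazyConfig.new(data_source: -> { calls += 1; :users_scope })
eager = LazyConfig.new(data_source: :users_scope)
```

Nothing runs when `lazy` is built; the lambda is only called when `lazy.data_source` is read. One plausible benefit is that a config can be defined before the underlying model class is loaded.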
PChambino authored Aug 19, 2022
1 parent e598d73 commit 0c5fd05
Showing 11 changed files with 98 additions and 72 deletions.
15 changes: 8 additions & 7 deletions .circleci/config.yml
@@ -14,6 +14,7 @@ jobs:
- image: "elasticsearch:<< parameters.elasticsearch_version >>"
environment:
"discovery.type": single-node
"xpack.security.enabled": false
environment:
ELASTICSEARCH_GEM_VERSION: "<< parameters.elasticsearch_gem_version >>"
steps:
@@ -28,20 +29,20 @@ workflows:
test:
jobs:
- test:
name: "test with elasticsearch v7
name: "test with elasticsearch v8
and gem << matrix.elasticsearch_gem_version >>
and ruby v<< matrix.ruby_version >>"
elasticsearch_version: "7.17.5"
elasticsearch_version: "8.3.3"
matrix:
parameters:
ruby_version: ["3.1", "2.7"]
elasticsearch_gem_version: ["~> 7", "~> 5"]
elasticsearch_gem_version: ["~> 8", "~> 7"]
- test:
name: "test with elasticsearch v5
and gem ~> 5
name: "test with elasticsearch v7
and gem << matrix.elasticsearch_gem_version >>
and ruby v<< matrix.ruby_version >>"
elasticsearch_version: "5.6-alpine"
elasticsearch_gem_version: "~> 5"
elasticsearch_version: "7.17.5"
matrix:
parameters:
ruby_version: ["3.1", "2.7"]
elasticsearch_gem_version: ["~> 7"]
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@
/pkg/
/spec/reports/
/tmp/
/log/*.log
/Gemfile.lock

# rspec failure tracking
47 changes: 38 additions & 9 deletions CHANGELOG.md
@@ -1,15 +1,44 @@
v0.5.1 - Aug 8th, 2018
---
# Changelog

- Fix re-indexing log output upon initial index creation (#7)
All notable changes to this project will be documented in this file.

v0.5.0 - Aug 3rd, 2018
---
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- An _ESTIMATED_ % completion of ES re-indexing is logged, as the new index gets populated (#5)
## How do I make a good changelog?

### Guiding Principles

v0.4.0 - May 29th, 2018
---
- Changelogs are for humans, not machines.
- There should be an entry for every single version.
- The same types of changes should be grouped.
- Versions and sections should be linkable.
- The latest version comes first.
- The release date of each version is displayed.
- Keep an `Unreleased` section at the top to track upcoming changes.

- Warning rather than exception on ignorable version conflicts (#4)
### Types of changes

- `Added` for new features.
- `Changed` for changes in existing functionality.
- `Deprecated` for soon-to-be removed features.
- `Removed` for now removed features.
- `Fixed` for any bug fixes.
- `Security` in case of vulnerabilities.

## [Unreleased]

## Unreleased: [1.0.0] - 2022-08-XX
### Added
- Support for Ruby 3 (and keep support for 2.7).
- Support for Elasticsearch v8 (and keep support for v7).
- Support setting a logger in `Config`.
- Support refresh on `IndexManager#populate_index`.
- Support Proc in `Config#data_source` so it can be lazily evaluated.

### Removed
- Drop support for Ruby 2.6.
- Drop support for Elasticsearch v5 and v6.

[Unreleased]: https://github.com/carwow/zelastic/compare/v1.0.0...HEAD
[1.0.0]: https://github.com/carwow/zelastic/releases/tag/v1.0.0
9 changes: 6 additions & 3 deletions README.md
@@ -1,4 +1,7 @@
# Zero-downtime indexing from ActiveRecord->Elasticsearch
# Zelastic

Zero-downtime Elasticsearch tooling for managing indices and indexing from
ActiveRecord with PostgreSQL to Elasticsearch.

## Installation

@@ -10,13 +13,14 @@ gem 'zelastic'

And then execute:

$ bundle
$ bundle install

Or install it yourself as:

$ gem install zelastic

## Usage

### Setup

For each ActiveRecord scope you want to index, you'll need a configuration:
@@ -46,7 +50,6 @@ You can also override some defaults, if you wish:
here
- `read_alias`: by default this is the table name of the `data_source`
- `write_alias`: by default this is the `read_alias`, with `_write` appended
- `type`: by default this is `read_alias.singularize`

If you pass an array as the `client` argument, all writes will be applied to every client in the
array.
25 changes: 13 additions & 12 deletions lib/zelastic/config.rb
@@ -2,31 +2,36 @@

module Zelastic
class Config
attr_reader :clients, :data_source
attr_reader :clients

def initialize(
client:,
data_source:,
mapping:,
logger: nil,
**overrides,
&index_data
)
@clients = Array(client)
@data_source = data_source
@mapping = mapping
@index_data = index_data
@_type = overrides.fetch(:type, true)
@logger = logger
@overrides = overrides
end

def type?
@_type
@index_data = index_data
end

def index_data(model)
@index_data.call(model)
end

def data_source
if @data_source.respond_to? :call
@data_source.call
else
@data_source
end
end

def read_alias
@read_alias ||= overrides.fetch(:read_alias) { data_source.table_name }
end
@@ -35,10 +40,6 @@ def write_alias
@write_alias ||= overrides.fetch(:write_alias) { [read_alias, 'write'].join('_') }
end

def type
@type ||= overrides.fetch(:type, read_alias.singularize)
end

def logger
return Rails.logger if defined?(Rails)

@@ -48,7 +49,7 @@ def logger
def index_definition
{
settings: overrides.fetch(:index_settings, {}),
mappings: type ? { type => mapping } : mapping
mappings: mapping
}
end
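
The change to `index_definition` above drops the type wrapper around the mapping. A minimal sketch of the two shapes using plain hashes; the `'deal'` type name and the `title` field are hypothetical, chosen only for illustration:

```ruby
mapping = { properties: { title: { type: 'text' } } }

# Old shape: the mapping was nested under a type name.
typed_definition = { settings: {}, mappings: { 'deal' => mapping } }

# New shape: the mapping is passed directly, with no type level.
typeless_definition = { settings: {}, mappings: mapping }
```

Elasticsearch v7 deprecated mapping types and v8 removed them, which is consistent with this commit dropping the `type` option.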

15 changes: 4 additions & 11 deletions lib/zelastic/index_manager.rb
@@ -16,12 +16,12 @@ def create_index(unique_name)
client.indices.put_alias(index: index_name, name: config.write_alias)
end

def populate_index(unique_name = nil, batch_size: 3000)
def populate_index(unique_name = nil, batch_size: 3000, refresh: false)
index_name = index_name_from_unique(unique_name)

config.data_source.find_in_batches(batch_size: batch_size).with_index do |batch, i|
logger.info(populate_index_log(batch_size: batch_size, batch_number: i + 1))
indexer.index_batch(batch, client: client, index_name: index_name)
indexer.index_batch(batch, client: client, index_name: index_name, refresh: refresh)
end
end

@@ -116,18 +116,11 @@ def populate_index_log(batch_size:, batch_number:)
else
'First index'
end
"ES: (#{progress}) Indexing #{config.type} records"
"ES: (#{progress}) Indexing records"
end

def current_index_size
@current_index_size ||= client.count(**count_params)['count']
end

def count_params
{
index: config.read_alias,
type: config.type? ? config.type : nil
}.compact
@current_index_size ||= client.count(index: config.read_alias)['count']
end

def indexed_percent(batch_size, batch_number)
Expand Down
31 changes: 9 additions & 22 deletions lib/zelastic/indexer.rb
@@ -43,10 +43,7 @@ def delete_by_ids(ids)

execute_bulk do |index_name|
ids.map do |id|
delete_params = { _index: index_name, _id: id }
delete_params[:_type] = config.type if config.type?

{ delete: delete_params }
{ delete: { _index: index_name, _id: id } }
end
end
end
@@ -76,19 +73,14 @@ def write_indices(client)
end

def index_command(index:, version:, record:)
version_params =
if config.type?
{ _version: version, _version_type: :external, _type: config.type }
else
{ version: version, version_type: :external }
end

{
index: {
_index: index,
_id: record.id,
data: config.index_data(record)
}.merge(version_params)
data: config.index_data(record),
version: version,
version_type: :external
}
}
end

@@ -123,16 +115,11 @@ def check_errors!(result)
end

def ignorable_error?(error)
# rubocop:disable Layout/LineLength
regexp =
if config.type?
/^\[#{config.type}\]\[\d+\]: version conflict, current version \[\d+\] is higher or equal to the one provided \[\d+\]$/
else
/^\[\d+\]: version conflict, current version \[\d+\] is higher or equal to the one provided \[\d+\]$/
end
# rubocop:enable Layout/LineLength
error['type'] == 'version_conflict_engine_exception' &&
error['reason'] =~ regexp
error['reason'] =~ VERSION_CONFLICT_ERROR_REGEXP
end

VERSION_CONFLICT_ERROR_REGEXP =
/^\[\d+\]: version conflict, current version \[\d+\] is higher or equal to the one provided \[\d+\]$/.freeze
end
end
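
To illustrate the simplified `ignorable_error?` check above, here is a standalone sketch. The regexp is copied from the diff; the sample error hashes are hypothetical and only mimic the shape the check expects. The sketch uses `match?` instead of the diff's `=~` so that the method returns a strict boolean.

```ruby
VERSION_CONFLICT_ERROR_REGEXP =
  /^\[\d+\]: version conflict, current version \[\d+\] is higher or equal to the one provided \[\d+\]$/.freeze

# Standalone version of the check; returns true or false rather than a match index.
def ignorable_error?(error)
  error['type'] == 'version_conflict_engine_exception' &&
    VERSION_CONFLICT_ERROR_REGEXP.match?(error['reason'])
end

# Hypothetical error payloads for illustration.
conflict = {
  'type' => 'version_conflict_engine_exception',
  'reason' => '[42]: version conflict, current version [7] is higher or equal to the one provided [7]'
}
other = { 'type' => 'mapper_parsing_exception', 'reason' => 'failed to parse' }
```

With a single regexp there is no longer a branch on whether a type prefix such as `[type][id]` appears in the reason, so only the typeless `[id]: ...` form is treated as ignorable.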
Empty file added log/.keep
Empty file.
4 changes: 4 additions & 0 deletions spec/spec_helper.rb
@@ -1,11 +1,15 @@
# frozen_string_literal: true

require 'bundler/setup'

require 'active_record'
require 'active_support'
require 'active_support/core_ext'
require 'elasticsearch'
require 'ostruct'
require 'pry'
require 'securerandom'

require 'zelastic'

RSpec.configure do |config|
18 changes: 12 additions & 6 deletions spec/zelastic/indexer_spec.rb
@@ -3,24 +3,30 @@
require 'spec_helper'

RSpec.describe Zelastic::Indexer do
let(:type) { Gem::Version.new(client.info.dig('version', 'number')) <= Gem::Version.new('7.0.0') }
let(:config) do
Zelastic::Config.new(
client: client,
data_source: data_source,
mapping: mapping,
type: type
logger: Logger.new('log/test.log')
) { |_| {} }
end

let(:client) do
Elasticsearch::Client.new(url: ENV.fetch('ELASTICSEARCH_URL', 'http://localhost:9200'))
Elasticsearch::Client.new(
url: ENV.fetch('ELASTICSEARCH_URL', 'http://localhost:9200')
)
end
let(:mapping) { { properties: {} } }
let(:data_source) do
db_conn = double(:db_conn, select_one: { 'xmax' => @xmax })
double(:data_source, table_name: 'table_name', connection: db_conn)
lambda do # verifies lazy eval of data source
class_double(
ActiveRecord::Base,
table_name: 'table_name',
connection: double(:connection, select_one: { 'xmax' => @xmax })
)
end
end
let(:mapping) { { properties: {} } }
let(:index_id) { SecureRandom.hex(3) }

before do
5 changes: 3 additions & 2 deletions zelastic.gemspec
@@ -24,12 +25,13 @@ Gem::Specification.new do |spec|
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
spec.require_paths = ['lib']

spec.add_dependency 'elasticsearch', '>= 7', '< 9'

spec.add_dependency 'activerecord'
spec.add_dependency 'activesupport'

spec.add_development_dependency 'activerecord'
spec.add_development_dependency 'bundler', '~> 2'
spec.add_development_dependency 'carwow_rubocop', '~> 4'
spec.add_development_dependency 'elasticsearch', '>= 5', '< 8'
spec.add_development_dependency 'pry', '~> 0.14'
spec.add_development_dependency 'rake', '~> 13'
spec.add_development_dependency 'rspec', '~> 3'