Merge pull request #422 from mabel-dev/FEATURE/#364
Feature/#364
joocer authored Aug 27, 2022
2 parents 225250f + 1dcca00 commit b9bbda9
Showing 14 changed files with 153 additions and 24,630 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -5,6 +5,7 @@ __pycache__/

# C extensions
*.so
*.c

# Distribution / packaging
.Python
@@ -154,3 +155,4 @@ tests/data/zoned/**
tests/data/parquet/**
opteryx.yaml
data/**
opteryx/third_party/soundex.c
1 change: 1 addition & 0 deletions docs/Release Notes/Change Log.md
@@ -14,6 +14,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [[#330](https://github.com/mabel-dev/opteryx/issues/330)] Support `SIMILAR TO` alias for RegEx match. ([@joocer](https://github.com/joocer))
- [[#331](https://github.com/mabel-dev/opteryx/issues/331)] Support `SAFE_CAST` alias for `TRY_CAST`. ([@joocer](https://github.com/joocer))
- [[#419](https://github.com/mabel-dev/opteryx/issues/419)] Various simple functions (`SIGN`, `SQRT`, `TITLE`, `REVERSE`). ([@joocer](https://github.com/joocer))
- [[#364](https://github.com/mabel-dev/opteryx/issues/364)] Support `SOUNDEX` function. ([@joocer](https://github.com/joocer))

**Changed**

1 change: 1 addition & 0 deletions docs/Release Notes/Notices.md
@@ -8,6 +8,7 @@ Component | Disposition | Copyright | Licence
[cython](https://github.com/cython/cython) | Installed | . | [Apache 2.0](https://github.com/cython/cython/blob/master/LICENSE.txt)
[datetime_truncate](https://github.com/mediapop/datetime_truncate) | Integrated | 2020 Media Pop | [MIT](https://github.com/mediapop/datetime_truncate/blob/master/LICENSE)
[distogram](https://github.com/maki-nage/distogram) | Integrated | 2020 Romain Picard | [MIT](https://github.com/maki-nage/distogram/blob/master/LICENSE.txt)
[fuzzy](https://github.com/yougov/fuzzy) | Integrated | Jason R. Coombs | [MIT](https://github.com/yougov/fuzzy/blob/master/LICENSE)
[mbleven](https://github.com/fujimotos/mbleven) | Integrated | 2018 Fujimoto Seiji | [Public Domain](https://github.com/fujimotos/mbleven/blob/master/LICENSE)
[numpy](https://github.com/numpy/numpy) | Installed | . | [BSD-3](https://github.com/numpy/numpy/blob/main/LICENSE.txt)
[orjson](https://github.com/ijl/orjson) | Installed | . | [Apache 2.0](https://github.com/ijl/orjson/blob/master/LICENSE-APACHE)
3 changes: 3 additions & 0 deletions docs/SQL Reference/06 Functions.md
@@ -224,6 +224,9 @@ Functions for examining and manipulating string values.
!!! function "`RIGHT` (**str**: _varchar_, **n**: _numeric_) → _varchar_"
Extract the right-most **n** characters of **str**.

!!! function "`SOUNDEX` (**str**: _varchar_) → _varchar_"
Return the four-character Soundex phonetic code of **str**. See [Soundex 🡕](https://en.wikipedia.org/wiki/Soundex).

!!! function "`SEARCH` (**str**: _varchar_, **value**: _varchar_) → _boolean_ 🔻"
Return True if **str** contains **value**.
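The `SOUNDEX` entry defers to Wikipedia for the algorithm itself. As an illustration only, here is a minimal pure-Python sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent duplicate codes (vowels separate duplicates, H and W do not), then pad or truncate to four characters. It is not the vendored `fuzzy` implementation that Opteryx calls, and the two may disagree on edge cases.

```python
# Classic American Soundex digit groups (illustrative sketch, not the fuzzy code).
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str, size: int = 4) -> str:
    """Return the Soundex code of `word`, or "" if it contains no ASCII letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")  # the first letter's code also suppresses a repeat
    for char in letters[1:]:
        code = CODES.get(char, "")  # vowels, Y, H and W have no code
        if code and code != previous:
            result += code
            if len(result) == size:
                break
        if char not in "HW":  # H and W do not separate duplicate codes; vowels do
            previous = code
    return result.ljust(size, "0")  # pad short codes with zeros


print(soundex("Robert"))  # R163
```

With these rules, names that sound alike share a code: `Robert` and `Rupert` both encode to `R163`.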

1 change: 1 addition & 0 deletions opteryx/engine/functions/__init__.py
@@ -194,6 +194,7 @@ def _raise_exception(text):
    "LEFT": string_functions.string_slicer_left,
    "RIGHT": string_functions.string_slicer_right,
    "REVERSE": compute.utf8_reverse,
    "SOUNDEX": string_functions.soundex,
    "TITLE": compute.utf8_title,
    # HASHING & ENCODING
    "HASH": _iterate_single_parameter(lambda x: format(CityHash64(str(x)), "X")),
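The hunk above registers `SOUNDEX` by adding one entry to a dictionary that maps SQL function names to callables. The sketch below shows that dispatch pattern in isolation, assuming names are matched case-insensitively by upper-casing them. `FUNCTIONS`, `call_function` and `iterate_single_parameter` here are hypothetical stand-ins, not Opteryx's own code, and `zlib.crc32` replaces `CityHash64`, which is not in the standard library.

```python
import zlib
from typing import Callable, Dict, Iterable, List


def iterate_single_parameter(func: Callable) -> Callable:
    """Hypothetical helper: lift a scalar function so it runs over a whole column."""
    def _apply(values: Iterable) -> List:
        return [func(value) for value in values]
    return _apply


# Hypothetical registry: upper-case SQL function names mapped to callables.
FUNCTIONS: Dict[str, Callable] = {
    "REVERSE": iterate_single_parameter(lambda s: s[::-1]),
    "TITLE": iterate_single_parameter(str.title),
    # zlib.crc32 stands in for CityHash64 in this sketch.
    "HASH": iterate_single_parameter(lambda x: format(zlib.crc32(str(x).encode()), "X")),
}


def call_function(name: str, values: Iterable) -> List:
    """Look a function up by name (case-insensitively) and apply it to `values`."""
    func = FUNCTIONS.get(name.upper())
    if func is None:
        raise ValueError(f"Unknown function: {name}")
    return func(values)


print(call_function("reverse", ["abc", "xyz"]))  # ['cba', 'zyx']
```

Keeping the registry as a plain dictionary means adding a function is a one-line change, which is exactly the shape of the `SOUNDEX` addition.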
15 changes: 15 additions & 0 deletions opteryx/engine/functions/string_functions.py
@@ -12,6 +12,8 @@

import numpy

from opteryx.third_party.soundex import Soundex


def string_slicer_left(arr, length):
    """
@@ -49,3 +51,16 @@ def string_slicer_right(arr, length):
    arr = arr.astype(str)  # it's probably an array of objects
    interim = arr.view((str, 1)).reshape(len(arr), -1)[:, -length:]
    return numpy.array(interim).view((str, length)).flatten()


def soundex(arr):
    _soundex = Soundex(4)
    interim = ["0000"] * arr.size

    for index, string in enumerate(arr):
        if string:
            interim[index] = _soundex(string)
        else:
            interim[index] = None

    return numpy.array(interim, dtype=numpy.str_)
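As written, `soundex` stores `None` for empty values but then builds its result with `dtype=numpy.str_`, which converts each `None` into the literal string `"None"`. A minimal null-preserving sketch follows, assuming callers can accept an object-dtype array. `soundex_column` and `first_letter_code` are hypothetical names, and the stand-in encoder is deliberately not real Soundex; in the engine the callable would be the `Soundex(4)` instance.

```python
from typing import Callable

import numpy


def soundex_column(arr: numpy.ndarray, encode: Callable[[str], str]) -> numpy.ndarray:
    """Encode every non-empty value in `arr`; empty strings and None become None."""
    interim = [encode(value) if value else None for value in arr]
    # dtype=object keeps None as a genuine null; dtype=numpy.str_ would
    # convert each None into the four-character string "None".
    return numpy.array(interim, dtype=object)


def first_letter_code(value: str) -> str:
    """Hypothetical stand-in encoder (NOT real Soundex), used only for the demo."""
    return value[:1].upper() + "000"


column = numpy.array(["robert", "", None], dtype=object)
print(list(soundex_column(column, first_letter_code)))  # ['R000', None, None]
```

The list comprehension also drops the `["0000"] * arr.size` placeholder, since every slot is overwritten anyway.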
1 change: 1 addition & 0 deletions opteryx/third_party/README.md
@@ -7,6 +7,7 @@ These are third-party modules which we include into the Opteryx codebase.
- [**hyperloglog**](https://github.com/ekzhu/datasketch)
- [**mbleven**](https://github.com/fujimotos/mbleven)
- [**pyarrow_ops**](https://github.com/TomScheffers/pyarrow_ops)
- [**soundex**](https://github.com/yougov/fuzzy/blob/master/src/fuzzy.pyx)

These modules have been removed from the codebase
