Commit

Merge pull request #880 from IBM/pending-version-change/0.2.4
Cut off new release for 0.2.3
touma-I authored Dec 17, 2024
2 parents c6a227b + ed6daf5 commit d31f05b
Showing 145 changed files with 266 additions and 175 deletions.
4 changes: 2 additions & 2 deletions .make.versions
@@ -19,7 +19,7 @@ DPK_MINOR_VERSION=2
DPK_MICRO_VERSION=3
# The suffix is generally always set in the main/development branch and only nulled out when creating release branches.
# It can be manually incremented, for example, to allow publishing a new intermediate version wheel to pypi.
DPK_VERSION_SUFFIX=.dev2
DPK_VERSION_SUFFIX=

DPK_VERSION=$(DPK_MAJOR_VERSION).$(DPK_MINOR_VERSION).$(DPK_MICRO_VERSION)$(DPK_VERSION_SUFFIX)

@@ -66,4 +66,4 @@ endif
#
# If you change the versions numbers, be sure to run "make set-versions" to
# update version numbers across the transform (e.g., pyproject.toml).
TRANSFORMS_PKG_VERSION=0.2.3.dev3
TRANSFORMS_PKG_VERSION=0.2.3
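
The way `.make.versions` assembles `DPK_VERSION` can be sketched in Python as follows. This is only an illustration: `compose_version` is a hypothetical helper, not something in the repository, and the major version `0` is an assumption inferred from the `0.2.3` strings elsewhere in this commit (the hunk shows only the minor and micro values).

```python
def compose_version(major: int, minor: int, micro: int, suffix: str = "") -> str:
    """Mirror the Makefile expression MAJOR.MINOR.MICRO followed by an optional suffix.

    Hypothetical helper for illustration; the repository does this in make.
    """
    return f"{major}.{minor}.{micro}{suffix}"


# Development build: the suffix is set, as on the main branch before this commit.
dev_version = compose_version(0, 2, 3, ".dev2")

# Release build: the suffix is emptied, as this commit does.
release_version = compose_version(0, 2, 3)

print(dev_version)
print(release_version)
```

Emptying the suffix is therefore enough to turn `0.2.3.dev2` into `0.2.3`. The comment in `.make.versions` says to run `make set-versions` afterwards so that the per-package files pick up the new value.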
4 changes: 2 additions & 2 deletions README.md
@@ -79,8 +79,8 @@ conda install gxx_linux-64
Next, install the data prep toolkit library. This library installs both the python and ray versions of the transforms. For better management of dependencies, it is recommended to install the same tagged version of both the library and the transform.

```bash
pip3 install 'data-prep-toolkit[ray]==0.2.3.dev0'
pip3 install 'data-prep-toolkit-transforms[ray,all]==0.2.3.dev1'
pip3 install 'data-prep-toolkit[ray]==0.2.3'
pip3 install 'data-prep-toolkit-transforms[all]==0.2.3'
pip3 install jupyterlab ipykernel ipywidgets

## install custom kernel
2 changes: 1 addition & 1 deletion data-processing-lib/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit"
version = "0.2.3.dev2"
version = "0.2.3"
keywords = ["data", "data preprocessing", "data preparation", "llm", "generative", "ai", "fine-tuning", "llmapps" ]
requires-python = ">=3.10,<3.13"
description = "Data Preparation Toolkit Library for Ray and Python"
3 changes: 3 additions & 0 deletions kfp/kfp_ray_components/Dockerfile
@@ -30,6 +30,9 @@ COPY ./src /pipelines/component/src
# Set environment
ENV KFP_v2=$KFP_v2

# Grant non-root users the necessary permissions to the ray directory
RUN chmod 755 /home/ray

# Put these at the end since they seem to upset the docker cache.
ARG BUILD_DATE
ARG GIT_COMMIT
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/createRayClusterComponent.yaml
@@ -11,7 +11,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/deleteRayClusterComponent.yaml
@@ -9,7 +9,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/executeRayJobComponent.yaml
@@ -12,7 +12,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/executeSubWorkflowComponent.yaml
@@ -27,7 +27,7 @@ outputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists, and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
4 changes: 2 additions & 2 deletions kfp/kfp_support_lib/kfp_v1_workflow_support/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit_kfp_v1"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "Data Preparation Kit Library. KFP support"
license = {text = "Apache-2.0"}
@@ -13,7 +13,7 @@ authors = [
]
dependencies = [
"kfp==1.8.22",
"data-prep-toolkit-kfp-shared==0.2.3.dev2",
"data-prep-toolkit-kfp-shared==0.2.3",
]

[build-system]
4 changes: 2 additions & 2 deletions kfp/kfp_support_lib/kfp_v2_workflow_support/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit_kfp_v2"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "Data Preparation Kit Library. KFP support"
license = {text = "Apache-2.0"}
@@ -14,7 +14,7 @@ authors = [
dependencies = [
"kfp==2.8.0",
"kfp-kubernetes==1.2.0",
"data-prep-toolkit-kfp-shared==0.2.3.dev2",
"data-prep-toolkit-kfp-shared==0.2.3",
]

[build-system]
4 changes: 2 additions & 2 deletions kfp/kfp_support_lib/shared_workflow_support/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit_kfp_shared"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "Data Preparation Kit Library. KFP support"
license = {text = "Apache-2.0"}
@@ -14,7 +14,7 @@ authors = [
dependencies = [
"requests",
"kubernetes",
"data-prep-toolkit[ray]>=0.2.3.dev2",
"data-prep-toolkit[ray]>=0.2.3",
]

[build-system]
16 changes: 16 additions & 0 deletions release-notes.md
@@ -1,5 +1,21 @@
# Data Prep Kit Release notes

## Release 0.2.3 - 12/15/2024

## General

New algorithm for Fuzzy dedup transform
Sample notebooks for some of the language transforms
Integrate Semantic profiler and report generation for code profiler transform

### data-prep-toolkit libraries (python, ray, spark)

1. Increase ray agent limit to 10,000 (default was 100)

### Transforms

1. Fuzzy dedup new algorithm for Python, Ray and Spark

## Release 0.2.2 - 11/25/2024

### General
2 changes: 1 addition & 1 deletion transforms/code/code2parquet/kfp_ray/code2parquet_wf.py
@@ -25,7 +25,7 @@


# components
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"

# path to kfp component specifications files
component_spec_path = "../../../../kfp/kfp_ray_components/"
2 changes: 1 addition & 1 deletion transforms/code/code2parquet/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code2parquet_transform_python"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "code2parquet Python Transform"
license = {text = "Apache-2.0"}
2 changes: 1 addition & 1 deletion transforms/code/code2parquet/python/requirements.txt
@@ -1,3 +1,3 @@
data-prep-toolkit>=0.2.3.dev2
data-prep-toolkit>=0.2.3
parameterized
pandas
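
Raising the lower bound from `0.2.3.dev2` to `0.2.3` matters because, under standard Python version ordering (PEP 440), a `.devN` release sorts before the final release with the same number. The sketch below illustrates that ordering with the standard library only. It assumes versions of the form `X.Y.Z` or `X.Y.Z.devN` and nothing else. `version_key` and `satisfies_lower_bound` are hypothetical names, and real tooling would rely on a full PEP 440 parser.

```python
import re

# Only 'X.Y.Z' and 'X.Y.Z.devN' are handled in this simplified sketch.
_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?")


def version_key(version: str) -> tuple:
    """Return a sort key for a version string (hypothetical helper)."""
    match = _PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, micro, dev = match.groups()
    if dev is None:
        # Final release: flag 1 places it after every dev release of X.Y.Z.
        return (int(major), int(minor), int(micro), 1, 0)
    # Dev release: flag 0 places it before the final release.
    return (int(major), int(minor), int(micro), 0, int(dev))


def satisfies_lower_bound(candidate: str, bound: str) -> bool:
    """Check a '>=bound' requirement for the two supported version forms."""
    return version_key(candidate) >= version_key(bound)


old_bound_accepts_dev = satisfies_lower_bound("0.2.3.dev2", "0.2.3.dev2")
new_bound_accepts_dev = satisfies_lower_bound("0.2.3.dev2", "0.2.3")
new_bound_accepts_release = satisfies_lower_bound("0.2.3", "0.2.3")

print(old_bound_accepts_dev, new_bound_accepts_dev, new_bound_accepts_release)
```

With the new bound, a leftover `0.2.3.dev2` install no longer satisfies the requirement, while the released `0.2.3` does.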
3 changes: 3 additions & 0 deletions transforms/code/code2parquet/ray/Dockerfile
@@ -32,6 +32,9 @@ COPY src/code2parquet_local_ray.py local/
COPY test/ test/
COPY test-data/ test-data/

# Grant non-root users the necessary permissions to the ray directory
RUN chmod 755 /home/ray

# Set environment
ENV PYTHONPATH /home/ray

6 changes: 3 additions & 3 deletions transforms/code/code2parquet/ray/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code2parquet_transform_ray"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "code2parquet Ray Transform"
license = {text = "Apache-2.0"}
@@ -10,8 +10,8 @@ authors = [
{ name = "Boris Lublinsky", email = "[email protected]" },
]
dependencies = [
"data-prep-toolkit[ray]>=0.2.3.dev2",
"dpk-code2parquet-transform-python==0.2.3.dev2",
"data-prep-toolkit[ray]>=0.2.3",
"dpk-code2parquet-transform-python==0.2.3",
"parameterized",
"pandas",
]
2 changes: 1 addition & 1 deletion transforms/code/code_profiler/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code_profiler_transform_python"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "Code Profiler Python Transform"
license = {text = "Apache-2.0"}
2 changes: 1 addition & 1 deletion transforms/code/code_profiler/python/requirements.txt
@@ -1,4 +1,4 @@
data-prep-toolkit>=0.2.3.dev2
data-prep-toolkit>=0.2.3
parameterized
pandas
aiolimiter==1.1.0
3 changes: 3 additions & 0 deletions transforms/code/code_profiler/ray/Dockerfile
@@ -33,6 +33,9 @@ COPY ./src/code_profiler_local_ray.py local/
COPY test/ test/
COPY test-data/ test-data/

# Grant non-root users the necessary permissions to the ray directory
RUN chmod 755 /home/ray

# Set environment
ENV PYTHONPATH /home/ray

6 changes: 3 additions & 3 deletions transforms/code/code_profiler/ray/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code_profiler_transform_ray"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "Code Profiler Ray Transform"
license = {text = "Apache-2.0"}
@@ -9,8 +9,8 @@ authors = [
{ name = "Pankaj Thorat", email = "[email protected]" },
]
dependencies = [
"dpk-code-profiler-transform-python==0.2.3.dev2",
"data-prep-toolkit[ray]>=0.2.3.dev2",
"dpk-code-profiler-transform-python==0.2.3",
"data-prep-toolkit[ray]>=0.2.3",
]

[build-system]
2 changes: 1 addition & 1 deletion transforms/code/code_quality/kfp_ray/code_quality_wf.py
@@ -24,7 +24,7 @@
task_image = "quay.io/dataprep1/data-prep-kit/code_quality-ray:latest"

# components
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"

# path to kfp component specifications files
component_spec_path = "../../../../kfp/kfp_ray_components/"
2 changes: 1 addition & 1 deletion transforms/code/code_quality/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code_quality_transform_python"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "Code Quality Python Transform"
license = {text = "Apache-2.0"}
2 changes: 1 addition & 1 deletion transforms/code/code_quality/python/requirements.txt
@@ -1,3 +1,3 @@
data-prep-toolkit>=0.2.3.dev2
data-prep-toolkit>=0.2.3
bs4==0.0.2
transformers==4.38.2
3 changes: 3 additions & 0 deletions transforms/code/code_quality/ray/Dockerfile
@@ -37,6 +37,9 @@ COPY ./src/code_quality_local_ray.py local/
COPY test/ test/
COPY test-data/ test-data/

# Grant non-root users the necessary permissions to the ray directory
RUN chmod 755 /home/ray

# Set environment
ENV PYTHONPATH /home/ray

6 changes: 3 additions & 3 deletions transforms/code/code_quality/ray/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code_quality_transform_ray"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "Code Quality Ray Transform"
license = {text = "Apache-2.0"}
@@ -9,8 +9,8 @@ authors = [
{ name = "Shivdeep Singh", email = "[email protected]" },
]
dependencies = [
"dpk-code-quality-transform-python==0.2.3.dev2",
"data-prep-toolkit[ray]>=0.2.3.dev2",
"dpk-code-quality-transform-python==0.2.3",
"data-prep-toolkit[ray]>=0.2.3",
]

[build-system]
Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@
task_image = "quay.io/dataprep1/data-prep-kit/header_cleanser-ray:latest"

# components
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"

# path to kfp component specifications files
component_spec_path = "../../../../kfp/kfp_ray_components/"
2 changes: 1 addition & 1 deletion transforms/code/header_cleanser/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_header_cleanser_transform_python"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "License and Copyright Removal Transform for Python"
license = {text = "Apache-2.0"}
2 changes: 1 addition & 1 deletion transforms/code/header_cleanser/python/requirements.txt
@@ -1,3 +1,3 @@
data-prep-toolkit>=0.2.3.dev2
data-prep-toolkit>=0.2.3
scancode-toolkit==32.1.0 ; platform_system != 'Darwin'

3 changes: 3 additions & 0 deletions transforms/code/header_cleanser/ray/Dockerfile
@@ -32,6 +32,9 @@ COPY src/header_cleanser_local_ray.py local/
COPY test/ test/
COPY test-data/ test-data/

# Grant non-root users the necessary permissions to the ray directory
RUN chmod 755 /home/ray

# Set environment
ENV PYTHONPATH /home/ray

6 changes: 3 additions & 3 deletions transforms/code/header_cleanser/ray/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_header_cleanser_transform_ray"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "License and copyright removal Transform for Ray"
license = {text = "Apache-2.0"}
@@ -9,8 +9,8 @@ authors = [
{ name = "Yash kalathiya", email = "[email protected]" },
]
dependencies = [
"dpk-header-cleanser-transform-python==0.2.3.dev2",
"data-prep-toolkit[ray]>=0.2.3.dev2",
"dpk-header-cleanser-transform-python==0.2.3",
"data-prep-toolkit[ray]>=0.2.3",
"scancode-toolkit==32.1.0",
]

Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@


# components
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"

# path to kfp component specifications files
component_spec_path = "../../../../kfp/kfp_ray_components/"
2 changes: 1 addition & 1 deletion transforms/code/license_select/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_license_select_transform_python"
version = "0.2.3.dev2"
version = "0.2.3"
requires-python = ">=3.10,<3.13"
description = "License Select Python Transform"
license = {text = "Apache-2.0"}
2 changes: 1 addition & 1 deletion transforms/code/license_select/python/requirements.txt
@@ -1 +1 @@
data-prep-toolkit>=0.2.3.dev2
data-prep-toolkit>=0.2.3
3 changes: 3 additions & 0 deletions transforms/code/license_select/ray/Dockerfile
@@ -29,6 +29,9 @@ COPY src/license_select_local_ray.py local/
COPY test/ test/
COPY test-data/ test-data/

# Grant non-root users the necessary permissions to the ray directory
RUN chmod 755 /home/ray

# Put these at the end since they seem to upset the docker cache.
ARG BUILD_DATE
ARG GIT_COMMIT