Skip to content

Commit

Permalink
Fix kfp pipelines testing in github workflow. (#611)
Browse files Browse the repository at this point in the history
* Fix kfp pipelines testing in github workflow.

Signed-off-by: Revital Sur <[email protected]>

* Minor change.

Signed-off-by: Revital Sur <[email protected]>

* Address review comments.

Signed-off-by: Revital Sur <[email protected]>

* Minor change.

Signed-off-by: Revital Sur <[email protected]>

* Remove test-code-inputcode2parquet.yml

Signed-off-by: Revital Sur <[email protected]>

* Addres review comments.

Signed-off-by: Revital Sur <[email protected]>

---------

Signed-off-by: Revital Sur <[email protected]>
  • Loading branch information
revit13 authored Sep 30, 2024
1 parent fdabeaf commit 4b59c9b
Show file tree
Hide file tree
Showing 21 changed files with 2,111 additions and 24 deletions.
29 changes: 25 additions & 4 deletions .github/workflows/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -7,20 +7,41 @@ CODE_TRANSFORMS=code2parquet code_quality header_cleanser malware proglang_selec
LANG_TRANSFORMS=doc_chunk doc_quality lang_id pdf2parquet pii_redactor text_encoder


# A list that holds transforms that should not be tested with KFP
KFP_BLACK_LIST="doc_chunk,pdf2parquet,pii_redactor"

transform-tests:
$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests
$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
$(MAKE) TRANSFORM_SUBDIR=code .transform-tests
$(MAKE) TRANSFORM_SUBDIR=universal .transform-tests
$(MAKE) TRANSFORM_SUBDIR=universal .transform-kfp-tests
$(MAKE) TRANSFORM_SUBDIR=language .transform-tests
$(MAKE) TRANSFORM_SUBDIR=language .transform-kfp-tests
$(MAKE) TRANSFORM_SUBDIR=code .transform-tests
$(MAKE) TRANSFORM_SUBDIR=code .transform-kfp-tests

# Expects
# TRANSFORM_SUBDIR transforms subdirectory (such as universal)
.transform-tests:
@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -depth 1 -type d); do \
@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
dir=$$(basename $$i); \
yml=test-$(TRANSFORM_SUBDIR)-$$dir.yml; \
echo Generating $$yml; \
cat test-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$${TRANSFORM_SUBDIR}/$$dir?g" > $$yml; \
done

.transform-kfp-tests:
@for i in $$(find ../../transforms/$(TRANSFORM_SUBDIR) -mindepth 1 -maxdepth 1 -type d); do \
dir=$$(basename $$i); \
z=$$(echo ${KFP_BLACK_LIST} | grep -v $$dir); \
if [ ! -d ../../transforms/$(TRANSFORM_SUBDIR)/$$dir/kfp_ray ] || [ -z "$$z" ]; then \
continue; \
fi; \
yml=test-$(TRANSFORM_SUBDIR)-$$dir-kfp.yml; \
echo Generating $$yml; \
cat test-kfp-transform.template | sed -e "s?@TARGET_TRANSFORM_DIR@?transforms/$${TRANSFORM_SUBDIR}/$$dir?g" > $$yml; \
done






37 changes: 17 additions & 20 deletions .github/workflows/README.md
Original file line number Diff line number Diff line change
@@ -1,20 +1,22 @@
# Workflow Management

Here we have the start of a system to automatically generated github workflows (currently only for transforms).
Here we have the start of a system to automatically generated github workflows.
In general, the design is to use templates and `make` to generate/update the workflows.

#### Goals
1. Run only tests for a given transform when only the transform changes.
Includes python, ray, spark and kfp_ray as available.
2. When the core dpk lib components files changes, test all transforms
3. When the shared kfp components changes, test a randomly selected transform test
(We would like to avoid running all transform kfp tests in one PR)
4. Extra credit: If .md or other non-code changes are made, run no tests.
3. When the shared kfp components changes or core dpk lib components files changes,
test a randomly selected transform test. Otherwise run kfp test for the changed transforms.

#### Assumptions
1. All transforms will have test workflows. A transform can disable its tests locally
(temporarily?) by renaming its Makefile. For example,
`cp transforms/universal/noop/Makefile transforms/universal/noop/Makefile.disabled`.
A github action for kfp testing will not be generated if it appears in `KFP_BLACK_LIST`
in the [Makefile](./Makefile).


## DPK libraries (`data-processing-lib` directory)
The DPK libraries, in data-processing-lib/{python,ray,spark}, are tested
Expand All @@ -26,18 +28,18 @@ The transforms test workflows also depend on this directory tree and so
changes made here will trigger transform tests.

## Transforms (`transforms` directory tree)
We define a unique test workflow for each transform, based on a common
template [test-transform.template](test-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows a necessary.
By design, workflows for a given transform should run when
We define two test workflows for each transform: one is based on a common
template [test-transform.template](test-transform.template) and the other, for kfp testing,
is based on a common template [test-kfp-transform.template](test-kfp-transform.template).
The [Makefile](Makefile) is used to (re)generate all workflows as necessary.
By design, non kfp workflows for a given transform should run when

* anything of substance effecting operation is modified in the transform's directory tree.
* anything in the core libraries in this repo (e.g., data-processing/lib) assuming the transform depends on these.

Note that the kfp tests (in kfp_ray/Makefile workflow-test) for a given transform are
**not** currently being run when the transform's tests are run.
Currently these are run randomly via the [test-kfp.yml](test-kfp.yml).
We expect to fix this is in the future.
The generated kfp workflows should run when anything of substance effecting operation is modified in the transform's directory tree
and non of the core libraries in this repo nor the kfp components were changed.
Otherwise a randomly chosen transform will undergo KFP testing, triggered by the [test-kfp.yml](test-kfp.yml) workflow.

When a new transform is added to the repository,

Expand All @@ -58,16 +60,11 @@ git push --set-upstream origin new-branch

Like DPK core libs, kfp tests are defined in
[test-kfp.yml](test-kfp.yml) and run whenever changes are made in
the `kfp` directory tree. Tests currently include

1. test kfp on randomly selected transform.

Eventually we would like to enable the transform-specific kfp test
when only the transform code is modified or maybe when only
the `kfp_ray` directory contents is modified.
the `kfp` directory tree as well as in the DPK core libs. Tests currently include
test kfp on randomly selected transform.

## Miscellaneous
[test-misc.yml](test-misc.yml) defines some repo consistency tests including

1. Make sure `set-versions` make target can be run recursively throughout the repo
2. Makes sure there is a test workflow for each transform in the repo.
2. Makes sure there is a test workflow for each transform in the repo.
114 changes: 114 additions & 0 deletions .github/workflows/test-code-code2parquet-kfp.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files
#
name: Test KFP - transforms/code/code2parquet

on:
workflow_dispatch:
push:
branches:
- "dev"
- "releases/**"
tags:
- "*"
paths:
- "transforms/code/code2parquet/**"
- "!kfp/**" # This is tested in separate workflow
- "!data-processing-lib/**" # This is tested in separate workflow
- "!**.md"
- "!**/doc/**"
- "!**/images/**"
- "!**.gitignore"
pull_request:
branches:
- "dev"
- "releases/**"
paths:
- "transforms/code/code2parquet/**"
- "!data-processing-lib/**" # This is tested in separate workflow
- "!kfp/**" # This is tested in separate workflow
- "!**.md"
- "!**/doc/**"
- "!**/images/**"
- "!**.gitignore"


jobs:
test-kfp-v1:
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Free up space in github runner
# Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
run: |
df -h
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
df -h
- name: Test KFP libs (shared and v1) and run a workflow
timeout-minutes: 120
run: |
export REPOROOT=$PWD
export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
source $K8S_SETUP_SCRIPTS/requirements.env
export PATH=$PATH:/tmp/
curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
chmod 777 /tmp/kind
curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 /tmp/get_helm.sh
HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
chmod 777 /tmp/helm
curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
chmod 777 /tmp/kubectl
curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
chmod +x /tmp/mc
export DEPLOY_KUBEFLOW=1
make -C $K8S_SETUP_SCRIPTS setup
make -C kfp/kfp_support_lib test
make -C transforms workflow-build
source $K8S_SETUP_SCRIPTS/common.sh
make -C transforms/code/code2parquet workflow-test
echo "Run transforms/code/code2parquet completed"
test-kfp-v2:
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Free up space in github runner
# Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
run: |
df -h
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
df -h
- name: Test KFP libs (shared and v2) and run a workflow
timeout-minutes: 120
run: |
export REPOROOT=$PWD
export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
source $K8S_SETUP_SCRIPTS/requirements.env
export PATH=$PATH:/tmp/
curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
chmod 777 /tmp/kind
curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 /tmp/get_helm.sh
HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
chmod 777 /tmp/helm
curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
chmod 777 /tmp/kubectl
curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
chmod +x /tmp/mc
export DEPLOY_KUBEFLOW=1
export KFPv2=1
make -C $K8S_SETUP_SCRIPTS setup
make -C kfp/kfp_support_lib test
make -C transforms workflow-build
source $K8S_SETUP_SCRIPTS/common.sh
make -C transforms/code/code2parquet workflow-test
header_text "Run transforms/code/code2parquet completed"
114 changes: 114 additions & 0 deletions .github/workflows/test-code-code_quality-kfp.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
#
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files
#
name: Test KFP - transforms/code/code_quality

on:
workflow_dispatch:
push:
branches:
- "dev"
- "releases/**"
tags:
- "*"
paths:
- "transforms/code/code_quality/**"
- "!kfp/**" # This is tested in separate workflow
- "!data-processing-lib/**" # This is tested in separate workflow
- "!**.md"
- "!**/doc/**"
- "!**/images/**"
- "!**.gitignore"
pull_request:
branches:
- "dev"
- "releases/**"
paths:
- "transforms/code/code_quality/**"
- "!data-processing-lib/**" # This is tested in separate workflow
- "!kfp/**" # This is tested in separate workflow
- "!**.md"
- "!**/doc/**"
- "!**/images/**"
- "!**.gitignore"


jobs:
test-kfp-v1:
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Free up space in github runner
# Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
run: |
df -h
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
df -h
- name: Test KFP libs (shared and v1) and run a workflow
timeout-minutes: 120
run: |
export REPOROOT=$PWD
export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
source $K8S_SETUP_SCRIPTS/requirements.env
export PATH=$PATH:/tmp/
curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
chmod 777 /tmp/kind
curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 /tmp/get_helm.sh
HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
chmod 777 /tmp/helm
curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
chmod 777 /tmp/kubectl
curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
chmod +x /tmp/mc
export DEPLOY_KUBEFLOW=1
make -C $K8S_SETUP_SCRIPTS setup
make -C kfp/kfp_support_lib test
make -C transforms workflow-build
source $K8S_SETUP_SCRIPTS/common.sh
make -C transforms/code/code_quality workflow-test
echo "Run transforms/code/code_quality completed"
test-kfp-v2:
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Free up space in github runner
# Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
run: |
df -h
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/lib/jvm /usr/local/.ghcup
sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
df -h
- name: Test KFP libs (shared and v2) and run a workflow
timeout-minutes: 120
run: |
export REPOROOT=$PWD
export K8S_SETUP_SCRIPTS=$PWD/scripts/k8s-setup
source $K8S_SETUP_SCRIPTS/requirements.env
export PATH=$PATH:/tmp/
curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v${KIND_VERSION}/kind-linux-amd64
chmod 777 /tmp/kind
curl -fsSL -o /tmp/get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 /tmp/get_helm.sh
HELM_INSTALL_DIR=/tmp/ /tmp/get_helm.sh -v v${HELM_VERSION} --no-sudo
chmod 777 /tmp/helm
curl -L https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /tmp/kubectl
chmod 777 /tmp/kubectl
curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o /tmp/mc
chmod +x /tmp/mc
export DEPLOY_KUBEFLOW=1
export KFPv2=1
make -C $K8S_SETUP_SCRIPTS setup
make -C kfp/kfp_support_lib test
make -C transforms workflow-build
source $K8S_SETUP_SCRIPTS/common.sh
make -C transforms/code/code_quality workflow-test
header_text "Run transforms/code/code_quality completed"
Loading

0 comments on commit 4b59c9b

Please sign in to comment.