Add the commands unlink and deactivate-jobs #62

Merged · 1 commit · Jan 12, 2024
85 changes: 62 additions & 23 deletions README.md
@@ -25,12 +25,10 @@ With this package's approach, people don't need to learn another tool and can co

### Installation

This package uses `poetry` for dependency management.
In the near future the package might be added to PyPI, but for now installation is manual, as follows:
- Create a Python virtual environment and activate it
- Run `pip install git+https://github.com/dbt-labs/dbt-jobs-as-code.git`

1. clone this repository
2. run `poetry install`
3. run `poetry run dbt-jobs-as-code` to see the list of available commands
The CLI is now available as `dbt-jobs-as-code`

### Pre-requisites

@@ -43,31 +41,72 @@ The following environment variables are used to run the code:

The CLI comes with a few different commands

- `poetry run python src/main.py validate <config_file.yml>`: validates that the YAML file has the correct structure
- it is possible to run the validation offline, without doing any API call
- or online using `--online`, in order to check that the different IDs provided are correct
- `poetry run python src/main.py plan <config_file.yml>`: returns the list of actions create/update/delete that are required to have dbt Cloud reflecting the configuration file
- this command doesn't modify the dbt Cloud jobs
- `poetry run python src/main.py sync <config_file.yml>`: create/update/delete jobs and env vars overwrites in jobs to align dbt Cloud with the configuration file
- ⚠️ this command will modify your dbt Cloud jobs if the current configuration is different from the YAML file
- `poetry run python src/main.py import-jobs --config <config_file.yml>` or `poetry run python src/main.py import-jobs --account-id <account-id>`: Queries dbt Cloud and provide the YAML definition for those jobs. It includes the env var overwrite at the job level if some have been defined
- it is possible to restrict the list of dbt Cloud Job IDs by adding `... -j 101 -j 123 -j 234`
- once the YAML has been retrieved, it is possible to copy/paste it in a local YAML file to create/update the local jobs definition.
- to move some ui-jobs to jobs-as-code, perform the following steps:
- run the command to import the jobs
- copy paste the job/jobs into a YAML file
- change the `import_` id of the job in the YML file to another unique identifier
- rename the job in the UI to end with `[[new_job_identifier]]`
- run a `plan` command to verify that no changes are required for the given job
#### `validate`

Command: `dbt-jobs-as-code validate <config_file.yml>`

Validates that the YAML file has the correct structure.

- by default the validation runs offline, without making any API call
- with `--online`, it also checks that the different IDs provided actually exist in dbt Cloud
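A minimal, hypothetical sketch of what an offline structural check looks like: verify required keys and types without any API call. The key names below are illustrative; the real tool validates against `src/schemas/load_job_schema.json`.

```python
# Hypothetical sketch of offline validation: check that each job entry
# has the required keys and types, without calling the dbt Cloud API.
# Key names are illustrative, not the tool's exact schema.
REQUIRED_KEYS = {"account_id": int, "project_id": int, "environment_id": int, "name": str}

def validate_offline(jobs: dict) -> list[str]:
    """Return human-readable errors; an empty list means the config is valid."""
    errors = []
    for identifier, job in jobs.items():
        for key, expected_type in REQUIRED_KEYS.items():
            if key not in job:
                errors.append(f"{identifier}: missing '{key}'")
            elif not isinstance(job[key], expected_type):
                errors.append(f"{identifier}: '{key}' should be {expected_type.__name__}")
    return errors

jobs = {
    "import_1": {"account_id": 1234, "project_id": 5678, "environment_id": 42, "name": "Daily run"},
    "import_2": {"account_id": 1234, "name": "Hourly run"},
}
print(validate_offline(jobs))
# → ["import_2: missing 'project_id'", "import_2: missing 'environment_id'"]
```

The `--online` flag goes further and checks the IDs against the dbt Cloud API, which the sketch above deliberately avoids.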

#### `plan`

Command: `dbt-jobs-as-code plan <config_file.yml>`

Returns the list of create/update/delete actions required to make dbt Cloud reflect the configuration file.

- this command doesn't modify the dbt Cloud jobs
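The plan logic can be pictured as a set comparison between the identifiers defined in the YAML file and the linked jobs already in dbt Cloud. This is a simplified sketch with made-up job data, not the tool's internal implementation:

```python
# Hypothetical sketch of deriving a plan: compare the jobs defined in the
# YAML file with the linked jobs found in dbt Cloud.
defined = {"daily": {"name": "Daily"}, "hourly": {"name": "Hourly"}}
in_cloud = {"daily": {"name": "Daily v1"}, "legacy": {"name": "Old job"}}

to_create = defined.keys() - in_cloud.keys()          # in YAML, not in dbt Cloud
to_delete = in_cloud.keys() - defined.keys()          # linked in dbt Cloud, not in YAML
to_update = {k for k in defined.keys() & in_cloud.keys() if defined[k] != in_cloud[k]}

print(sorted(to_create), sorted(to_update), sorted(to_delete))
# → ['hourly'] ['daily'] ['legacy']
```

`plan` only reports these actions; `sync` is the command that would apply them.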

#### `sync`

Command: `dbt-jobs-as-code sync <config_file.yml>`

Creates/updates/deletes jobs and env var overwrites in jobs to align dbt Cloud with the configuration file

- ⚠️ this command will modify your dbt Cloud jobs if the current configuration is different from the YAML file

#### `import-jobs`

Command: `dbt-jobs-as-code import-jobs --config <config_file.yml>` or `dbt-jobs-as-code import-jobs --account-id <account-id>`

Queries dbt Cloud and provides the YAML definition for those jobs, including any env var overwrites defined at the job level.

- it is possible to restrict the list of dbt Cloud Job IDs by adding `... -j 101 -j 123 -j 234`
- once the YAML has been retrieved, it is possible to copy/paste it into a local YAML file to create/update the local jobs definition.

To move jobs created from the UI to jobs-as-code, perform the following steps:

- run the command to import the jobs
- copy/paste the job(s) into a YAML file
- change the `import_`-prefixed identifier of the job in the YML file to another unique identifier
- rename the job in the UI to end with `[[new_job_identifier]]`
- run a `plan` command to verify that no changes are required for the given job
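The shape `import-jobs` produces can be sketched as follows: each dbt Cloud job becomes a YAML entry keyed by a generated `import_<n>` identifier. The field names here are illustrative, not the tool's exact output schema:

```python
# Hypothetical sketch: turn a list of dbt Cloud jobs into a config mapping
# keyed by generated "import_<n>" identifiers.
cloud_jobs = [
    {"id": 101, "name": "Daily run", "project_id": 5678},
    {"id": 123, "name": "CI checks", "project_id": 5678},
]

config = {
    "jobs": {
        f"import_{n}": job
        for n, job in enumerate(cloud_jobs, start=1)
    }
}

print(sorted(config["jobs"]))  # → ['import_1', 'import_2']
```

Renaming these `import_<n>` keys to stable, meaningful identifiers is exactly the step described in the migration list above.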

#### `unlink`

Command: `dbt-jobs-as-code unlink --config <config_file.yml>` or `dbt-jobs-as-code unlink --account-id <account-id>`

Unlinking jobs removes the `[[ ... ]]` part of the job name in dbt Cloud.

⚠️ This can't be rolled back by the tool. Doing an `unlink` followed by a `sync` will create new instances of the jobs, with the `[[<identifier>]]` part in their name

- it is possible to restrict the list of jobs to unlink by adding the job identifiers to unlink `... -i import_1 -i my_job_2`
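The rename that `unlink` performs on each job name can be sketched with a regular expression; the function name here is made up for illustration, and the real logic lives in the tool:

```python
# Hypothetical sketch of unlinking: strip the trailing "[[identifier]]"
# marker so the job name no longer matches an entry in the YAML file.
import re

def unlinked_name(cloud_name: str) -> str:
    """Remove a trailing [[...]] marker from a job name, if present."""
    return re.sub(r"\s*\[\[[^\]]*\]\]\s*$", "", cloud_name)

print(unlinked_name("Daily run [[import_1]]"))  # → Daily run
print(unlinked_name("Ad hoc job"))              # unchanged → Ad hoc job
```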

#### `deactivate-jobs`

Command: `dbt-jobs-as-code deactivate-jobs --account-id 1234 --job-id 12 --job-id 34 --job-id 56`

This command deactivates both the schedule and the CI triggers for dbt Cloud jobs. This can be useful when moving jobs from one project to another: once the new jobs have been created, use this command to deactivate the jobs from the old project.

### Job Configuration YAML Schema

The file `src/schemas/load_job_schema.json` is a JSON Schema file that can be used to verify that the syntax of the YAML config files is correct.

To use it in VSCode, install the extension `YAML` and add the following line at the top of your YAML config file (change the path if need be):
To use it in VSCode, install [the extension `YAML`](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml) and add the following line at the top of your YAML config file (change the path if need be):

```yaml
# yaml-language-server: $schema=../src/schemas/load_job_schema.json
# yaml-language-server: $schema=https://raw.githubusercontent.com/dbt-labs/dbt-jobs-as-code/main/src/schemas/load_job_schema.json
```

## Running the tool as part of CI/CD
5 changes: 2 additions & 3 deletions src/client/__init__.py
@@ -8,7 +8,6 @@
CustomEnvironmentVariablePayload,
)
from src.schemas.job import JobDefinition
from src.schemas import check_env_var_same


class DBTCloud:
@@ -138,7 +137,7 @@ def get_jobs(self) -> List[JobDefinition]:

return [JobDefinition(**job) for job in jobs]

def get_job(self, job_id: int) -> Dict:
def get_job(self, job_id: int) -> JobDefinition:
"""Generate a Job based on a dbt Cloud job."""

self._check_for_creds()
@@ -150,7 +149,7 @@ def get_job(self, job_id: int) -> Dict:
"Content-Type": "application/json",
},
)
return response.json()["data"]
return JobDefinition(**response.json()["data"])

def get_env_vars(
self, project_id: int, job_id: int
121 changes: 113 additions & 8 deletions src/main.py
@@ -1,5 +1,4 @@
import os
from ruamel.yaml import YAML
import sys

from loguru import logger
@@ -87,7 +86,6 @@ def build_change_set(config):

# Replicate the env vars from the YML to dbt Cloud
for job in defined_jobs.values():

if job.identifier in mapping_job_identifier_job_id: # the job already exists
job_id = mapping_job_identifier_job_id[job.identifier]
all_env_vars_for_job = dbt_cloud.get_env_vars(project_id=job.project_id, job_id=job_id)
@@ -130,7 +128,6 @@ def build_change_set(config):

# Delete the env vars from dbt Cloud that are not in the yml
for job in defined_jobs.values():

# we only delete env var overwrite if the job already exists
if job.identifier in mapping_job_identifier_job_id:
job_id = mapping_job_identifier_job_id[job.identifier]
@@ -270,11 +267,7 @@ def validate(config, online):

# In case deferral jobs are mentioned, check that they exist
deferral_envs = set(
[
job.deferring_environment_id
for job in defined_jobs
if job.deferring_environment_id
]
[job.deferring_environment_id for job in defined_jobs if job.deferring_environment_id]
)
if deferral_envs:
logger.info(f"Checking that Deferring Env IDs are valid")
@@ -342,5 +335,117 @@ def import_jobs(config, account_id, job_id):
export_jobs_yml(cloud_jobs)


@cli.command()
@click.option("--config", type=click.File("r"), help="The path to your YML jobs config file.")
@click.option("--account-id", type=int, help="The ID of your dbt Cloud account.")
@click.option("--dry-run", is_flag=True, help="In dry run mode we don't update dbt Cloud.")
@click.option(
    "--identifier",
    "-i",
    type=str,
    multiple=True,
    help="[Optional] The identifiers we want to unlink. If not provided, all jobs are unlinked.",
)
def unlink(config, account_id, dry_run, identifier):
    """
    Unlink the dbt Cloud jobs from the YML file.
    All relevant jobs get the [[...]] part removed from their name.
    """

    # we get the account id either from a parameter (e.g. if the config file doesn't exist) or from the config file
    if account_id:
        cloud_account_id = account_id
    elif config:
        defined_jobs = load_job_configuration(config).jobs.values()
        cloud_account_id = list(defined_jobs)[0].account_id
    else:
        raise click.BadParameter("Either --config or --account-id must be provided")

    dbt_cloud = DBTCloud(
        account_id=cloud_account_id,
        api_key=os.environ.get("DBT_API_KEY"),
        base_url=os.environ.get("DBT_BASE_URL", "https://cloud.getdbt.com"),
    )
    logger.info("Getting the jobs definition from dbt Cloud")
    cloud_jobs = dbt_cloud.get_jobs()
    selected_jobs = [job for job in cloud_jobs if job.identifier is not None]

    if identifier:
        selected_jobs = [job for job in selected_jobs if job.identifier in identifier]

    for cloud_job in selected_jobs:
        current_identifier = cloud_job.identifier
        # by removing the identifier, we unlink the job from the YML file
        cloud_job.identifier = None
        if dry_run:
            logger.info(
                f"Would unlink/rename the job {cloud_job.id}:{cloud_job.name} [[{current_identifier}]]"
            )
        else:
            logger.info(
                f"Unlinking/Renaming the job {cloud_job.id}:{cloud_job.name} [[{current_identifier}]]"
            )
            dbt_cloud.update_job(job=cloud_job)

    if len(selected_jobs) == 0:
        logger.info("No jobs to unlink")
    elif not dry_run:
        logger.success("Updated all jobs!")


@cli.command()
@click.option("--config", type=click.File("r"), help="The path to your YML jobs config file.")
@click.option("--account-id", type=int, help="The ID of your dbt Cloud account.")
@click.option(
    "--job-id",
    "-j",
    type=int,
    multiple=True,
    help="[Optional] The ID of the job to deactivate.",
)
def deactivate_jobs(config, account_id, job_id):
    """
    Deactivate job triggers in dbt Cloud (schedule and CI triggers)
    """

    # we get the account id either from a parameter (e.g. if the config file doesn't exist) or from the config file
    if account_id:
        cloud_account_id = account_id
    elif config:
        defined_jobs = load_job_configuration(config).jobs.values()
        cloud_account_id = list(defined_jobs)[0].account_id
    else:
        raise click.BadParameter("Either --config or --account-id must be provided")

    dbt_cloud = DBTCloud(
        account_id=cloud_account_id,
        api_key=os.environ.get("DBT_API_KEY"),
        base_url=os.environ.get("DBT_BASE_URL", "https://cloud.getdbt.com"),
    )
    cloud_jobs = dbt_cloud.get_jobs()

    selected_cloud_jobs = [job for job in cloud_jobs if job.id in job_id]

    for cloud_job in selected_cloud_jobs:
        if (
            cloud_job.triggers.git_provider_webhook
            or cloud_job.triggers.github_webhook
            or cloud_job.triggers.schedule
        ):
            logger.info(f"Deactivating the job {cloud_job.id}:{cloud_job.name}")
            cloud_job.triggers.github_webhook = False
            cloud_job.triggers.git_provider_webhook = False
            cloud_job.triggers.schedule = False
            dbt_cloud.update_job(job=cloud_job)
        else:
            logger.info(f"The job {cloud_job.id}:{cloud_job.name} is already deactivated")

    logger.success("Deactivated all jobs!")


if __name__ == "__main__":
    cli()
5 changes: 4 additions & 1 deletion src/schemas/job.py
@@ -64,7 +64,10 @@ def to_payload(self):

# Rewrite the job name to embed the job ID from job.yml
payload = self.copy()
payload.name = f"{self.name} [[{self.identifier}]]"
# if there is an identifier, add it to the name
# otherwise, it means that we are "unlinking" the job from the job.yml
if self.identifier:
payload.name = f"{self.name} [[{self.identifier}]]"
return payload.json(exclude={"identifier", "custom_environment_variables"})

def to_load_format(self):