diff --git a/README.md b/README.md
index 111afd2..5e7a9e7 100644
--- a/README.md
+++ b/README.md
@@ -25,12 +25,10 @@ With this package's approach, people don't need to learn another tool and can co
 
 ### Installation
 
-This package uses `poetry` for dependency management.
-In the near future the package might be added to PyPi but for now the installation is manual, as follows:
+- Create a Python virtual environment and activate it
+- Run `pip install git+https://github.com/dbt-labs/dbt-jobs-as-code.git`
 
-1. clone this repository
-2. run `poetry install`
-3. run `poetry run dbt-jobs-as-code` to see the different list of commands available
+The CLI is now available as `dbt-jobs-as-code`
 
 ### Pre-requisites
 
@@ -43,31 +41,72 @@ The following environment variables are used to run the code:
 
 The CLI comes with a few different commands
 
-- `poetry run python src/main.py validate <config_file>`: validates that the YAML file has the correct structure
-  - it is possible to run the validation offline, without doing any API call
-  - or online using `--online`, in order to check that the different IDs provided are correct
-- `poetry run python src/main.py plan <config_file>`: returns the list of actions create/update/delete that are required to have dbt Cloud reflecting the configuration file
-  - this command doesn't modify the dbt Cloud jobs
-- `poetry run python src/main.py sync <config_file>`: create/update/delete jobs and env vars overwrites in jobs to align dbt Cloud with the configuration file
-  - ⚠️ this command will modify your dbt Cloud jobs if the current configuration is different from the YAML file
-- `poetry run python src/main.py import-jobs --config <config_file>` or `poetry run python src/main.py import-jobs --account-id <account_id>`: Queries dbt Cloud and provide the YAML definition for those jobs. It includes the env var overwrite at the job level if some have been defined
-  - it is possible to restrict the list of dbt Cloud Job IDs by adding `... -j 101 -j 123 -j 234`
-  - once the YAML has been retrieved, it is possible to copy/paste it in a local YAML file to create/update the local jobs definition.
-  - to move some ui-jobs to jobs-as-code, perform the following steps:
-    - run the command to import the jobs
-    - copy paste the job/jobs into a YAML file
-    - change the `import_` id of the job in the YML file to another unique identifier
-    - rename the job in the UI to end with `[[new_job_identifier]]`
-    - run a `plan` command to verify that no changes are required for the given job
+#### `validate`
+
+Command: `dbt-jobs-as-code validate <config_file>`
+
+Validates that the YAML file has the correct structure
+
+- it is possible to run the validation offline, without making any API calls
+- or online using `--online`, in order to check that the different IDs provided are correct
+
+#### `plan`
+
+Command: `dbt-jobs-as-code plan <config_file>`
+
+Returns the list of create/update/delete actions required to make dbt Cloud reflect the configuration file
+
+- this command doesn't modify the dbt Cloud jobs
+
+#### `sync`
+
+Command: `dbt-jobs-as-code sync <config_file>`
+
+Creates/updates/deletes jobs and env var overwrites in jobs to align dbt Cloud with the configuration file
+
+- ⚠️ this command will modify your dbt Cloud jobs if the current configuration is different from the YAML file
+
+#### `import-jobs`
+
+Command: `dbt-jobs-as-code import-jobs --config <config_file>` or `dbt-jobs-as-code import-jobs --account-id <account_id>`
+
+Queries dbt Cloud and provides the YAML definition for those jobs, including any env var overwrites defined at the job level
+
+- it is possible to restrict the list of dbt Cloud Job IDs by adding `... -j 101 -j 123 -j 234`
+- once the YAML has been retrieved, it is possible to copy/paste it in a local YAML file to create/update the local jobs definition.
+
+To move some UI-created jobs to jobs-as-code, perform the following steps:
+
+- run the command to import the jobs
+- copy/paste the job/jobs into a YAML file
+- change the `import_` id of the job in the YML file to another unique identifier
+- rename the job in the UI to end with `[[new_job_identifier]]`
+- run a `plan` command to verify that no changes are required for the given job
+
+#### `unlink`
+
+Command: `dbt-jobs-as-code unlink --config <config_file>` or `dbt-jobs-as-code unlink --account-id <account_id>`
+
+Unlinking jobs removes the `[[ ... ]]` part of the job name in dbt Cloud.
+
+⚠️ This can't be rolled back by the tool. Doing an `unlink` followed by a `sync` will create new instances of the jobs, with the `[[...]]` part in their name.
+
+- it is possible to restrict the list of jobs to unlink by adding the job identifiers: `... -i import_1 -i my_job_2`
+
+#### `deactivate-jobs`
+
+Command: `dbt-jobs-as-code deactivate-jobs --account-id 1234 --job-id 12 --job-id 34 --job-id 56`
+
+This command deactivates both the schedule and the CI triggers for dbt Cloud jobs. This is useful when moving jobs from one project to another: once the new jobs have been created, run this command to deactivate the jobs in the old project.
 
 ### Job Configuration YAML Schema
 
 The file `src/schemas/load_job_schema.json` is a JSON Schema file that can be used to verify that the YAML config files syntax is correct.
-To use it in VSCode, install the extension `YAML` and add the following line at the top of your YAML config file (change the path if need be):
+To use it in VSCode, install [the `YAML` extension](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml) and add the following line at the top of your YAML config file (change the path if need be):
 
 ```yaml
-# yaml-language-server: $schema=../src/schemas/load_job_schema.json
+# yaml-language-server: $schema=https://raw.githubusercontent.com/dbt-labs/dbt-jobs-as-code/main/src/schemas/load_job_schema.json
 ```
 
 ## Running the tool as part of CI/CD
 
diff --git a/src/client/__init__.py b/src/client/__init__.py
index 15e6ed9..0009a64 100644
--- a/src/client/__init__.py
+++ b/src/client/__init__.py
@@ -8,7 +8,6 @@
     CustomEnvironmentVariablePayload,
 )
 from src.schemas.job import JobDefinition
-from src.schemas import check_env_var_same
 
 
 class DBTCloud:
@@ -138,7 +137,7 @@ def get_jobs(self) -> List[JobDefinition]:
 
         return [JobDefinition(**job) for job in jobs]
 
-    def get_job(self, job_id: int) -> Dict:
+    def get_job(self, job_id: int) -> JobDefinition:
         """Generate a Job based on a dbt Cloud job."""
         self._check_for_creds()
 
@@ -150,7 +149,7 @@ def get_job(self, job_id: int) -> Dict:
                 "Content-Type": "application/json",
             },
         )
-        return response.json()["data"]
+        return JobDefinition(**response.json()["data"])
 
     def get_env_vars(
         self, project_id: int, job_id: int
diff --git a/src/main.py b/src/main.py
index e2b2a3f..b160e54 100644
--- a/src/main.py
+++ b/src/main.py
@@ -1,5 +1,4 @@
 import os
-from ruamel.yaml import YAML
 import sys
 
 from loguru import logger
@@ -87,7 +86,6 @@ def build_change_set(config):
 
     # Replicate the env vars from the YML to dbt Cloud
     for job in defined_jobs.values():
-
         if job.identifier in mapping_job_identifier_job_id:  # the job already exists
             job_id = mapping_job_identifier_job_id[job.identifier]
             all_env_vars_for_job = dbt_cloud.get_env_vars(project_id=job.project_id, job_id=job_id)
@@ -130,7 +128,6 @@ def build_change_set(config):
 
     # Delete the env vars from dbt Cloud that are not in the yml
     for job in defined_jobs.values():
-
         # we only delete env var overwrite if the job already exists
         if job.identifier in mapping_job_identifier_job_id:
             job_id = mapping_job_identifier_job_id[job.identifier]
@@ -270,11 +267,7 @@ def validate(config, online):
 
     # In case deferral jobs are mentioned, check that they exist
     deferral_envs = set(
-        [
-            job.deferring_environment_id
-            for job in defined_jobs
-            if job.deferring_environment_id
-        ]
+        [job.deferring_environment_id for job in defined_jobs if job.deferring_environment_id]
     )
     if deferral_envs:
         logger.info(f"Checking that Deferring Env IDs are valid")
@@ -342,5 +335,117 @@ def import_jobs(config, account_id, job_id):
     export_jobs_yml(cloud_jobs)
 
 
+@cli.command()
+@click.option("--config", type=click.File("r"), help="The path to your YML jobs config file.")
+@click.option("--account-id", type=int, help="The ID of your dbt Cloud account.")
+@click.option("--dry-run", is_flag=True, help="In dry run mode we don't update dbt Cloud.")
+@click.option(
+    "--identifier",
+    "-i",
+    type=str,
+    multiple=True,
+    help="[Optional] The identifiers we want to unlink. If not provided, all jobs are unlinked.",
+)
+def unlink(config, account_id, dry_run, identifier):
+    """
+    Unlink the YML file from dbt Cloud.
+    All relevant jobs get the `[[ ... ]]` part removed from their name
+    """
+
+    # we get the account id either from a parameter (e.g. if the config file doesn't exist) or from the config file
+    if account_id:
+        cloud_account_id = account_id
+    elif config:
+        defined_jobs = load_job_configuration(config).jobs.values()
+        cloud_account_id = list(defined_jobs)[0].account_id
+    else:
+        raise click.BadParameter("Either --config or --account-id must be provided")
+
+    dbt_cloud = DBTCloud(
+        account_id=cloud_account_id,
+        api_key=os.environ.get("DBT_API_KEY"),
+        base_url=os.environ.get("DBT_BASE_URL", "https://cloud.getdbt.com"),
+    )
+    logger.info("Getting the jobs definition from dbt Cloud")
+    cloud_jobs = dbt_cloud.get_jobs()
+    selected_jobs = [job for job in cloud_jobs if job.identifier is not None]
+
+    if identifier:
+        selected_jobs = [job for job in selected_jobs if job.identifier in identifier]
+
+    for cloud_job in selected_jobs:
+        current_identifier = cloud_job.identifier
+        # by removing the identifier, we unlink the job from the YML file
+        cloud_job.identifier = None
+        if dry_run:
+            logger.info(
+                f"Would unlink/rename the job {cloud_job.id}:{cloud_job.name} [[{current_identifier}]]"
+            )
+        else:
+            logger.info(
+                f"Unlinking/Renaming the job {cloud_job.id}:{cloud_job.name} [[{current_identifier}]]"
+            )
+            dbt_cloud.update_job(job=cloud_job)
+
+    if len(selected_jobs) == 0:
+        logger.info("No jobs to unlink")
+    elif not dry_run:
+        logger.success("Updated all jobs!")
+
+
+@cli.command()
+@click.option("--config", type=click.File("r"), help="The path to your YML jobs config file.")
+@click.option("--account-id", type=int, help="The ID of your dbt Cloud account.")
+@click.option(
+    "--job-id",
+    "-j",
+    type=int,
+    multiple=True,
+    help="[Optional] The ID of the job to deactivate.",
+)
+def deactivate_jobs(config, account_id, job_id):
+    """
+    Deactivate job triggers in dbt Cloud (schedule and CI triggers)
+    """
+
+    # we get the account id either from a parameter (e.g. if the config file doesn't exist) or from the config file
+    if account_id:
+        cloud_account_id = account_id
+    elif config:
+        defined_jobs = load_job_configuration(config).jobs.values()
+        cloud_account_id = list(defined_jobs)[0].account_id
+    else:
+        raise click.BadParameter("Either --config or --account-id must be provided")
+
+    dbt_cloud = DBTCloud(
+        account_id=cloud_account_id,
+        api_key=os.environ.get("DBT_API_KEY"),
+        base_url=os.environ.get("DBT_BASE_URL", "https://cloud.getdbt.com"),
+    )
+    cloud_jobs = dbt_cloud.get_jobs()
+
+    selected_cloud_jobs = [job for job in cloud_jobs if job.id in job_id]
+
+    for cloud_job in selected_cloud_jobs:
+        if (
+            cloud_job.triggers.git_provider_webhook
+            or cloud_job.triggers.github_webhook
+            or cloud_job.triggers.schedule
+        ):
+            logger.info(f"Deactivating the job {cloud_job.id}:{cloud_job.name}")
+            cloud_job.triggers.github_webhook = False
+            cloud_job.triggers.git_provider_webhook = False
+            cloud_job.triggers.schedule = False
+            dbt_cloud.update_job(job=cloud_job)
+        else:
+            logger.info(f"The job {cloud_job.id}:{cloud_job.name} is already deactivated")
+
+    logger.success("Deactivated all jobs!")
+
+
 if __name__ == "__main__":
     cli()
diff --git a/src/schemas/job.py b/src/schemas/job.py
index 15da52d..55cd6a8 100644
--- a/src/schemas/job.py
+++ b/src/schemas/job.py
@@ -64,7 +64,10 @@ def to_payload(self):
 
         # Rewrite the job name to embed the job ID from job.yml
         payload = self.copy()
-        payload.name = f"{self.name} [[{self.identifier}]]"
+        # if there is an identifier, add it to the name
+        # otherwise, it means that we are "unlinking" the job from the job.yml
+        if self.identifier:
+            payload.name = f"{self.name} [[{self.identifier}]]"
         return payload.json(exclude={"identifier", "custom_environment_variables"})
 
     def to_load_format(self):
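
The naming convention this diff relies on throughout (`to_payload` embeds the identifier as a `[[...]]` suffix in the dbt Cloud job name; `unlink` drops it) can be sketched in isolation. This is a minimal illustration under that convention, not code from the repository — both function names are hypothetical:

```python
import re
from typing import Optional


def embed_identifier(name: str, identifier: Optional[str]) -> str:
    # A job managed as code carries its identifier in its dbt Cloud name,
    # e.g. "Daily run [[import_1]]". With no identifier (the unlink case),
    # the plain name is kept.
    return f"{name} [[{identifier}]]" if identifier else name


def extract_identifier(job_name: str) -> Optional[str]:
    # Recover the identifier from a dbt Cloud job name, or None when the
    # job is not linked to a config file.
    match = re.search(r"\[\[([^\]]+)\]\]\s*$", job_name)
    return match.group(1) if match else None


print(embed_identifier("Daily run", "import_1"))  # → Daily run [[import_1]]
print(extract_identifier("Daily run"))  # → None
```

Round-tripping a name through `embed_identifier` and `extract_identifier` returns the original identifier, which is why renaming a UI job to end with `[[new_job_identifier]]` is enough to link it.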