Remove .fromSamplesheet, but create an equivalent function samplesheetToList #3

Merged
merged 23 commits into from
Apr 18, 2024
Changes from 14 commits
Commits
4970b54
simple steps for the transition to an operator
nvnieuwk Mar 20, 2024
18ae17c
Merge branch 'rebrand-to-nf-schema' into rework-fromSamplesheet
nvnieuwk Mar 20, 2024
4977ec0
convert fromSamplesheet to an operator
nvnieuwk Mar 20, 2024
ba91212
move as much logic as possible to the SamplesheetConverter
nvnieuwk Mar 21, 2024
d28b63c
Make the operator more stable
nvnieuwk Mar 21, 2024
7da6e06
add function equivalent to fromSamplesheet
nvnieuwk Mar 21, 2024
2352139
Merge branch 'rebrand-to-nf-schema' into rework-fromSamplesheet
nvnieuwk Apr 9, 2024
3772bea
update tests
nvnieuwk Apr 9, 2024
9a19944
Merge branch 'rebrand-to-nf-schema' into rework-fromSamplesheet
nvnieuwk Apr 10, 2024
1bfc475
let Nextflow do the file handling
nvnieuwk Apr 11, 2024
25f0b9c
update docs with the changes
nvnieuwk Apr 11, 2024
c84c55e
prettier
nvnieuwk Apr 11, 2024
773e309
fix tests
nvnieuwk Apr 11, 2024
7524e16
Merge branch 'master' into rework-fromSamplesheet
nvnieuwk Apr 11, 2024
5bcd13c
Update CHANGELOG.md
nvnieuwk Apr 15, 2024
cfd0f60
apply review suggestions
nvnieuwk Apr 15, 2024
08d3317
Merge branch 'rework-fromSamplesheet' of github.com:nextflow-io/nf-sc…
nvnieuwk Apr 15, 2024
c971158
fixed error with GString inputs
nvnieuwk Apr 15, 2024
f02b240
Use CharSequence instead of String and GString
nvnieuwk Apr 16, 2024
321c245
add make install as shown in the podcast
nvnieuwk Apr 17, 2024
68117c0
remove fromSamplesheet and only use samplesheetToList
nvnieuwk Apr 18, 2024
131848b
fix tests
nvnieuwk Apr 18, 2024
47514c0
add a small note about the automatic typing in csv or tsv
nvnieuwk Apr 18, 2024
136 changes: 3 additions & 133 deletions CHANGELOG.md
@@ -2,20 +2,20 @@

# Version 2.0.0 - Kagoshima

:warning: This version contains a number of breaking changes. Please read the changelog carefully before upgrading. :warning:

To migrate your schemas please follow the [migration guide](https://nextflow-io.github.io/nf-validation/latest/migration_guide/)
To migrate from nf-validation please follow the [migration guide](https://nextflow-io.github.io/nf-validation/latest/migration_guide/)

## New features

- Added the `uniqueEntries` keyword. This keyword takes a list of strings corresponding to names of fields that need to be a unique combination. e.g. `uniqueEntries: ['sample', 'replicate']` will make sure that the combination of the `sample` and `replicate` fields is unique. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Added `samplesheetToList`, which is the function equivalent of `.fromSamplesheet` ([#3](https://github.com/nextflow-io/nf-schema/pull/3))
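
To illustrate what a `uniqueEntries`-style check verifies, here is a minimal Groovy sketch. It is not the plugin's implementation: the helper name `findDuplicateEntries` and the example rows are assumptions made up for this illustration.

```groovy
// Hypothetical helper, for illustration only: returns every row whose
// combination of the given fields already appeared in an earlier row.
List<Map> findDuplicateEntries(List<Map> rows, List<String> fields) {
    Set<List> seen = [] as Set
    return rows.findAll { row ->
        List key = fields.collect { row[it] }
        // Set.add returns false when the key is already present
        !seen.add(key)
    }
}

def samplesheet = [
    [sample: 'A', replicate: 1, fastq: 'a1.fq'],
    [sample: 'A', replicate: 2, fastq: 'a2.fq'],
    [sample: 'A', replicate: 1, fastq: 'a3.fq'],
]

// Only the third row repeats an earlier (sample, replicate) combination
def duplicates = findDuplicateEntries(samplesheet, ['sample', 'replicate'])
assert duplicates == [[sample: 'A', replicate: 1, fastq: 'a3.fq']]
```

The point of the sketch is that each field may repeat on its own (`sample: 'A'` appears three times); only a repeated combination counts as a duplicate.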

## Changes

- Changed the JSON Schema draft used for schemas from `draft-07` to `draft-2020-12`. See the [2019-09](https://json-schema.org/draft/2019-09/release-notes) and [2020-12](https://json-schema.org/draft/2020-12/release-notes) release notes for all changes ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed all validation code from the `.fromSamplesheet()` channel factory. The validation is now solely done in the `validateParameters()` function. A custom error message will now be displayed if any error has been encountered during the conversion ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed the `unique` keyword from the samplesheet schema. You should now use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or `uniqueEntries` instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed the `skip_duplicate_check` option from the `fromSamplesheet()` channel factory and the `--validationSkipDuplicateCheck` parameter. You should now use the `uniqueEntries` or [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) keywords in the schema instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` is now a channel operator instead of a channel factory. It takes one required argument, which can be either a string containing the relative path to the schema or a file object of the schema ([#3](https://github.com/nextflow-io/nf-schema/pull/3))
- `.fromSamplesheet()` now does dynamic typecasting instead of using the `type` fields in the JSON schema, because of the complexity of `draft-2020-12` JSON schemas. The impact should be small, but keep in mind that some values can have a different type than in earlier versions ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` will now set all missing values to `[]` instead of the type-specific defaults (because of the change in the previous point). This should have little effect in practice, as `[]` also evaluates to `false` when used in conditions. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
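
The last two points can be pictured with a short Groovy sketch. The casting rules below are an assumption chosen for illustration (the helper `inferValue` is hypothetical and the plugin's actual rules may differ); the part that does not depend on any assumption is that an empty list is falsy in Groovy.

```groovy
// Hypothetical helper, NOT the plugin's code: guess a value's type from the
// raw text of a samplesheet cell rather than from the schema's `type` field.
def inferValue(String raw) {
    String text = raw?.trim()
    if (!text) {
        return []                                   // missing value
    }
    if (text.equalsIgnoreCase('true'))  return true
    if (text.equalsIgnoreCase('false')) return false
    if (text.isInteger())               return text.toInteger()
    if (text.isDouble())                return text.toDouble()
    return text                                     // fall back to a string
}

assert inferValue('12') == 12
assert inferValue('0.5') == 0.5d
assert inferValue('TRUE') == true
assert inferValue('sample_1') == 'sample_1'
assert inferValue('') == []

// An empty list is falsy under Groovy truth, so existing checks such as
// `if (row.fastq_2)` keep treating a missing value as "not provided".
assert !inferValue(null)
assert (inferValue('') ? 'paired' : 'single') == 'single'
```

Code that compares a missing value explicitly against `null`, `''` or `0` is where the change could surface, since `[]` is equal to none of these.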

@@ -25,133 +25,3 @@ To migrate your schemas please follow the [migration guide](https://nextflow-io.
- The `schema` keyword will now work in all schemas. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Improved the error messages ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` now supports deeply nested samplesheets ([#141](https://github.com/nextflow-io/nf-validation/pull/141))

# Version 1.1.3 - Asahikawa

## Improvements

- Added support for double quotes (`"`) in CSV and TSV samplesheets ([#134](https://github.com/nextflow-io/nf-validation/pull/134))

# Version 1.1.2 - Wakayama

## Bug fixes

- Fixed an issue with inputs using `file-path-pattern` where only one file was found (`Path` casting to `ArrayList` error) ([#132](https://github.com/nextflow-io/nf-validation/pull/132))

# Version 1.1.1 - Shoyu

## Bug fixes

- Fixed an issue where samplesheets with a lot of null values would take forever to validate ([#120](https://github.com/nextflow-io/nf-validation/pull/120)) => Thanks @awgymer for fixing this!
- YAML files are now actually validated instead of skipped ([#124](https://github.com/nextflow-io/nf-validation/pull/124))

# Version 1.1.0 - Miso

## Features

- Add support for samplesheets with no header ([#115](https://github.com/nextflow-io/nf-validation/pull/115))

## Bug fixes

- Floats and doubles should now be created when using the `number` type in the schema ([#113](https://github.com/nextflow-io/nf-validation/pull/113/))
- When `0` is used as a default value in the schema, a `0` will now be used as the value in the `.fromSamplesheet()` channel instead of `null` ([#114](https://github.com/nextflow-io/nf-validation/pull/114))

## New features

- Added `file-path-pattern` format to check every file fetched using a glob pattern. Using a glob is now also possible in the samplesheet and will create a list of all files found using that glob pattern. ([#118](https://github.com/nextflow-io/nf-validation/pull/118))

# Version 1.0.0 - Tonkotsu

The nf-validation plugin is now in production use across many pipelines and has (we hope) now reached a point of relative stability. The bump to major version v1.0.0 signifies that it is suitable for use in production pipelines.

This version also introduces a small breaking change of syntax when providing optional arguments to the functions. You can now provide optional arguments such as the nextflow parameters schema path as:
`validateParameters(parameters_schema: 'my_file.json')`

(previous syntax used positional arguments instead).

## Bug fixes

- The path to a custom parameters schema must be provided through a map '`parameters_schema: 'my_file.json'`' in `validateParameters()` and `paramsSummaryMap()` ([#108](https://github.com/nextflow-io/nf-validation/pull/108))

# Version 0.3.4

This version introduced a bug which made all pipeline runs using the function `validateParameters()` without providing any arguments fail.

This bug causes Nextflow to exit with an error on launch for most pipelines. It should not be used. It was [removed](https://github.com/nextflow-io/plugins/pull/40) from the Nextflow Plugin registry to avoid breaking people's runs.

### Bug fixes

- Do not check S3 URL paths with `PathValidator` `FilePathValidator` and `DirectoryPathValidator` ([#106](https://github.com/nextflow-io/nf-validation/pull/106))
- Make monochrome_logs an option in `paramsSummaryLog()`, `paramsSummaryMap()` and `paramsHelp()` instead of a global parameter ([#101](https://github.com/nextflow-io/nf-validation/pull/101))

# Version 0.3.3

### Bug fixes

- Do not check if S3 URL paths exist to avoid AWS errors, and add a new parameter `validationS3PathCheck` ([#104](https://github.com/nextflow-io/nf-validation/pull/104))

# Version 0.3.2

### Bug fixes

- Add parameters defined on the top level of the schema and within the definitions section as expected params ([#79](https://github.com/nextflow-io/nf-validation/pull/79))
- Fix error when a parameter is not present in the schema and evaluates to false ([#89](https://github.com/nextflow-io/nf-validation/pull/89))
- Changed the `schema_filename` option of `fromSamplesheet` to `parameters_schema` to make this option more clear to the user ([#91](https://github.com/nextflow-io/nf-validation/pull/91))

## Version 0.3.1

### Bug fixes

- Don't check if path exists if param is not true ([#74](https://github.com/nextflow-io/nf-validation/pull/74))
- Don't validate a file if the parameter evaluates to false ([#75](https://github.com/nextflow-io/nf-validation/pull/75))

## Version 0.3.0

### New features

- Check that a sample sheet doesn't have duplicated entries by default. Can be disabled with `--validationSkipDuplicateCheck` ([#72](https://github.com/nextflow-io/nf-validation/pull/72))

### Bug fixes

- Only validate a path if it is not null ([#50](https://github.com/nextflow-io/nf-validation/pull/50))
- Only validate a file with a schema if the file path is provided ([#51](https://github.com/nextflow-io/nf-validation/pull/51))
- Handle errors when sample sheet not provided or doesn't have a schema ([#56](https://github.com/nextflow-io/nf-validation/pull/56))
- Silently ignore samplesheet fields that are not defined in samplesheet schema ([#59](https://github.com/nextflow-io/nf-validation/pull/59))
- Correctly handle double-quoted fields containing commas in csv files by `.fromSamplesheet()` ([#63](https://github.com/nextflow-io/nf-validation/pull/63))
- Print param name when path does not exist ([#65](https://github.com/nextflow-io/nf-validation/pull/65))
- Fix file or directory does not exist error not printed when it was the only error in a samplesheet ([#65](https://github.com/nextflow-io/nf-validation/pull/65))
- Do not return parameter in summary if it has no default in the schema and is set to 'false' ([#66](https://github.com/nextflow-io/nf-validation/pull/66))
- Skip the validation of a file if the path is an empty string and improve error message when the path is invalid ([#69](https://github.com/nextflow-io/nf-validation/pull/69))

### Deprecated

- The meta map of input channels is not an ImmutableMap anymore ([#68](https://github.com/nextflow-io/nf-validation/pull/68)). Reason: [Issue #52](https://github.com/nextflow-io/nf-validation/issues/52)

## Version 0.2.1

### Bug fixes

- Fixed a bug where `immutable_meta` option in `fromSamplesheet()` wasn't working when using `validateParameters()` first. (@nvnieuwk)

## Version 0.2.0

### New features

- Added a new [documentation site](https://nextflow-io.github.io/nf-validation/). (@ewels and @mashehu)
- Removed the `file-path-exists`, `directory-path-exists` and `path-exists` formats and added an [`exists`](https://nextflow-io.github.io/nf-validation/nextflow_schema/nextflow_schema_specification/#exists) parameter to the schema. (@mirpedrol)
- New [`errorMessage`](https://nextflow-io.github.io/nf-validation/nextflow_schema/nextflow_schema_specification/#errormessage) parameter for the schema which can be used to create custom error messages. (@mirpedrol)
- Samplesheet validation now happens in `validateParameters()` using the schema specified by the `schema` parameter in the parameters schema. (@mirpedrol)

### Improvements

- The `meta` maps are now immutable by default, see [`ImmutableMap`](https://nextflow-io.github.io/nf-validation/samplesheets/immutable_map/) for more info (@nvnieuwk)
- `validateAndConvertSamplesheet()` has been renamed to `fromSamplesheet()`
- Refactor `--schema_ignore_params` to `--validationSchemaIgnoreParams`

### Bug fixes

- Fixed a bug where an empty meta map would be created when no meta values are in the samplesheet schema. (@nvnieuwk)

## Version 0.1.0

Initial release.
7 changes: 3 additions & 4 deletions README.md
@@ -50,9 +50,8 @@ validateParameters()
// Print summary of supplied parameters
log.info paramsSummaryLog(workflow)

// Create a new channel of metadata from a sample sheet
// NB: `input` corresponds to `params.input` and associated sample sheet schema
ch_input = Channel.fromSamplesheet("input")
// Create a new channel of metadata from a sample sheet passed to the pipeline through the --input parameter
ch_input = Channel.of(params.input).fromSamplesheet("assets/schema_input.json")
```

## Dependencies
@@ -62,7 +61,7 @@ ch_input = Channel.fromSamplesheet("input")

## Slack channel

There is a dedicated [nf-validation Slack channel](https://nfcore.slack.com/archives/C056RQB10LU) in the [Nextflow Slack workspace](nextflow.slack.com).
There is a dedicated [nf-validation Slack channel](https://nfcore.slack.com/archives/C056RQB10LU) in the [Nextflow Slack workspace](https://nextflow.slack.com).

## Credits

2 changes: 2 additions & 0 deletions docs/background.md
@@ -15,3 +15,5 @@ In addition to config params, a common best-practice for pipelines is to use a "
Nextflow itself does not provide functionality to validate config parameters or parsed sample sheets. To bridge this gap, we developed code within the [nf-core community](https://nf-co.re/) to allow pipelines to work with a standard `nextflow_schema.json` file, written using the [JSON Schema](https://json-schema.org/) format. The file allows strict typing of parameter variables and inclusion of validation rules.

The nf-schema plugin moves this code out of the nf-core template into a stand-alone package, to make it easier to use for the wider Nextflow community. It also incorporates a number of new features, such as native Groovy sample sheet validation.

Earlier versions of the plugin can be found in the [nf-validation](https://github.com/nextflow-io/nf-validation) repository and can still be used in pipelines. However, the nf-validation plugin is no longer supported and all development has moved to nf-schema.
43 changes: 31 additions & 12 deletions docs/migration_guide.md
@@ -1,21 +1,22 @@
---
title: Migration guide
description: Guide to migrate pipelines using nf-schema pre v2.0.0 to after v2.0.0
description: Guide to migrate pipelines from nf-validation to nf-schema
hide:
- toc
---

# Migration guide

This guide is intended to help you migrate your pipeline from older versions of the plugin to version 2.0.0 and later.
This guide is intended to help you migrate your pipeline from [nf-validation](https://github.com/nextflow-io/nf-validation) to nf-schema.

## Major changes in the plugin

The following list shows the major breaking changes introduced in version 2.0.0:
The following list shows the major breaking changes introduced in nf-schema:

1. The JSON schema draft has been updated from `draft-07` to `draft-2020-12`. See [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.
2. The `unique` keyword for samplesheet schemas has been removed. Please use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or [`uniqueEntries`](nextflow_schema/nextflow_schema_specification.md#uniqueentries) now instead.
3. The `dependentRequired` keyword now works as it's supposed to work in JSON schema. See [`dependentRequired`](https://json-schema.org/understanding-json-schema/reference/conditionals#dependentRequired) for more information
2. The `fromSamplesheet` channel factory has been converted to a channel operator. See [updating `fromSamplesheet`](#updating-fromsamplesheet) for more information.
3. The `unique` keyword for samplesheet schemas has been removed. Please use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or [`uniqueEntries`](nextflow_schema/nextflow_schema_specification.md#uniqueentries) now instead.
4. The `dependentRequired` keyword now works as it's supposed to work in JSON schema. See [`dependentRequired`](https://json-schema.org/understanding-json-schema/reference/conditionals#dependentRequired) for more information

A full list of changes can be found in the [changelog](https://github.com/nextflow-io/nf-schema/blob/master/CHANGELOG.md).

@@ -31,9 +32,27 @@ This will replace the old schema draft specification (`draft-07`) by the new one

!!! note

Repeat this command for every JSON schema you use in your pipeline. e.g. for the default samplesheet schema:
Repeat this command for every JSON schema you use in your pipeline. e.g. for the default samplesheet schema in nf-core pipelines:
`sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' assets/schema_input.json`
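
For readers who want to see what the two `sed` substitutions do to a schema, here is a rough Groovy equivalent applied to an in-memory string. It is a sketch only: the function name `migrateSchemaText` and the sample schema are made up for this example, and it treats both patterns as literal text.

```groovy
// Illustrative equivalent of the two sed substitutions; the function name
// is hypothetical and both patterns are treated as literal strings.
String migrateSchemaText(String text) {
    String updated = text.replace(
        'http://json-schema.org/draft-07/schema',
        'https://json-schema.org/draft/2020-12/schema')
    // Like `s/definitions/defs/g`, this rewrites every occurrence,
    // including the ones inside "$ref" pointers.
    return updated.replace('definitions', 'defs')
}

// Single-quoted triple string, so `$schema` is not interpolated by Groovy
def before = '''{
  "$schema": "http://json-schema.org/draft-07/schema",
  "definitions": { "input_options": { "type": "object" } },
  "allOf": [ { "$ref": "#/definitions/input_options" } ]
}'''

def after = migrateSchemaText(before)
assert after.contains('"$schema": "https://json-schema.org/draft/2020-12/schema"')
assert after.contains('"$ref": "#/defs/input_options"')
assert !after.contains('definitions')
```

Because the second substitution is global, it also changes the word `definitions` anywhere else in the file (for example inside a description), so it is worth reviewing the result before committing it.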

Next, you should update the `.fromSamplesheet` channel factory to the channel operator. The following tabs show the difference between the versions:

=== "nf-validation"

```groovy
Channel.fromSamplesheet("input")
```

=== "nf-schema"

```groovy
Channel.of(params.input).fromSamplesheet("path/to/samplesheet/schema")
```

!!! note

This change was necessary to make it possible for pipelines to be used as pluggable workflows. This also enables the validation and conversion of files generated by the pipeline.

If you are using any special features in your schemas, you will need to update your schemas manually. Please refer to the [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.

However here are some guides to the more common migration patterns:
@@ -44,7 +63,7 @@ When you use `unique` in your schemas, you should update it to use `uniqueItems`

If you used the `unique:true` field, you should update it to use `uniqueItems` like this:

=== "Before v2.0"
=== "nf-validation"

```json hl_lines="9"
{
@@ -62,7 +81,7 @@ If you used the `unique:true` field, you should update it to use `uniqueItems` l
}
```

=== "After v2.0"
=== "nf-schema"

```json hl_lines="12"
{
@@ -82,7 +101,7 @@ If you used the `unique:true` field, you should update it to use `uniqueItems` l

If you used the `unique: ["field1", "field2"]` field, you should update it to use `uniqueEntries` like this:

=== "Before v2.0"
=== "nf-validation"

```json hl_lines="9"
{
@@ -100,7 +119,7 @@ If you used the `unique: ["field1", "field2"]` field, you should update it to us
}
```

=== "After v2.0"
=== "nf-schema"

```json hl_lines="12"
{
@@ -122,7 +141,7 @@ If you used the `unique: ["field1", "field2"]` field, you should update it to us

When you use `dependentRequired` in your schemas, you should update it like this:

=== "Before v2.0"
=== "nf-validation"

```json hl_lines="12"
{
@@ -142,7 +161,7 @@ When you use `dependentRequired` in your schemas, you should update it like this
}
```

=== "After v2.0"
=== "nf-schema"

```json hl_lines="14 15 16"
{
2 changes: 2 additions & 0 deletions docs/nextflow_schema/create_schema.md
@@ -76,4 +76,6 @@ This web interface is where you should add detail to your schema, customising th

There is currently no tooling to help you write a sample sheet schema :anguished:

You can find an example in [Example sample sheet schema](sample_sheet_schema_examples.md)

Watch this space..