remove fromSamplesheet and only use samplesheetToList
nvnieuwk committed Apr 18, 2024
1 parent 321c245 commit 68117c0
Showing 30 changed files with 167 additions and 219 deletions.
9 changes: 4 additions & 5 deletions CHANGELOG.md
Expand Up @@ -12,12 +12,11 @@ To migrate from nf-validation please follow the [migration guide](https://nextfl
## Changes

- Changed the used draft for the schema from `draft-07` to `draft-2020-12`. See the [2019-09](https://json-schema.org/draft/2019-09/release-notes) and [2020-12](https://json-schema.org/draft/2020-12/release-notes) release notes for all changes ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed all validation code from the `.fromSamplesheet()` channel factory. The validation is now solely done in the `validateParameters()` function. A custom error message will now be displayed if any error has been encountered during the conversion ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed the `fromSamplesheet` channel operator and added a `samplesheetToList` function instead. This function validates the samplesheet and returns its contents as a list ([#3](https://github.com/nextflow-io/nf-schema/pull/3))
- Removed the `unique` keyword from the samplesheet schema. You should now use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or `uniqueEntries` instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed the `skip_duplicate_check` option from the `fromSamplesheet()` channel factory and the `--validationSkipDuplicateCheck` parameter. You should now use the `uniqueEntries` or [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) keywords in the schema instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` now is a channel operator instead of a channel factory. It takes one required argument which can either be a string containing the relative path to the schema or a file object of the schema [#3](https://github.com/nextflow-io/nf-schema/pull/3)
- `.fromSamplesheet()` now does dynamic typecasting instead of using the `type` fields in the JSON schema. This is done due to the complexity of `draft-2020-12` JSON schemas. This should not have that much impact but keep in mind that some types can be different between this and earlier versions because of this ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` will now set all missing values as `[]` instead of the type specific defaults (because of the changes in the previous point). This should not change that much as this will also result in `false` when used in conditions. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed the `skip_duplicate_check` option from the `samplesheetToList()` function and the `--validationSkipDuplicateCheck` parameter. You should now use the `uniqueEntries` or [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) keywords in the schema instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `samplesheetToList()` now does dynamic typecasting instead of using the `type` fields in the JSON schema. This is done due to the complexity of `draft-2020-12` JSON schemas. The impact should be small, but keep in mind that some values may be cast to different types than in older nf-validation versions ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `samplesheetToList()` will now set all missing values to `[]` instead of the type-specific defaults (because of the change in the previous point). This should have little practical effect, as `[]` also evaluates to `false` when used in conditions ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
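
The last point relies on Groovy truthiness, where an empty list evaluates to `false`. The following is a minimal sketch in plain Groovy, not taken from the plugin: the `row` value is a hypothetical stand-in shaped like one entry of the documented example output, and the empty string and `null` are shown only for comparison.

```groovy
// Hypothetical row, shaped like the documented example output:
// [ meta, fastq_1, fastq_2, bed ] with the BED column left empty.
def row = [[id: 'sample2'], 'fastq2.R1.fq.gz', 'fastq2.R2.fq.gz', []]
def (meta, fastq1, fastq2, bed) = row

// An empty list is falsy under Groovy truth, so a condition on a
// missing value takes the "absent" branch.
def hasBed = bed ? true : false
assert !hasBed

// Other falsy values, for comparison only.
assert !''
assert !null

println "${meta.id} has BED file: ${hasBed}"
```

Conditions such as `if (bed) { ... }` should therefore keep working, while explicit comparisons against a specific default (for example `bed == ''`) would need to be reviewed.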

## Improvements

6 changes: 3 additions & 3 deletions README.md
Expand Up @@ -36,7 +36,7 @@ This is all that is needed - Nextflow will automatically fetch the plugin code a
You can now include the plugin helper functions into your Nextflow pipeline:

```groovy title="main.nf"
include { validateParameters; paramsHelp; paramsSummaryLog; fromSamplesheet } from 'plugin/nf-schema'
include { validateParameters; paramsHelp; paramsSummaryLog; samplesheetToList } from 'plugin/nf-schema'
// Print help message, supply typical command line usage for the pipeline
if (params.help) {
Expand All @@ -51,7 +51,7 @@ validateParameters()
log.info paramsSummaryLog(workflow)
// Create a new channel of metadata from a sample sheet passed to the pipeline through the --input parameter
ch_input = Channel.of(params.input).fromSamplesheet("assets/schema_input.json")
ch_input = Channel.fromList(samplesheetToList(params.input, "assets/schema_input.json"))
```

## Dependencies
Expand All @@ -61,7 +61,7 @@ ch_input = Channel.of(params.input).fromSamplesheet("assets/schema_input.json")

## Slack channel

There is a dedicated [nf-validation Slack channel](https://nfcore.slack.com/archives/C056RQB10LU) in the [Nextflow Slack workspace](https://nextflow.slack.com).
There is a dedicated [nf-schema Slack channel](https://nfcore.slack.com/archives/C056RQB10LU) in the [Nextflow Slack workspace](https://nextflow.slack.com).

## Credits

8 changes: 5 additions & 3 deletions docs/migration_guide.md
Expand Up @@ -14,7 +14,7 @@ This guide is intended to help you migrate your pipeline from [nf-validation](ht
The following list shows the major breaking changes introduced in nf-schema:

1. The JSON schema draft has been updated from `draft-07` to `draft-2020-12`. See [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.
2. The `fromSamplesheet` channel factory has been converted to a channel operator. See [updating `fromSamplesheet`](#updating-fromsamplesheet) for more information.
2. The `fromSamplesheet` channel factory has been converted to a function called `samplesheetToList`. See [updating `fromSamplesheet`](#updating-fromsamplesheet) for more information.
3. The `unique` keyword for samplesheet schemas has been removed. Please use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or [`uniqueEntries`](nextflow_schema/nextflow_schema_specification.md#uniqueentries) now instead.
4. The `dependentRequired` keyword now behaves as specified in JSON schema. See [`dependentRequired`](https://json-schema.org/understanding-json-schema/reference/conditionals#dependentRequired) for more information.

Expand All @@ -35,18 +35,20 @@ This will replace the old schema draft specification (`draft-07`) by the new one
Repeat this command for every JSON schema you use in your pipeline. e.g. for the default samplesheet schema in nf-core pipelines:
`sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' assets/schema_input.json`

Next you should update the `.fromSamplesheet` channel factory to the channel operator. Following tabs shows the difference between the versions:
Next, you should replace the `.fromSamplesheet` channel factory with the `samplesheetToList` function. The following tabs show the difference between the two versions:

=== "nf-validation"

```groovy
include { fromSamplesheet } from 'plugin/nf-validation'
Channel.fromSamplesheet("input")
```

=== "nf-schema"

```groovy
Channel.of(params.input).fromSamplesheet("path/to/samplesheet/schema")
include { samplesheetToList } from 'plugin/nf-schema'
Channel.fromList(samplesheetToList(params.input, "path/to/samplesheet/schema"))
```

!!! note
8 changes: 1 addition & 7 deletions docs/nextflow_schema/sample_sheet_schema_specification.md
Expand Up @@ -59,7 +59,7 @@ Fields that are present in the sample sheet, but not in the schema will be ignor
!!! warning

The order of properties in the _schema_ **is** important.
This order defines the order of output channel properties when using the `fromSamplesheet` channel factory.
This order defines the order of output channel properties when using the `samplesheetToList()` function.

## Common keys

Expand All @@ -68,12 +68,6 @@ For example: `type`, `pattern`, `format`, `errorMessage`, `exists` and so on.

Please refer to the [Nextflow schema specification](../nextflow_schema/nextflow_schema_specification.md) docs for details.

!!! tip

Sample sheets are commonly used to define input file paths.
Be sure to set `"type": "string"`, `exists: true`, `"format": "file-path"` and `"schema":"path/to/samplesheet/schema.json"` for these properties,
so that samplesheets are correctly validated and `fromSamplesheet` does not result in any errors.

## Sample sheet keys

Below are the properties that are specific to sample sheet schema.
32 changes: 16 additions & 16 deletions docs/samplesheets/examples.md
Expand Up @@ -7,7 +7,7 @@ description: Examples of advanced sample sheet creation techniques.

## Introduction

Understanding channel structure and manipulation is critical for getting the most out of Nextflow. nf-schema helps initialise your channels from the text inputs to get you started, but further work might be required to fit your exact use case. In this page we run through some common cases for transforming the output of `.fromSamplesheet()`.
Understanding channel structure and manipulation is critical for getting the most out of Nextflow. nf-schema helps initialise your channels from the text inputs to get you started, but further work might be required to fit your exact use case. In this page we run through some common cases for transforming the output of `samplesheetToList()`.

### Glossary

Expand All @@ -17,15 +17,15 @@ Understanding channel structure and manipulation is critical for getting the mos

## Default mode

Each item in the channel emitted by `.fromSamplesheet()` is a tuple, corresponding with each row of the sample sheet. Each item will be composed of a meta value (if present) and any additional elements from columns in the sample sheet, e.g.:
Each item in the list returned by `samplesheetToList()` is a tuple corresponding to one row of the sample sheet. Each item will be composed of a meta value (if present) and any additional elements from columns in the sample sheet, e.g.:

```csv
sample,fastq_1,fastq_2,bed
sample1,fastq1.R1.fq.gz,fastq1.R2.fq.gz,sample1.bed
sample2,fastq2.R1.fq.gz,fastq2.R2.fq.gz,
```

Might create a channel where each element consists of 4 items, a map value followed by three files:
This might produce a list in which each element consists of 4 items, a map value followed by three files:

```groovy
// Columns:
Expand All @@ -36,13 +36,13 @@ Might create a channel where each element consists of 4 items, a map value follo
[ [ id: "sample2" ], fastq2.R1.fq.gz, fastq2.R2.fq.gz, [] ] // A missing value from the sample sheet is an empty list
```

This channel can be used as input of a process where the input declaration is:
This list can be converted to a channel that can be used as input of a process where the input declaration is:

```nextflow
tuple val(meta), path(fastq_1), path(fastq_2), path(bed)
```

It may be necessary to manipulate this channel to fit your process inputs. For more documentation, check out the [Nextflow operator docs](https://www.nextflow.io/docs/latest/operator.html), however here are some common use cases with `.fromSamplesheet()`.
It may be necessary to manipulate this channel to fit your process inputs. For more documentation, check out the [Nextflow operator docs](https://www.nextflow.io/docs/latest/operator.html). Here are some common use cases with `samplesheetToList()`.
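
Because `samplesheetToList()` returns a list rather than a channel, ordinary Groovy list methods can in principle also be applied before the list is passed to `Channel.fromList()`. The following is a minimal sketch in plain Groovy, not taken from the plugin documentation: `rows` is a hypothetical stand-in for the returned list, and file names are plain strings purely for illustration.

```groovy
// Stand-in for the return value of samplesheetToList(); the rows mirror the
// example sample sheet above. In a real pipeline the file columns would hold
// whatever values the plugin produces, not these placeholder strings.
def rows = [
    [[id: 'sample1'], 'fastq1.R1.fq.gz', 'fastq1.R2.fq.gz', 'sample1.bed'],
    [[id: 'sample2'], 'fastq2.R1.fq.gz', 'fastq2.R2.fq.gz', []],
]

// Group the two FASTQ files into one list per row and drop the BED column.
// Groovy spreads each inner list over the closure parameters.
def grouped = rows.collect { meta, fastq1, fastq2, bed ->
    [meta, [fastq1, fastq2]]
}

assert grouped.size() == 2
assert grouped[0] == [[id: 'sample1'], ['fastq1.R1.fq.gz', 'fastq1.R2.fq.gz']]

grouped.each { println it }
```

Whether to reshape the list up front or to use channel operators such as `.map()` afterwards is a design choice; the channel-based approach keeps the transformation visible in the workflow logic.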

## Using a sample sheet with no headers

Expand Down Expand Up @@ -73,7 +73,7 @@ or this YAML file:
- test_2
```
The output of `.fromSamplesheet()` will look like this:
The output of `samplesheetToList()` will look like this:

```bash
test_1
Expand All @@ -82,7 +82,7 @@ test_2

## Changing the structure of channel items

Each item in the channel will be a tuple, but some processes will use multiple files as a list in their input channel, this is common in nf-core modules. For example, consider the following input declaration in a process, where FASTQ could be > 1 file:
Each item in the list will be a tuple, but some processes take multiple files as a single list in their input channel, which is common in nf-core modules. For example, consider the following input declaration in a process, where FASTQ could be more than one file:

```groovy
process ZCAT_FASTQS {
Expand All @@ -95,7 +95,7 @@ process ZCAT_FASTQS {
}
```

The output of `.fromSamplesheet()` can be used by default with a process with the following input declaration:
The output of `samplesheetToList()` (converted to a channel) can be used by default with a process with the following input declaration:

```groovy
val(meta), path(fastq_1), path(fastq_2)
Expand All @@ -104,7 +104,7 @@ val(meta), path(fastq_1), path(fastq_2)
To manipulate each item within a channel, you should use the [Nextflow `.map()` operator](https://www.nextflow.io/docs/latest/operator.html#map). This will apply a function to each element of the channel in turn. Here, we convert the flat tuple into a tuple composed of a meta and a list of FASTQ files:

```groovy
Channel.of(params.input).fromSamplesheet("path/to/json/schema")
Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
.map { meta, fastq_1, fastq_2 -> tuple(meta, [ fastq_1, fastq_2 ]) }
.set { input }
Expand All @@ -122,7 +122,7 @@ ZCAT_FASTQS(input)
For example, to remove the BED file from the channel created above, we could not return it from the map. Note the absence of the `bed` item in the return of the closure below:

```groovy
Channel.of(params.input).fromSamplesheet("path/to/json/schema")
Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
.map { meta, fastq_1, fastq_2, bed -> tuple(meta, fastq_1, fastq_2) }
.set { input }
Expand All @@ -136,7 +136,7 @@ In this way you can drop items from a channel.
We could perform this twice to create one channel containing the FASTQs and one containing the BED files, however Nextflow has a native operator to separate channels called [`.multiMap()`](https://www.nextflow.io/docs/latest/operator.html#multimap). Here, we separate the FASTQs and BEDs into two separate channels using `multiMap`. Note, the channels are both contained in `input` and accessed as an attribute using dot notation:

```groovy
Channel.of(params.input).fromSamplesheet("path/to/json/schema")
Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
.multiMap { meta, fastq_1, fastq_2, bed ->
fastq: tuple(meta, fastq_1, fastq_2)
bed: tuple(meta, bed)
Expand All @@ -163,7 +163,7 @@ This example shows a channel which can have entries for WES or WGS data. WES dat
// Channel with four elements - see docs for examples
params.input = "samplesheet.csv"
Channel.of(params.input).fromSamplesheet("path/to/json/schema")
Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
.branch { meta, fastq_1, fastq_2, bed ->
// If BED does not exist
WGS: !bed
Expand All @@ -178,13 +178,13 @@ input.WGS.view() // Channel has 3 elements: meta, fastq_1, fastq_2
input.WES.view() // Channel has 4 elements: meta, fastq_1, fastq_2, bed
```

Unlike `multiMap`, the outputs of `.branch()`, the resulting channels will contain a different number of items.
Unlike `.multiMap()`, the outputs of `.branch()` will contain a different number of items.

## Combining a channel

After splitting the channel, it may be necessary to rejoin the channel. There are many ways to join a channel, but here we will demonstrate the simplest which uses the [Nextflow join operator](https://www.nextflow.io/docs/latest/operator.html#join) to rejoin any of the channels from above based on the first element in each item, the `meta` value.

```nextflow
```groovy
input.fastq.view() // Channel has 3 elements: meta, fastq_1, fastq_2
input.bed.view() // Channel has 2 elements: meta, bed
Expand All @@ -204,14 +204,14 @@ It's useful to determine the count of channel entries with similar values when y
This example contains a channel where multiple samples can be in the same family. Later on in the pipeline we want to merge the analyzed files so one file gets created for each family. The result will be a channel with an extra meta field containing the count of channel entries with the same family name.

```groovy
// channel created by fromSamplesheet() previous to modification:
// channel created with samplesheetToList() previous to modification:
// [[id:example1, family:family1], example1.txt]
// [[id:example2, family:family1], example2.txt]
// [[id:example3, family:family2], example3.txt]
params.input = "sample sheet.csv"
Channel.of(params.input).fromSamplesheet("path/to/json/schema")
Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
.tap { ch_raw } // Create a copy of the original channel
.map { meta, txt -> [ meta.family ] } // Isolate the value to count on
.reduce([:]) { counts, family -> // Creates a map like this: [family1:2, family2:1]
117 changes: 0 additions & 117 deletions docs/samplesheets/fromSamplesheet.md

This file was deleted.
