remove fromSamplesheet and only use samplesheetToList

nextflow-io · Apr 18, 2024 · 68117c0
1 parent 321c245
commit 68117c0

Showing 30 changed files with 167 additions and 219 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,12 +12,11 @@ To migrate from nf-validation please follow the [migration guide](https://nextfl
 ## Changes
 
 - Changed the used draft for the schema from `draft-07` to `draft-2020-12`. See the [2019-09](https://json-schema.org/draft/2019-09/release-notes) and [2020-12](https://json-schema.org/draft/2020-12/release-notes) release notes for all changes ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
-- Removed all validation code from the `.fromSamplesheet()` channel factory. The validation is now solely done in the `validateParameters()` function. A custom error message will now be displayed if any error has been encountered during the conversion ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
+- Removed the `fromSamplesheet` channel operator and added a `samplesheetToList` function instead. This function validates the samplesheet and returns a list of it. [#3](https://github.com/nextflow-io/nf-schema/pull/3)
 - Removed the `unique` keyword from the samplesheet schema. You should now use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or `uniqueEntries` instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
-- Removed the `skip_duplicate_check` option from the `fromSamplesheet()` channel factory and the `--validationSkipDuplicateCheck` parameter. You should now use the `uniqueEntries` or [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) keywords in the schema instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
-- `.fromSamplesheet()` now is a channel operator instead of a channel factory. It takes one required argument which can either be a string containing the relative path to the schema or a file object of the schema [#3](https://github.com/nextflow-io/nf-schema/pull/3)
-- `.fromSamplesheet()` now does dynamic typecasting instead of using the `type` fields in the JSON schema. This is done due to the complexity of `draft-2020-12` JSON schemas. This should not have that much impact but keep in mind that some types can be different between this and earlier versions because of this ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
-- `.fromSamplesheet()` will now set all missing values as `[]` instead of the type specific defaults (because of the changes in the previous point). This should not change that much as this will also result in `false` when used in conditions. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
+- Removed the `skip_duplicate_check` option from the `samplesheetToList()` function and the `--validationSkipDuplicateCheck` parameter. You should now use the `uniqueEntries` or [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) keywords in the schema instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
+- `samplesheetToList()` now does dynamic typecasting instead of using the `type` fields in the JSON schema. This is done due to the complexity of `draft-2020-12` JSON schemas. This should not have that much impact but keep in mind that some types can be different between this version and older versions in nf-validation ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
+- `samplesheetToList()` will now set all missing values as `[]` instead of the type specific defaults (because of the changes in the previous point). This should not change that much as this will also result in `false` when used in conditions. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
 
 ## Improvements
 

diff --git a/README.md b/README.md
@@ -36,7 +36,7 @@ This is all that is needed - Nextflow will automatically fetch the plugin code a
 You can now include the plugin helper functions into your Nextflow pipeline:
 
 ```groovy title="main.nf"
-include { validateParameters; paramsHelp; paramsSummaryLog; fromSamplesheet } from 'plugin/nf-schema'
+include { validateParameters; paramsHelp; paramsSummaryLog; samplesheetToList } from 'plugin/nf-schema'
 
 // Print help message, supply typical command line usage for the pipeline
 if (params.help) {
@@ -51,7 +51,7 @@ validateParameters()
 log.info paramsSummaryLog(workflow)
 
 // Create a new channel of metadata from a sample sheet passed to the pipeline through the --input parameter
-ch_input = Channel.of(params.input).fromSamplesheet("assets/schema_input.json")
+ch_input = Channel.fromList(samplesheetToList(params.input, "assets/schema_input.json"))
 ```
 
 ## Dependencies
@@ -61,7 +61,7 @@ ch_input = Channel.of(params.input).fromSamplesheet("assets/schema_input.json")
 
 ## Slack channel
 
-There is a dedicated [nf-validation Slack channel](https://nfcore.slack.com/archives/C056RQB10LU) in the [Nextflow Slack workspace](https://nextflow.slack.com).
+There is a dedicated [nf-schema Slack channel](https://nfcore.slack.com/archives/C056RQB10LU) in the [Nextflow Slack workspace](https://nextflow.slack.com).
 
 ## Credits
 

diff --git a/docs/migration_guide.md b/docs/migration_guide.md
@@ -14,7 +14,7 @@ This guide is intended to help you migrate your pipeline from [nf-validation](ht
 Following list shows the major breaking changes introduced in nf-schema:
 
 1. The JSON schema draft has been updated from `draft-07` to `draft-2020-12`. See [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.
-2. The `fromSamplesheet` channel factory has been converted to a channel operator. See [updating `fromSamplesheet`](#updating-fromsamplesheet) for more information.
+2. The `fromSamplesheet` channel factory has been converted to a function called `samplesheetToList`. See [updating `fromSamplesheet`](#updating-fromsamplesheet) for more information.
 3. The `unique` keyword for samplesheet schemas has been removed. Please use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or [`uniqueEntries`](nextflow_schema/nextflow_schema_specification.md#uniqueentries) now instead.
 4. The `dependentRequired` keyword now works as it's supposed to work in JSON schema. See [`dependentRequired`](https://json-schema.org/understanding-json-schema/reference/conditionals#dependentRequired) for more information
 
@@ -35,18 +35,20 @@ This will replace the old schema draft specification (`draft-07`) by the new one
     Repeat this command for every JSON schema you use in your pipeline. e.g. for the default samplesheet schema in nf-core pipelines:
     `bash sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' assets/schema_input.json `
 
-Next you should update the `.fromSamplesheet` channel factory to the channel operator. Following tabs shows the difference between the versions:
+Next you should update the `.fromSamplesheet` channel factory to the `samplesheetToList` function. Following tabs shows the difference between the versions:
 
 === "nf-validation"
 
     ```groovy
+    include { fromSamplesheet } from 'plugin/nf-validation'
     Channel.fromSamplesheet("input")
     ```
 
 === "nf-schema"
 
     ```groovy
-    Channel.of(params.input).fromSamplesheet("path/to/samplesheet/schema")
+    include { samplesheetToList } from 'plugin/nf-schema'
+    Channel.fromList(samplesheetToList(params.input, "path/to/samplesheet/schema"))
     ```
 
 !!! note

diff --git a/docs/nextflow_schema/sample_sheet_schema_specification.md b/docs/nextflow_schema/sample_sheet_schema_specification.md
@@ -59,7 +59,7 @@ Fields that are present in the sample sheet, but not in the schema will be ignor
 !!! warning
 
     The order of properties in the _schema_ **is** important.
-    This order defines the order of output channel properties when using the `fromSamplesheet` channel factory.
+    This order defines the order of output channel properties when using the `samplesheetToList()` function.
 
 ## Common keys
 
@@ -68,12 +68,6 @@ For example: `type`, `pattern`, `format`, `errorMessage`, `exists` and so on.
 
 Please refer to the [Nextflow schema specification](../nextflow_schema/nextflow_schema_specification.md) docs for details.
 
-!!! tip
-
-    Sample sheets are commonly used to define input file paths.
-    Be sure to set `"type": "string"`, `exists: true`, `"format": "file-path"` and `"schema":"path/to/samplesheet/schema.json"` for these properties,
-    so that samplesheets are correctly validated and `fromSamplesheet` does not result in any errors.
-
 ## Sample sheet keys
 
 Below are the properties that are specific to sample sheet schema.

diff --git a/docs/samplesheets/examples.md b/docs/samplesheets/examples.md
@@ -7,7 +7,7 @@ description: Examples of advanced sample sheet creation techniques.
 
 ## Introduction
 
-Understanding channel structure and manipulation is critical for getting the most out of Nextflow. nf-schema helps initialise your channels from the text inputs to get you started, but further work might be required to fit your exact use case. In this page we run through some common cases for transforming the output of `.fromSamplesheet()`.
+Understanding channel structure and manipulation is critical for getting the most out of Nextflow. nf-schema helps initialise your channels from the text inputs to get you started, but further work might be required to fit your exact use case. In this page we run through some common cases for transforming the output of `samplesheetToList()`.
 
 ### Glossary
 
@@ -17,15 +17,15 @@ Understanding channel structure and manipulation is critical for getting the mos
 
 ## Default mode
 
-Each item in the channel emitted by `.fromSamplesheet()` is a tuple, corresponding with each row of the sample sheet. Each item will be composed of a meta value (if present) and any additional elements from columns in the sample sheet, e.g.:
+Each item in the list emitted by `samplesheetToList()` is a tuple, corresponding with each row of the sample sheet. Each item will be composed of a meta value (if present) and any additional elements from columns in the sample sheet, e.g.:
 
 ```csv
 sample,fastq_1,fastq_2,bed
 sample1,fastq1.R1.fq.gz,fastq1.R2.fq.gz,sample1.bed
 sample2,fastq2.R1.fq.gz,fastq2.R2.fq.gz,
 ```
 
-Might create a channel where each element consists of 4 items, a map value followed by three files:
+Might create a list where each element consists of 4 items, a map value followed by three files:
 
 ```groovy
 // Columns:
@@ -36,13 +36,13 @@ Might create a channel where each element consists of 4 items, a map value follo
 [ [ id: "sample2" ], fastq2.R1.fq.gz, fastq2.R2.fq.gz, [] ] // A missing value from the sample sheet is an empty list
 ```
 
-This channel can be used as input of a process where the input declaration is:
+This list can be converted to a channel that can be used as input of a process where the input declaration is:
 
 ```nextflow
 tuple val(meta), path(fastq_1), path(fastq_2), path(bed)
 ```
 
-It may be necessary to manipulate this channel to fit your process inputs. For more documentation, check out the [Nextflow operator docs](https://www.nextflow.io/docs/latest/operator.html), however here are some common use cases with `.fromSamplesheet()`.
+It may be necessary to manipulate this channel to fit your process inputs. For more documentation, check out the [Nextflow operator docs](https://www.nextflow.io/docs/latest/operator.html), however here are some common use cases with `samplesheetToList()`.
 
 ## Using a sample sheet with no headers
 
@@ -73,7 +73,7 @@ or this YAML file:
 - test_2
 ```
 
-The output of `.fromSamplesheet()` will look like this:
+The output of `samplesheetToList()` will look like this:
 
 ```bash
 test_1
@@ -82,7 +82,7 @@ test_2
 
 ## Changing the structure of channel items
 
-Each item in the channel will be a tuple, but some processes will use multiple files as a list in their input channel, this is common in nf-core modules. For example, consider the following input declaration in a process, where FASTQ could be > 1 file:
+Each item in the list will be a tuple, but some processes will use multiple files as a list in their input channel, this is common in nf-core modules. For example, consider the following input declaration in a process, where FASTQ could be > 1 file:
 
 ```groovy
 process ZCAT_FASTQS {
@@ -95,7 +95,7 @@ process ZCAT_FASTQS {
 }
 ```
 
-The output of `.fromSamplesheet()` can be used by default with a process with the following input declaration:
+The output of `samplesheetToList()` (converted to a channel) can be used by default with a process with the following input declaration:
 
 ```groovy
 val(meta), path(fastq_1), path(fastq_2)
@@ -104,7 +104,7 @@ val(meta), path(fastq_1), path(fastq_2)
 To manipulate each item within a channel, you should use the [Nextflow `.map()` operator](https://www.nextflow.io/docs/latest/operator.html#map). This will apply a function to each element of the channel in turn. Here, we convert the flat tuple into a tuple composed of a meta and a list of FASTQ files:
 
 ```groovy
-Channel.of(params.input).fromSamplesheet("path/to/json/schema")
+Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
     .map { meta, fastq_1, fastq_2 -> tuple(meta, [ fastq_1, fastq_2 ]) }
     .set { input }
 
@@ -122,7 +122,7 @@ ZCAT_FASTQS(input)
 For example, to remove the BED file from the channel created above, we could not return it from the map. Note the absence of the `bed` item in the return of the closure below:
 
 ```groovy
-Channel.of(params.input).fromSamplesheet("path/to/json/schema")
+Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
     .map { meta, fastq_1, fastq_2, bed -> tuple(meta, fastq_1, fastq_2) }
     .set { input }
 
@@ -136,7 +136,7 @@ In this way you can drop items from a channel.
 We could perform this twice to create one channel containing the FASTQs and one containing the BED files, however Nextflow has a native operator to separate channels called [`.multiMap()`](https://www.nextflow.io/docs/latest/operator.html#multimap). Here, we separate the FASTQs and BEDs into two separate channels using `multiMap`. Note, the channels are both contained in `input` and accessed as an attribute using dot notation:
 
 ```groovy
-Channel.of(params.input).fromSamplesheet("path/to/json/schema")
+Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
     .multiMap { meta, fastq_1, fastq_2, bed ->
         fastq: tuple(meta, fastq_1, fastq_2)
         bed:   tuple(meta, bed)
@@ -163,7 +163,7 @@ This example shows a channel which can have entries for WES or WGS data. WES dat
 // Channel with four elements - see docs for examples
 params.input = "samplesheet.csv"
 
-Channel.of(params.input).fromSamplesheet("path/to/json/schema")
+Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
     .branch { meta, fastq_1, fastq_2, bed ->
         // If BED does not exist
         WGS: !bed
@@ -178,13 +178,13 @@ input.WGS.view() // Channel has 3 elements: meta, fastq_1, fastq_2
 input.WES.view() // Channel has 4 elements: meta, fastq_1, fastq_2, bed
 ```
 
-Unlike `multiMap`, the outputs of `.branch()`, the resulting channels will contain a different number of items.
+Unlike `.multiMap()`, the outputs of `.branch()` will contain a different number of items.
 
 ## Combining a channel
 
 After splitting the channel, it may be necessary to rejoin the channel. There are many ways to join a channel, but here we will demonstrate the simplest which uses the [Nextflow join operator](https://www.nextflow.io/docs/latest/operator.html#join) to rejoin any of the channels from above based on the first element in each item, the `meta` value.
 
-```nextflow
+```groovy
 input.fastq.view() // Channel has 3 elements: meta, fastq_1, fastq_2
 input.bed.view()   // Channel has 2 elements: meta, bed
 
@@ -204,14 +204,14 @@ It's useful to determine the count of channel entries with similar values when y
 This example contains a channel where multiple samples can be in the same family. Later on in the pipeline we want to merge the analyzed files so one file gets created for each family. The result will be a channel with an extra meta field containing the count of channel entries with the same family name.
 
 ```groovy
-// channel created by fromSamplesheet() previous to modification:
+// channel created with samplesheetToList() previous to modification:
 // [[id:example1, family:family1], example1.txt]
 // [[id:example2, family:family1], example2.txt]
 // [[id:example3, family:family2], example3.txt]
 
 params.input = "sample sheet.csv"
 
-Channel.of(params.input).fromSamplesheet("path/to/json/schema")
+Channel.fromList(samplesheetToList(params.input, "path/to/json/schema"))
     .tap { ch_raw }                       // Create a copy of the original channel
     .map { meta, txt -> [ meta.family ] } // Isolate the value to count on
     .reduce([:]) { counts, family ->      // Creates a map like this: [family1:2, family2:1]

diff --git a/docs/samplesheets/fromSamplesheet.md b/docs/samplesheets/fromSamplesheet.md