Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove .fromSamplesheet, but create an equivalent function samplesheetToList #3

Merged
merged 23 commits into from
Apr 18, 2024

Conversation

nvnieuwk
Copy link
Collaborator

As suggested on Slack, the .fromSamplesheet channel factory has now been converted to an operator to make it possible to use files created by other pipelines in a meta pipeline. A function has also been added that returns a list of the samplesheet contents.

@nvnieuwk nvnieuwk marked this pull request as draft March 21, 2024 09:56
@nvnieuwk
Copy link
Collaborator Author

TODO change target branch to master once #1 has been merged

@bentsherman
Copy link
Member

Seems like the right idea 👍

@nvnieuwk nvnieuwk changed the base branch from rebrand-to-nf-schema to master April 11, 2024 08:12
@nvnieuwk
Copy link
Collaborator Author

This PR is done now! Waiting for #1 to be merged before setting this ready to review

@nvnieuwk nvnieuwk marked this pull request as ready for review April 11, 2024 14:16
@nvnieuwk nvnieuwk requested review from ewels, mirpedrol and awgymer April 12, 2024 08:21
Copy link
Collaborator

@awgymer awgymer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@nvnieuwk
Copy link
Collaborator Author

@mirpedrol can you also take a look at this please? (since it's a pretty big change)

Copy link
Collaborator

@mirpedrol mirpedrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
I haven't tested it with a minimal example, ok for me to merge if you could test it with, for example, a fresh nf-core template :)

CHANGELOG.md Outdated Show resolved Hide resolved
Comment on lines 29 to 32
!!! warning

The `.fromSamplesheet` channel operator and `samplesheetToList` also validate the files before converting them. If you convert the samplesheet, you should not add a schema to the parameter corresponding to the samplesheet to keep your pipeline as efficient as possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't add this warning, it's better to validate the sample sheet before starting the pipeline execution, even if it is then validated twice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay if I change it to a note instead, I think it's useful information

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

stdout.contains("[[string1:extraField, string2:extraField, integer1:10, integer2:10, boolean1:true, boolean2:true], string1, 25, false, ${getRootString()}/src/testResources/test.txt, ${getRootString()}/src/testResources/testDir, ${getRootString()}/src/testResources/testDir, unique3, 1, itDoesExist]" as String)
}

def 'samplesheetToList - String, Path' () {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add the same test for "path, path"?

@nvnieuwk
Copy link
Collaborator Author

nvnieuwk commented Apr 16, 2024

Hi all, I'm having some weird error when trying out the this branch on the nf-core test pipeline:

java.lang.IllegalStateException: Operator 'fromSamplesheet' conflict - it's defined by plugin nf-schema and nf-schema
	at nextflow.plugin.extension.PluginExtensionProvider.loadPluginExtensionMethods(PluginExtensionProvider.groovy:138)
	at nextflow.plugin.extension.PluginExtensionProvider.loadPluginExtensionMethods(PluginExtensionProvider.groovy:117)
	at nextflow.script.IncludeDef.loadPlugin0(IncludeDef.groovy:210)
	at nextflow.script.IncludeDef.load0(IncludeDef.groovy:106)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at Script_d399f6c5d563f9ab.runScript(Script_d399f6c5d563f9ab:13)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.selectMethod(IndyInterface.java:355)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.script.BaseScript.run0(BaseScript.groovy:141)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.selectMethod(IndyInterface.java:355)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.script.BaseScript.run(BaseScript.groovy:189)
	at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:236)
	at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:222)
	at nextflow.script.IncludeDef.memoizedMethodPriv$loadModule0PathMapSession(IncludeDef.groovy:140)
	at nextflow.script.IncludeDef.access$0(IncludeDef.groovy)
	at nextflow.script.IncludeDef$__clinit__closure2.doCall(IncludeDef.groovy)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
	at groovy.lang.Closure.call(Closure.java:433)
	at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.lambda$call$0(Memoize.java:137)
	at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:137)
	at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:113)
	at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.call(Memoize.java:136)
	at nextflow.script.IncludeDef.loadModule0(IncludeDef.groovy)
	at nextflow.script.IncludeDef.load0(IncludeDef.groovy:112)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at Script_d3a7893576054a4c.runScript(Script_d3a7893576054a4c:21)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.script.BaseScript.run0(BaseScript.groovy:141)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.script.BaseScript.run(BaseScript.groovy:189)
	at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:236)
	at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:242)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:137)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:368)
	at nextflow.cli.Launcher.run(Launcher.groovy:500)
	at nextflow.cli.Launcher.main(Launcher.groovy:672)

You can see the adjusted code for the test pipeline here: https://github.com/nvnieuwk/testpipeline/tree/nf-schema-plugin

I've no idea left on what to try (I've deleted all cached data I know of, I've updated the nextflow version to the latest edge version...)

@human9
Copy link

human9 commented Apr 17, 2024

I've no idea left on what to try (I've deleted all cached data I know of, I've updated the nextflow version to the latest edge version...)

I think the edge version might be the issue! I can reproduce the same error with nextflow version 24.03.0-edge, but if I use 23.10.1 it works fine.

@nvnieuwk
Copy link
Collaborator Author

I think the edge version might be the issue! I can reproduce the same error with nextflow version 24.03.0-edge, but if I use 23.10.1 it works fine.

I've also tried it with 23.10.1 but it still fails for me :/

N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [hungry_pesquet] DSL2 - revision: d17b498066
ERROR ~ Operator 'fromSamplesheet' conflict - it's defined by plugin nf-schema and nf-schema

 -- Check script './workflows/../subworkflows/local/utils_nfcore_testpipeline_pipeline/main.nf' at line: 13 or see '.nextflow.log' file for more details

@nvnieuwk
Copy link
Collaborator Author

It seems these lines in the Nextflow source code are causing the issue. If I printed out existing.getClass().getName() I got nextflow.plugin.extension.PluginExtensionMethod and ext.getClass().getName() resulted in nextflow.validation.SchemaValidator. The execution of the plugin worked perfectly when I commented these lines out...

@bentsherman do you have any ideas on what could be going wrong here? Is it an issue on the plugin side or could this be a bug in Nextflow?

@nvnieuwk
Copy link
Collaborator Author

nvnieuwk commented Apr 17, 2024

After some more testing I noticed that loadPluginExtensionMethods is being executed twice, which causes the error because fromSamplesheet is already loaded the second time.

@human9
Copy link

human9 commented Apr 17, 2024

I've also tried it with 23.10.1 but it still fails for me :/

Ah, I was not using the properly adjusted testpipeline.. I also get the same error on 23.10.1 with the fully adjusted code.

In my edit I had manually swapped nf-validation for nf-schema in the nextflow.config, and also where it gets imported in the subworkflows. But I missed workflows/testpipeline.nf. When I also swap import in that file (which is for paramsSummaryMap, not even fromSamplesheet), I again get the same error.. Seems like some inconsistency in nextflow's plugin import process 🤔

@nvnieuwk
Copy link
Collaborator Author

Yeah some weird stuff is going on 😁 This needs to be fixed first though! Thanks for also going through this and confirming it's not just me

@bentsherman
Copy link
Member

I thought it might have something to do with loading an operator in a module, but I couldn't reproduce any error. If you can show me a minimal reproducible example, I can look into it

@nvnieuwk
Copy link
Collaborator Author

I have the error in this branch of the nf-core template test pipeline: https://github.com/nvnieuwk/testpipeline/tree/nf-schema-plugin. If you try it out with this branch, you should get the error (at least @human9 and I got it by testing it this way)

@nvnieuwk
Copy link
Collaborator Author

I've so far been unable to recreate the error in any other way.

@bentsherman
Copy link
Member

How are you providing the plugin? I see it in the nextflow.config but the version is not pinned, Nextflow will try to download the plugin from the main registry

@human9
Copy link

human9 commented Apr 17, 2024

I was editing nextflow.config and pinning the version, and I used the NXF_PLUGINS_DIR variable to point to a locally built plugin.

I believe I can reproduce a minimal example with the nf-hello plugin. Here's a diff from mainline testpipeline that reproduces for me on nextflow 23.10.1:

diff --git a/subworkflows/local/utils_nfcore_testpipeline_pipeline/main.nf b/subworkflows/local/utils_nfcore_testpipeline_pipeline/main.nf
index dc90ad9..70b775e 100644
--- a/subworkflows/local/utils_nfcore_testpipeline_pipeline/main.nf
+++ b/subworkflows/local/utils_nfcore_testpipeline_pipeline/main.nf
@@ -11,6 +11,7 @@
 include { UTILS_NFVALIDATION_PLUGIN } from '../../nf-core/utils_nfvalidation_plugin'
 include { paramsSummaryMap          } from 'plugin/nf-validation'
 include { fromSamplesheet           } from 'plugin/nf-validation'
+include { goodbye           } from 'plugin/nf-hello'
 include { UTILS_NEXTFLOW_PIPELINE   } from '../../nf-core/utils_nextflow_pipeline'
 include { completionEmail           } from '../../nf-core/utils_nfcore_pipeline'
 include { completionSummary         } from '../../nf-core/utils_nfcore_pipeline'
@@ -41,6 +42,8 @@ workflow PIPELINE_INITIALISATION {

     ch_versions = Channel.empty()

+    channel.of('uh oh').goodbye()
+
     //
     // Print version and exit if required and dump pipeline parameters to JSON file
     //
diff --git a/workflows/testpipeline.nf b/workflows/testpipeline.nf
index 5a1bf8e..5f88a38 100644
--- a/workflows/testpipeline.nf
+++ b/workflows/testpipeline.nf
@@ -7,6 +7,7 @@
 include { FASTQC                 } from '../modules/nf-core/fastqc/main'
 include { MULTIQC                } from '../modules/nf-core/multiqc/main'
 include { paramsSummaryMap       } from 'plugin/nf-validation'
+include { goodbye                } from 'plugin/nf-hello'
 include { paramsSummaryMultiqc   } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_testpipeline_pipeline'
@@ -33,9 +34,1N E X T F L O W  ~  version 23.10.1
Launching `./main.nf` [agitated_boltzmann] DSL2 - revision: d17b498066
ERROR ~ Operator 'goodbye' conflict - it's defined by plugin nf-hello and nf-hello

 -- Check script './workflows/../subworkflows/local/utils_nfcore_testpipeline_pipeline/main.nf' at line: 14 or see '.nextflow.log' file for more details
3 @@ workflow TESTPIPELINE {
     FASTQC (
         ch_samplesheet
     )
+
     ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]})
+

output:

N E X T F L O W  ~  version 23.10.1
Launching `./main.nf` [agitated_boltzmann] DSL2 - revision: d17b498066
ERROR ~ Operator 'goodbye' conflict - it's defined by plugin nf-hello and nf-hello

 -- Check script './workflows/../subworkflows/local/utils_nfcore_testpipeline_pipeline/main.nf' at line: 14 or see '.nextflow.log' file for more details

@human9
Copy link

human9 commented Apr 17, 2024

as far as I can tell the key to triggering it is importing the plugin in more than one workflow

@nvnieuwk
Copy link
Collaborator Author

nvnieuwk commented Apr 17, 2024

How are you providing the plugin? I see it in the nextflow.config but the version is not pinned, Nextflow will try to download the plugin from the main registry

I forgot to push the latest changes sorry for that! I installed the plugin like you showed in the podcast with make install and then I ran nextflow run main.nf --outdir results -profile test,docker in the directory of the test pipeline

@human9
Copy link

human9 commented Apr 17, 2024

Even more minimal! https://github.com/human9/nextflow-import-issue

All it requires is the include in two different files.

@nvnieuwk
Copy link
Collaborator Author

Thank you so much @human9 🥳

@bentsherman
Copy link
Member

Thanks @human9 , looks like you found a Nextflow bug.

Interesting because nf-core/rnaseq already includes from nf-validation in multiple modules without any issue. Makes me think the bug happens only with operators

@nvnieuwk while I do intend to squash the bug in Nextflow, I am also wondering if you really need both the fromSamplesheet operator and the samplesheetToList function? I think you could just have a fromSamplesheet function which would be used as follows:

schema = file('schema_input.json')

// equivalent to channel factory
Channel.fromList( fromSamplesheet(path, schema) )

// equivalent to operator
Channel.fromPath('samplesheet.csv') | flatMap { path -> fromSamplesheet(path, schema) }

I don't mean to dodge the operator issue, but it does seem like that would simplify things for you. I'm also trying to discourage people from creating operators that could just be a map/flatMap + regular function.

@nvnieuwk
Copy link
Collaborator Author

That's actually a pretty good idea! Maybe I'll tweak the naming a bit so it's clear in the migration that it's no longer a factory/operator but a function (samplesheetToList might describe it better than fromSamplesheet). That way we can also keep supporting lower Nextflow versions which would be a nice thing for nf-core pipelines.

I'll repost this in the nf-validation slack channel first though to see if everyone is on board with this.

Thanks for the help @bentsherman and @human9. I'll implement the changes now!

@nvnieuwk nvnieuwk changed the title Rework .fromSamplesheet to a channel operator + create an equivalent function samplesheetToList Remove .fromSamplesheet, but create an equivalent function samplesheetToList Apr 18, 2024
@nvnieuwk
Copy link
Collaborator Author

Okay I'll just merge this for now. We can still change some things if stuff comes up later

@nvnieuwk nvnieuwk merged commit 252c714 into master Apr 18, 2024
6 checks passed
@nvnieuwk nvnieuwk deleted the rework-fromSamplesheet branch April 18, 2024 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants