Dealing with multiple types of sample sheets

Currently I have a working nf-core pipeline that does some basic processing of FASTQ files. The data can come from three different instruments, each with its own sample sheet format. I have an R script that validates the instrument-specific sample sheet and creates from it an nf-core-friendly sample sheet that goes into the pipeline.

Right now I run the validation/sample sheet creation R program and then the pipeline, but I am exploring integrating the R script into the pipeline. However, this means that the nf-core pipeline would need to be able to receive three different kinds of sample sheets. Is it possible to do that?

Yes. If you want to do it the nf-core way, you just need to call the nf-validation fromSamplesheet function once per samplesheet (and, of course, have an associated JSON schema for each one).

An example of this is nf-core/taxprofiler. We have one samplesheet for samples (fastqs and their metadata) and another for databases (database files and their metadata).
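A minimal sketch of what calling fromSamplesheet more than once can look like. It assumes the nf-validation plugin is enabled in nextflow.config and that nextflow_schema.json declares two parameters, each pointing at its own samplesheet schema through a "schema" entry. The parameter names input and databases are used here for illustration only; this is not a copy of the taxprofiler code.

```nextflow
// Sketch only. Assumptions:
//  - nextflow.config contains: plugins { id 'nf-validation' }
//  - nextflow_schema.json defines the parameters `input` and `databases`,
//    each with a "schema" key (e.g. assets/schema_input.json and
//    assets/schema_database.json)
include { fromSamplesheet } from 'plugin/nf-validation'

workflow {
    // Each call validates one samplesheet against the schema registered
    // for that parameter and returns a channel of its rows
    ch_samplesheet = Channel.fromSamplesheet("input")
    ch_databases   = Channel.fromSamplesheet("databases")

    ch_samplesheet.view { row -> "sample: ${row}" }
    ch_databases.view { row -> "database: ${row}" }
}
```

The argument to fromSamplesheet is the name of the parameter, not the path to the file, so every additional samplesheet needs its own parameter and schema entry in nextflow_schema.json.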

We are also working very hard to get nf-schema into the nf-core template, which would make this even easier. There you can use the new samplesheetToList function to create a list from each samplesheet, and then pass these lists to Channel.fromList() to create channels from them. One nice extra feature is that you pass the samplesheet schema directly to samplesheetToList, which makes it possible to use a different JSON schema for each type of samplesheet.
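As a rough sketch of how that could look for the three-instrument case in the question: the parameter name instrument, the instrument keys and the schema file names below are all hypothetical, and the nf-schema plugin is assumed to be enabled in nextflow.config.

```nextflow
// Sketch only. Assumptions:
//  - nextflow.config contains: plugins { id 'nf-schema' }
//  - the pipeline is run with --input <samplesheet> --instrument <name>
//  - the instrument names and schema paths are made up for illustration
include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    // Hypothetical mapping from instrument type to its samplesheet schema
    def schemas = [
        instrument_a: "${projectDir}/assets/schema_instrument_a.json",
        instrument_b: "${projectDir}/assets/schema_instrument_b.json",
        instrument_c: "${projectDir}/assets/schema_instrument_c.json"
    ]

    def schema = schemas[params.instrument]
    if (!schema) {
        error "Unknown instrument type: ${params.instrument}"
    }

    // Validate the samplesheet against the chosen schema and turn
    // the resulting list of rows into a channel
    ch_input = Channel.fromList(samplesheetToList(params.input, schema))

    ch_input.view()
}
```

Because the schema is an explicit argument, the choice of schema can depend on a parameter at run time, which is harder to do when the schema is tied to a parameter name in nextflow_schema.json.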

Thank you @jfy133 and @nvnieuwk !

@jfy133, I am looking at the code. Sorry, I am relatively new to nf-core; I have made several pipelines now, but I am not yet very familiar with the various parts. Basically, the steps would be:

  1. Create the two sample sheet schemas (in the taxprofiler case schema_database.json and schema_input.json)
  2. Modify the workflow file such as in here: taxprofiler/workflows/taxprofiler.nf at 5d3ee5513a84f92773c8376c55b5f4da39835307 · nf-core/taxprofiler · GitHub

I could use a bit more direction! I don’t understand what you mean by “you just need to invoke the nf-validation fromSamplesheet the number of times you need”.

@nvnieuwk, do you have an example of using the samplesheetToList function?

To be more helpful (sorry, I responded in a rush at the time!):

Almost!

Yes, those are the two schemas for our two samplesheets.

The validation of the two samplesheets against their respective schemas actually happens here: taxprofiler/subworkflows/local/utils_nfcore_taxprofiler_pipeline/main.nf at 6204b1ade641661066d148cd2c72e4ecaf94e6c5 · nf-core/taxprofiler · GitHub. This is the place where I meant that the nf-validation function is invoked twice :)

The result is emitted from that subworkflow alongside the samplesheet channel and passed to the main taxprofiler.nf here: taxprofiler/main.nf at 5d3ee5513a84f92773c8376c55b5f4da39835307 · nf-core/taxprofiler · GitHub (sorry for the convoluted nf-core structure, it is indeed not easy for newcomers)

It is then received as in the link you gave above :), where we do a little more clean-up after the validation (mostly converting various things to nf-core syntax).

I hope that’s slightly more useful this time!

Note that @nvnieuwk’s newer plugin (nf-schema) is more powerful but not yet implemented within nf-core; everything I describe above refers to the older plugin (nf-validation).

Hi, yes: don’t use nf-schema yet if you are working on an nf-core pipeline, but you can use it freely in any other pipeline.

You can find the documentation here: nf-schema - nf-schema

samplesheetToList documentation can be found here: Create a channel - nf-schema

And you can find an example here: nf-schema/examples/samplesheetToListBasic/pipeline/main.nf at 928b4162c0c56fbba080ba7ddabb4b957bbc0107 · nextflow-io/nf-schema · GitHub

Please let me know if you need more help :)