Creating a Nextflow script to run a pipeline on an Azure VM

Hello! I’m trying to create a Nextflow script that runs another Nextflow pipeline, so that the outputs can then be passed on to a new pipeline.

I currently have this:

params.sample_sheet_file = "az://container/Reference/output.csv"
params.outdir = "az://container/PipelineRuns/Run_with_nf_script/"
params.genome = "GRCh37"
params.email = "random@random.com"

// Define the process
process RunPipeline {

    // Define the input parameters for the process
    input: 
    path params.sample_sheet_file

    // Define the output files produced by the process
    output: 
    path params.outdir

    // Define the script to be executed by the process
    script:
    """
    nextflow -log ${params.outdir}/nextflow.log run nf-core/rnaseq \\
        --input ${sample_sheet_file} \\
        --outdir ${params.outdir} \\
        --genome ${params.genome} \\
        -profile docker \\
        -w ${params.outdir}/work \\
        --max_memory 128.GB \\
        -N ${params.email}
    """
}

// Define the workflow
workflow {
    // Run the process in the workflow
    RunPipeline
}

But the log file is telling me this:

May-13 15:59:46.550 [main] DEBUG nextflow.Session - Work-dir: /home/azureuser/nf_scripts/work [ext2/ext3]
May-13 15:59:46.562 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-13 15:59:46.571 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-13 15:59:46.586 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-13 15:59:46.594 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 21; maxThreads: 1000
May-13 15:59:46.655 [main] DEBUG nextflow.Session - Session start
May-13 15:59:46.851 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-13 15:59:46.880 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: RunPipeline
May-13 15:59:46.880 [main] DEBUG nextflow.Session - Igniting dataflow network (0)
May-13 15:59:46.880 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_5b4f2239e787c79d: /home/azureuser/nf_scripts/rnaseq.nf
May-13 15:59:46.880 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination 
May-13 15:59:46.880 [main] DEBUG nextflow.Session - Session await
May-13 15:59:46.880 [main] DEBUG nextflow.Session - Session await > all processes finished
May-13 15:59:46.880 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-13 15:59:46.893 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
May-13 15:59:47.078 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-13 15:59:47.096 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
May-13 15:59:47.096 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

I can’t really understand what exactly it is telling me. Is there something wrong with the script? Or is it that the Nextflow script isn’t able to run on the VM? Because I can see that the specs of the VM are being loaded:

May-13 15:59:46.534 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 23.10.1 build 5891
  Created: 12-01-2024 22:01 UTC 
  System: Linux ...-azure
  Runtime: Groovy 3.0.19 ....
  Encoding: UTF-8 (UTF-8)
  Process: 95784@user [10.0.0.4]
  CPUs: 20 - Mem: 157.3 GB (154.8 GB) - Swap: 0 (0)

Nextflow in Nextflow eh? You’re brave.

This has to be done very carefully, or you might cause all sorts of problems. I’ve updated your pipeline with the following changes:

  1. Move the inputs into a proper input: block so that Nextflow stages the files before the process runs.
  2. Similarly, write the outputs to the local directory and let Nextflow stage them out.
  3. Use val inputs in the input block so the parameters can be controlled explicitly.

I’d also be tempted to add an additional path input for a config file, so you can pass extra configuration to the ‘full’ pipeline; there’s a sketch of that after the script below.

params.sample_sheet_file = "test.csv"
params.revision          = "master"
params.outdir            = "results"
params.genome            = "GRCh37"
params.work              = "work"

// Define the process
process RunPipeline {

    publishDir "${params.outdir}"

    // Define the input parameters for the process
    input: 
    path sample_sheet_file
    val revision
    val genome
    val work

    // Define the output files produced by the process
    output: 
    path "results"

    // Define the script to be executed by the process
    script:
    """
    mkdir -p results
    nextflow -log results/nextflow.log run nf-core/rnaseq \\
        -r ${revision} \\
        --input ${sample_sheet_file} \\
        --outdir results \\
        --genome ${genome} \\
        -profile docker \\
        -w ${work}
    """
}

// Define the workflow
workflow {
    ch_input = Channel.fromPath(params.sample_sheet_file, checkIfExists: true, type: "file")
    // Run the process in the workflow
    RunPipeline(ch_input, params.revision, params.genome, params.work)
}
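
To illustrate that suggestion, here is a minimal sketch of the same wrapper with an extra config input. The params.pipeline_config parameter and the custom.config file name are placeholders I’ve made up for the example, not something nf-core/rnaseq requires:

params.sample_sheet_file = "test.csv"
params.pipeline_config   = "custom.config"
params.revision          = "master"
params.outdir            = "results"
params.genome            = "GRCh37"
params.work              = "work"

process RunPipelineWithConfig {

    publishDir "${params.outdir}"

    input:
    path sample_sheet_file
    path pipeline_config    // extra config file, staged into the task directory
    val revision
    val genome
    val work

    output:
    path "results"

    script:
    """
    mkdir -p results
    nextflow -log results/nextflow.log run nf-core/rnaseq \\
        -r ${revision} \\
        -c ${pipeline_config} \\
        --input ${sample_sheet_file} \\
        --outdir results \\
        --genome ${genome} \\
        -profile docker \\
        -w ${work}
    """
}

workflow {
    ch_input  = Channel.fromPath(params.sample_sheet_file, checkIfExists: true, type: "file")
    ch_config = Channel.fromPath(params.pipeline_config, checkIfExists: true, type: "file")
    RunPipelineWithConfig(ch_input, ch_config, params.revision, params.genome, params.work)
}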

Thank you Adam! In the meantime I’ve come across the concepts of channels and how workflows work, and I ended up removing the input block and using the params directly (that’s why it was not fetching the files).
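
Roughly, a simplified sketch of what I ended up with looks like this (the other key detail is that the process has to be invoked with parentheses in the workflow block, which my original script never did, hence the zero task counts in the log):

params.sample_sheet_file = "az://container/Reference/output.csv"
params.outdir = "az://container/PipelineRuns/Run_with_nf_script/"
params.genome = "GRCh37"

process RunPipeline {
    script:
    """
    nextflow -log ${params.outdir}/nextflow.log run nf-core/rnaseq \\
        --input ${params.sample_sheet_file} \\
        --outdir ${params.outdir} \\
        --genome ${params.genome} \\
        -profile docker
    """
}

workflow {
    RunPipeline()   // the parentheses actually invoke the process
}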

Regarding using Nextflow inside Nextflow, feel free to call it brave or naive… but Nextflow sure is a nice tool to use, and I’m eager to learn it in detail :grin:
