Use function inside the `exec` block of a process

Hi, I have a process that uses the `exec` block so that its implementation is in Groovy. I also have a user-defined function (parseJson) in my workflow. When I call parseJson in the exec block, the file path gets resolved as an absolute path against the directory I launched from instead of the staged path to the file, so the call fails.

nextflow.enable.dsl = 2

import groovy.json.JsonSlurper;


def parseJson(json_file) {
    def parser = new JsonSlurper()
    String raw_text = file(json_file).text
    def json_obj = parser.parseText(raw_text)
    return json_obj
}


process align {
    output:
        path "sample_metrics.json"
    script:
    """
    echo '{"metric-u": 2.7182}' > sample_metrics.json
    """
}


process applyQcThreshold {
    debug true
    
    input:
        path metrics
        val metric_name
        val thresholds

    exec:
        println thresholds
        println metrics
        metric = parseJson(metrics)
        return metric[metric_name] < thresholds["threshold"]
}

workflow {
    main:
        qc_thresholds = parseJson(params.qc_thresholds)
        log.info ">>>> $qc_thresholds"
        
        align_output = align()
        align_output.view { log.info "$it" }
        applyQcThreshold(align_output, "metric-u", qc_thresholds['tumour_germline'])
}

The thresholds.json file:

{
  "tumour_germline": {
    "metric": "metric-u",
    "threshold": 3.14159262
  }
}

The output I get (using nextflow 24.10.2):

(base) pablo@laptop nextflow-questions % nextflow run apply-qc-thresholds-01.nf --qc_thresholds="thresholds.json"
Nextflow 24.10.4 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 24.10.2

Launching `apply-qc-thresholds-01.nf` [romantic_kalman] DSL2 - revision: 1aa7631aa2

>>>> [tumour_germline:[metric:metric-u, threshold:3.14159262]]
executor >  local (1)
[b6/cdf274] process > align            [100%] 1 of 1 ✔
[-        ] process > applyQcThreshold -
/Users/pablo/dev/sandbox/nextflow-questions/work/b6/cdf274f4e012a27986b82cddd6de64/sample_metrics.json
ERROR ~ Error executing process > 'applyQcThreshold'

Caused by:
  No such file or directory: /Users/pablo/dev/sandbox/nextflow-questions/sample_metrics.json


Source block:
  metric = parseJson(metrics)

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

What do I need to do to make parseJson work both inside and outside of a process?

Thanks in advance

Hello @pablo-esteban! Welcome to the Seqera Community Forum :slight_smile:

The first thing I want to bring up is that when you want your task to run a scripting language such as Python, R or Groovy, you do this through a shebang. See the example below for Groovy:

process groovyTask {
  debug true

  script:
  """
  #!/usr/bin/env groovy

  // Your Groovy code here
  println "Hello from Groovy!"
  def list = [1, 2, 3, 4, 5]
  println list.sum()
  """
}

workflow {
  groovyTask()
}

The exec block has a different purpose: it executes the given code without launching a job. I believe that’s why you’re running into issues with paths. Could you try the approach I’m suggesting and see if it fixes your problem?

Something to understand about native processes (those using `exec`) is that they don’t stage files, so they shouldn’t have `path` inputs. Instead, pass the file as a `val` input and make sure the value is a Path object.

See:

Where params_file is a file input, for example.
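As a rough, untested sketch of that suggestion applied to the `applyQcThreshold` process from the question (this is not the linked example): `metrics` is declared as `val`, so the process receives the Path object emitted by `align` instead of a staged file name. The `passed` output and the final `view` call are assumptions added for illustration.

```groovy
import groovy.json.JsonSlurper

// Accepts either a path string or a Path object; file() is Nextflow's built-in helper.
def parseJson(json_file) {
    return new JsonSlurper().parseText(file(json_file).text)
}

process align {
    output:
        path "sample_metrics.json"

    script:
    """
    echo '{"metric-u": 2.7182}' > sample_metrics.json
    """
}

process applyQcThreshold {
    input:
        val metrics        // a Path object passed by value; nothing is staged
        val metric_name
        val thresholds

    output:
        val passed         // hypothetical output name, not in the original post

    exec:
        def metric = parseJson(metrics)
        // Assigning without `def` makes the value visible to the output declaration.
        passed = metric[metric_name] < thresholds["threshold"]
}

workflow {
    qc_thresholds = parseJson(params.qc_thresholds)
    applyQcThreshold(align(), "metric-u", qc_thresholds['tumour_germline'])
    applyQcThreshold.out.view { "QC passed: $it" }
}
```

Because the value reaching the exec block is the Path in the `align` work directory, the same `parseJson` function can serve both the workflow body and the process.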


Hi Marcel, thank you for replying and for your suggestion.

I’ve tried it and it partially solves my problem: I can now access the staged file.

However, because the process now runs in its own isolated runtime (i.e. the Groovy Docker image), I don’t have access to the parseJson function I created for use in Nextflow. It also introduces new challenges, because val inputs get embedded in the text of the script rather than being actual variables, as they are in exec or outside the triple quotes.
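To illustrate the embedding issue, here is a minimal sketch, assuming `groovy` is available on the task's PATH; the process name `showInterpolation` is hypothetical. Nextflow substitutes the input into the script text before the task runs, so the Groovy interpreter only ever sees a literal.

```groovy
process showInterpolation {
    debug true

    input:
        val threshold

    // Nextflow fills in the placeholder below while rendering the script,
    // so the task's Groovy code contains a literal number, not a variable
    // shared with the workflow.
    script:
    """
    #!/usr/bin/env groovy

    def limit = ${threshold}
    println limit
    """
}

workflow {
    showInterpolation(3.14159262)
}
```

A map such as the thresholds object would be rendered through its string form, which is generally not valid Groovy source; that is part of why a native `exec` process with `val` inputs is easier here.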

I have a feeling I should be trying a completely different approach to enforcing qc thresholds on process outputs.

I see. Did you try what @mahesh.binzerpanchal proposed above? It seems to me it may solve your problem.
