R docker issue

Hello! I'm trying to run an R script via a Docker container using a very simple Nextflow process, but I keep coming up against this error:

I think there's an issue with how the folders are mounted into the container, as I can run the container interactively within the Nextflow work dir and the script runs correctly there.

Any help would be amazing!

Error:

Command executed:

Rscript /app/R_extraction.R --base_dir processed_audio --window 40 --output_dir .

Command exit status:
1

Command output:
(empty)

Command error:
Error: unexpected input in ""
Execution halted

Work dir:
/mnt/c/holly/nextflowdev/work/94/5dd9c5805bb0a91f17177e8cc00ed4

This is my main.nf:
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

params.input  = '/mnt/c/holly/nextflowdev/test_files'
params.outdir = '/mnt/c/holly/nextflowdev/processed_audio'

workflow {
    // wrap folder as a single path object
    input_ch = Channel.of(file(params.input))

    preproc_ch = PREPROCESS_AUDIO(input_ch)

    EXTRACT_R_FEATURES(preproc_ch)
}

process PREPROCESS_AUDIO {
    container 'speak_proj/audio-preprocessing:latest'
    publishDir "${params.outdir}", mode: 'copy'

    input:
    path audio_files

    output:
    path "processed_audio"

    script:
    """
    mkdir -p processed_audio
    python3 /app/py_preprocessing.py --input ${audio_files} --outdir processed_audio
    """
}

process EXTRACT_R_FEATURES {
    container 'speak_proj/r-extraction:latest'
    publishDir "${params.outdir}/R_features", mode: 'copy'

    input:
    path processed_audio

    output:
    path "R_extracted_features.csv"
    path "extracted_features.Rdata"
    path "features"

    script:
    """
    Rscript /app/R_extraction.R --base_dir ${processed_audio} --window 40 --output_dir .
    """
}

I think the issue is that the /app/ directory is not mounted within your container.

In terms of best practice, scripts should be put in a bin/ folder in the root of your workflow. They should be made executable and have a shebang directive at the top that tells the executor which interpreter to use.
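For example, assuming your workflow project lives at /mnt/c/holly/nextflowdev (inferred from the paths in your error output, so adjust as needed), setting the scripts up would look something like:

mkdir -p /mnt/c/holly/nextflowdev/bin
mv py_preprocessing.py R_extraction.R /mnt/c/holly/nextflowdev/bin/
chmod a+x /mnt/c/holly/nextflowdev/bin/*

The PREPROCESS_AUDIO script block then reduces to: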

script:  
"""  
mkdir -p processed_audio  
py_preprocessing.py --input ${audio_files} --outdir processed_audio  
"""

where py_preprocessing.py looks like:

#!/usr/bin/env python3

...
rest of python script

and similarly with your R process

#!/usr/bin/env Rscript

...
rest of R script

and call it like this:

script:
"""
R_extraction.R \\
  --base_dir ${processed_audio} \\
  --window 40 \\
  --output_dir .
"""

Thank you!

I changed the scripts to be called from bin/, and they have the shebang now.

I keep getting the same error in the R container, although I tried a similar process with a Python script and that one worked OK:

// ----------------------
// R feature extraction
// ----------------------

process EXTRACT_R_FEATURES {
    container 'speak_proj/r-extraction:latest'

    input:
    path processed_audio

    output:
    path "R_extracted_features.csv"
    path "extracted_features.Rdata"
    path "features"

    script:
    """
    Rscript ${params.r_extraction} --base_dir ${processed_audio} --window 40 --output_dir .
    """
}

executor > local (1)
[1a/2d03d4] PREPROCESS_AUDIO (1) [100%] 1 of 1, cached: 1 ✔
[91/b6596e] EXTRACT_PY_FEATURES (1) [100%] 1 of 1, cached: 1 ✔
[47/940634] EXTRACT_R_FEATURES (1) [ 0%] 0 of 1 ✘
ERROR ~ Error executing process > 'EXTRACT_R_FEATURES (1)'

Caused by:
Process EXTRACT_R_FEATURES (1) terminated with an error exit status (1)

Command executed:

Rscript /mnt/c/holly/nextflowdev/bin/R_extraction.R --base_dir processed_audio --window 40 --output_dir .

Command exit status:
1

Command output:
(empty)

Command error:
Error: unexpected input in ""
Execution halted

Work dir:
/mnt/c/holly/nextflowdev/work/47/940634116c43fb576d614bb380b9e4

Container:
speak_proj/r-extraction:latest

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details

The script works fine in Docker when I mount it explicitly, but the Python script runs natively in Nextflow following the same method:

// ----------------------
// Python feature extraction
// ----------------------

process EXTRACT_PY_FEATURES {
    publishDir "${params.outdir}/python_features", mode: 'copy'

    input:
    path processed_audio

    output:
    path "features"

    script:
    """
    python3 ${params.py_extraction} --input_dir ${processed_audio} --output_dir features
    """
}

The underlying issue is still the same. The directories aren't mounted into the Docker container with the -v option, so they're not visible when you run the Nextflow process. The directory which isn't mounted in this case is /mnt now.

Rscript /mnt/c/holly/nextflowdev/bin/R_extraction.R

This is also because you're passing the script path as a parameter, which isn't portable, as you're discovering.
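You can see exactly what was (and wasn't) mounted by inspecting the files Nextflow writes into the failing task's work dir (the .command.* files are generated for every task; the work dir below is just the one from your error message):

cd /mnt/c/holly/nextflowdev/work/47/940634116c43fb576d614bb380b9e4
cat .command.sh     # the exact command that was run inside the container
cat .command.run    # the task launcher, including the docker run ... -v mount options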

Generally one shouldn't use params inside processes (the notable exception is publishDir, but that will be replaced in future). params and the like should be passed into the process call, and then accessed within the process via the input: variables.

workflow {
    CUSTOM_PROCESS( params.my_file )
}

...

process CUSTOM_PROCESS {
    input:
    path some_file

...
}

However, for scripts which are in the bin/ folder of your workflow, you simply call them as a command in the script block:

Rscript /mnt/c/holly/nextflowdev/bin/R_extraction.R

should simply be (no params):

R_extraction.R

and have executable permission (chmod a+x bin/R_extraction.R).

The underlying concept is that all files a process uses should be "staged" (symlinked or copied) into the working directory, and scripts in bin/ are found because Nextflow adds that directory to the PATH environment variable before running the process.
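Putting that together for your case, the R process could end up looking roughly like this (container, arguments and outputs are copied from your earlier post, so treat it as a sketch rather than tested code):

process EXTRACT_R_FEATURES {
    container 'speak_proj/r-extraction:latest'
    publishDir "${params.outdir}/R_features", mode: 'copy'

    input:
    path processed_audio

    output:
    path "R_extracted_features.csv"
    path "extracted_features.Rdata"
    path "features"

    script:
    """
    R_extraction.R --base_dir ${processed_audio} --window 40 --output_dir .
    """
}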
I hope that clarifies things a bit more.

The Nextflow training material goes into more depth if you're interested: Working with Files - training.nextflow.io