Nextflow pipeline fails on AWS Batch because it can't find an executable in the Docker container

Hi, I am new to Nextflow and AWS Batch. I was able to run the example nf-core pipeline on AWS Batch with no problem, but now I am trying to run my own pipeline, which requires the UCSC utility bigWigToBedGraph.

I made a Docker image that installs it in /usr/local/bin:

# Install UCSC utilities including bigWigToBedGraph
RUN wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64.v385/bigWigToBedGraph -O /usr/local/bin/bigWigToBedGraph \
    && chmod +x /usr/local/bin/bigWigToBedGraph

Then I specify the Docker image to use in my Nextflow config:

process.container = '<account>.dkr.ecr.us-east-1.amazonaws.com/aindap-process-horvath:latest'

I even specify the full path to the executable, which I double-checked by running the container interactively on EC2:

// Define the process to convert BigWig to BedGraph
process bigWigToBedGraph {
    tag "${bw.baseName}"
    publishDir "${params.output_dir}", mode: 'copy'
    
    
    input:
    path bw

    output:
    path "${bw.baseName}.bedGraph"

    script:
    """
    echo "running bigWigToBedGraph for file: ${bw.baseName}.bedGraph"
    /usr/local/bin/bigWigToBedGraph $bw ${bw.baseName}.bedGraph
    """
}

But I still get an error:

running bigWigToBedGraph for file: HG002_rep1.hg38.pbmm2.combined.bedGraph
Error: bigWigToBedGraph not found. Please make sure it's installed and in your PATH.

I am not sure what is going on and would appreciate any pointers. It seems like specifying the full path in the Nextflow script should work.

Have you tried running your container (with the same URI you’re passing to Nextflow) in interactive mode and checking whether the binary is there and has the correct permissions?

That’s the strange thing: if I run the container interactively, I can confirm the program is in the PATH and it actually works.

But when I run it via Batch, bigWigToBedGraph isn’t found.


[Screenshot: Screen Shot 2024-08-28 at 9.01.03 PM]

When I print the PATH variable while the Nextflow pipeline runs, I get:

running bigWigToBedGraph for file: HG002_rep1.hg38.pbmm2.combined.bedGraph
//nextflow-bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

It shows /usr/local/bin as part of the PATH.

That’s unusual. Could you try running which bigWigToBedGraph in the Nextflow process? Or perhaps ls -lha /usr/local/bin, which might indicate something wrong with the binary. I’m assuming this is all on x86 and not ARM (Apple silicon).

My only other thought is that it could be user permissions, with some difference between the user on AWS Batch and your local Docker environment.
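A minimal sketch of those checks, assuming the process is named bigWigToBedGraph as in the script above: a Nextflow config fragment that uses the beforeScript process directive to print some diagnostics right before the task script runs. On container-native executors such as AWS Batch, beforeScript is expected to run inside the task container (on non-container-native executors it runs on the host instead), so treat that, and where exactly the output lands in the task logs, as assumptions to verify. The /proc/self/mountinfo line is an extra check not mentioned above; it lists mounts involving /usr/local, which would reveal a host directory hiding files from the image.

```groovy
// Hypothetical debugging aid, not part of the original pipeline.
// Prints environment details inside the task container before the
// bigWigToBedGraph script starts. Remove once the problem is found.
process {
    withName: 'bigWigToBedGraph' {
        beforeScript = '''
            uname -m                                  # CPU architecture (expect x86_64)
            id                                        # user the task runs as
            ls -lha /usr/local/bin || true            # is the binary there and executable?
            command -v bigWigToBedGraph || echo "bigWigToBedGraph not on PATH"
            grep ' /usr/local' /proc/self/mountinfo || true   # mounts involving /usr/local
        '''
    }
}
```

Each command is guarded so a failed check does not abort the task before the real script runs.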

Also, you could try the image community.wave.seqera.io/library/ucsc-bigwigtobedgraph:469--c71de23bcede0988 built from Seqera containers.

Thanks for the response, Adam. I did check that I pulled the x86 binary when I built my Docker image:

# Install UCSC utilities including bigWigToBedGraph
RUN wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64.v385/bigWigToBedGraph -O /usr/local/bin/bigWigToBedGraph \
    && chmod +x /usr/local/bin/bigWigToBedGraph

I also tried running bigWigToBedGraph from the Seqera image instead, specified in my Nextflow script:


// Define the process to convert BigWig to BedGraph
process bigWigToBedGraph {
    tag "${bw.baseName}"
    publishDir "${params.output_dir}", mode: 'copy'
    container "community.wave.seqera.io/library/ucsc-bigwigtobedgraph:469--c71de23bcede0988"
    
    input:
    path bw

    output:
    path "${bw.baseName}.bedGraph"

    script:
    """
    echo "running bigWigToBedGraph for file: ${bw.baseName}.bedGraph"
    bigWigToBedGraph $bw ${bw.baseName}.bedGraph
    """
}

But then I get this error:


ERROR ~ Error executing process > 'bigWigToBedGraph (HG002_rep1.hg38.pbmm2.combined)'

Caused by:
  Task failed to start - CannotStartContainerError: Error response from daemon: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/local/bin/_entrypoint.sh": stat /usr/local/bin/_entrypoint.sh: no such file or directory: unknown

Another idea I was going to try was to pull the executable from my S3 bucket and pass it as a param in my config file:

params.ucsc_util = "s3://aindap-demo/bigWigToBedGraph"

This points to something more fundamentally wrong with your setup. What is your AWS Batch configuration? How did you set up the compute environment and queue? What AWS machine type is the process running on?

I followed the steps described here to set up AWS Batch. The machine type is Amazon Linux, and the compute environment has the AWSServiceRoleForBatch and ecsInstanceRole roles attached.

Anyway, I solved my original problem with help from @robsyme on a call. The AWS CLI path in my Nextflow config pointed into /usr/local/bin. This caused /usr/local/bin from the AMI to be mounted into the container, hiding everything in the container's own /usr/local/bin (including the bigWigToBedGraph in my Docker image).

I moved the AWS CLI to a more innocuous location, pointed my config at that path, and the pipeline ran just fine.
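A minimal sketch of what such a config change might look like, assuming the CLI path was set through Nextflow's aws.batch.cliPath option and that the AWS CLI was reinstalled in the AMI with Miniconda under /home/ec2-user/miniconda (the location used in the Nextflow AWS Batch documentation; adjust it to wherever your CLI actually lives). The region is taken from the ECR image URI above. The essential point is only that the directory holding the CLI should not be one the container image also relies on, such as /usr/local/bin.

```groovy
// Sketch only: the cliPath below is an assumed Miniconda install location.
// A value such as '/usr/local/bin/aws' (hypothetical example of the old
// setting) led to the host's /usr/local/bin hiding the image's copy.
aws {
    region = 'us-east-1'   // matches the region in the ECR image URI
    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}
```

The same shadowing may also explain the _entrypoint.sh error with the Seqera image, since that image expects the script under /usr/local/bin as well.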
