Trying to launch a pipeline from Seqera Platform using a private GitHub repository

I added a pipeline to Launchpad, using a private repo. I have added the right fine-grained GitHub credentials to the platform as well.
Aside from the basic settings, I set Main script to segmentation.nf (rather than the default main.nf) and I’m using an aws profile defined in the repo’s nextflow.config:

resume = true

tower {
    enabled = true
}

docker {
    enabled = true
    runOptions = '--gpus all -u $(id -u):$(id -g)'
}

params {
    input = null
    pattern = ''
    segs = 'fluid,aug'
}

process {
    executor = 'local'
    container = 'ghcr.io/phenopolis/image-analysis:latest'
    withName: RUN_SEG {
        maxForks = 4
    }
}

profiles {
    aws {
        workDir = 's3://lairc-data-test/seqera/work'
        params {
            input = 's3://lairc-data-test/sample'
            pattern = ''
            segs = 'fluid'
        }
        process {
            executor = 'awsbatch'
            queue = 'TowerForge-28CYqnaN5bmBllepvCQYRB'
            scratch = false
        }

        wave {
            enabled = true
            strategy = 'container'
        }

        fusion {
            enabled = true
        }

        aws {
            region = 'eu-west-2'
            batch {
                volumes = '/scratch/fusion:/tmp'
            }
        }
    }
}

timeline {
    enabled = true
    file = "results/pipeline_info/execution_timeline.html"
}

report {
    enabled = true
    file = "results_/pipeline_info/execution_report.html"
}

trace {
    enabled = true
    file = "results/pipeline_info/pipeline_trace.txt"
}

dag {
    enabled = true
    overwrite = true
    direction = 'TB'
    verbose = true
    file = "results/pipeline_info/pipeline_dag.html"
}

I’m not adding any Pipeline parameters or a Nextflow config file other than the one already in the repo. It’s a test: the input parameters are defined in the aws profile and I’m using an S3 path for the file input.

When I launch my pipeline it fails with a cryptic error:

The workflow execution failed to start. Exit status: 1

Essential container in task exited

Nextflow 24.10.5 is available - Please consider updating your version to it
N E X T F L O W  ~  version 24.10.4
Pulling phenopolis/segmentation-nf ...
Project config file is malformed -- Cause: Ambiguous method overloading for method java.io.File#<init>.
Cannot resolve which method to invoke for [null] due to overlapping prototypes between:
	[class java.lang.String]
	[class java.net.URI]

Then I edited my pipeline to set the Revision number to main.

(The fact that I am able to select aws as the profile and main as the revision reassures me that Seqera Platform is accessing my private repo through my credentials.)

Then I ran it again and got a different error:

The workflow execution failed to start. Exit status: 1

Essential container in task exited

Nextflow 24.10.5 is available - Please consider updating your version to it
N E X T F L O W  ~  version 24.10.4
Pulling phenopolis/segmentation-nf ...
Remote resource not found: https://api.github.com/repos/phenopolis/segmentation-nf/contents/main.nf?ref=main

This seems like a bug to me, because I clearly specified segmentation.nf instead of main.nf.

Besides:

curl -H "Authorization: token my_github_token" \
     -H "Accept: application/vnd.github.v3.raw" \
     https://api.github.com/repos/phenopolis/segmentation-nf/contents/segmentation.nf?ref=main

does work as expected.

So, if anyone can help me debug my issues, it would be very much appreciated.

First thing: you don’t need the aws profile; Seqera Platform will set all of that for you. In fact, adding it might cause issues as you create conflicting config items. If you need to add additional configuration when launching via Seqera Platform, I would recommend using the Nextflow configuration staging option.
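For example, the staged config could be as small as this (a sketch that just reuses the parameter values from your aws profile; the work directory, queue and region would come from your compute environment instead):

params {
    input   = 's3://lairc-data-test/sample'
    pattern = ''
    segs    = 'fluid'
}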

Like you say, it sounds like your GitHub credentials are configured correctly. Seqera Platform will pass these to the Nextflow process for pulling the repo, but no further.

The first error says you are trying to call the File constructor within the configuration, and Groovy can’t tell whether the argument is a String or a URI, e.g. https://.... Are you using new File("https://github.com/...") for something in the config? The File constructor wouldn’t inherit the authentication from Nextflow, so it might also cause the second error.
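For reference, a config line like the following would reproduce that exact error while params.input is still null (a minimal sketch, not necessarily your config):

env {
    // with a null argument, Groovy can't choose between the File(String)
    // and File(URI) constructors, hence the "ambiguous overloading" error
    input_name = new File(params.input).name
}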

If you’re not using a File, I would remove the aws profile, manually configure the parameters and see if it works. Try to simplify your config as much as possible to see where the error is introduced:

docker {
    runOptions = '--gpus all -u $(id -u):$(id -g)'
}

params {
    input = null
    pattern = ''
    segs = 'fluid,aug'
}

process {
    container = 'ghcr.io/phenopolis/image-analysis:latest'
    withName: RUN_SEG {
        maxForks = 4
    }
}

dag {
    overwrite = true
    direction = 'TB'
    verbose = true
}

Finally, it might require a bit more info: can you share the full config files or logs at all?

Note that there needs to be a main.nf in the repository, even if it is an empty file. I think I’ve come across this before. Simply doing touch main.nf may solve that issue.
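If you’d rather have the placeholder explain itself than be empty, a comment-only main.nf works just as well (a sketch):

// Placeholder: the repository resolver expects a main.nf to exist.
// The actual entry point for this pipeline is segmentation.nf,
// selected via the "Main script" setting in the Launchpad.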


First of all, many thanks!

I indeed had that in my original nextflow.config:

...
env {
    trace_timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')
    input_name = new File(params.input).name
}

timeline {
    enabled = true
    file = "results_${env.input_name}/pipeline_info/execution_timeline_${env.trace_timestamp}.html"
}
...

And among my attempts I did remove this, as I suspected it was the problem, exactly as @Adam_Talbot noticed.

However, it was only after @ewels pointed out the main.nf issue that I was able to see things moving.

I can now run the pipeline, though it still fails (just like when submitting from my own computer to AWS), but that is for another ticket.

Great to hear @alanwilter!

A couple of points. I don’t know your full code, but if you want to use the input filename, it’s probably better to derive it within a process rather than as an environment variable in the configuration. For example, take the file as input and parse the name inside the process:

process MY_PROCESS {
    input:
        path input_file
    output:
        stdout

    script:
    def input_file_name = input_file.name
    """
    echo "$input_file_name"
    """
}

Or, alternatively, pass just the filename in as a value:

process MY_PROCESS {
    input:
        val input_file_name
    output:
        stdout

    script:
    """
    echo "$input_file_name"
    """
}

workflow {
    input_file_names = Channel.of(file(params.input).name)
    MY_PROCESS(input_file_names)
}

Or how about this, to use channels and operators nicely:

workflow {
    input_files      = Channel.fromPath(params.input)
    input_file_names = input_files.map { inputFile -> inputFile.name }
    
    MY_PROCESS(input_file_names)
}

Also, I would use file instead of File within a workflow, because this uses the Nextflow version of file, which supports all the cloud storage stuff.
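For example (a minimal sketch; both paths are illustrative, the S3 one just reuses your bucket from above):

// file() returns a Nextflow Path that understands remote schemes,
// so the same code works for local paths and cloud storage
def local_file  = file('/data/sample.tiff')
def remote_file = file('s3://lairc-data-test/sample')
println remote_file.name   // new File() would treat the S3 URL as a nonexistent local path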

Thanks @Adam_Talbot, here’s more about my code, and I believe I’ve mostly done it as you recommended.

The thing is, in my original nextflow.config, I was trying to dynamically assign a pattern name for the results.

I can’t figure out how to do it without env. But I hadn’t considered file instead of new File, so I’m going to give it a try:

env {
    trace_timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')
    input_name = file(params.input).name
}

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.