I am following the (excellent) tutorial at Introduction - training.nextflow.io and trying to implement it with the same code hosted in my own private GitHub repository, using my own AWS resources, launched from Seqera Platform Cloud. The training works fine on the provided GitPod system and when I run it on my local computer. However, running it on an AWS compute environment fails at Process 6b with the error:
.command.sh: line 8: gghist.R: command not found
So, when using a separate R script, is there a way to make it available to the EC2 compute environment?
The environment does have access to the GitHub repo, which contains the script in the bin/ directory. I also tried providing the script on S3. I have not tried packaging the script into a Docker image; the tutorial made it seem that there was a way to run it without doing that. Interested to hear your suggestions.
We’ve had a chat via messages about this problem, but I want to post a quick summary of what we found for those who might be directed here in the future.
If you have a small accessory script that you’d like to call from a Nextflow process, that script needs to be three things (see the sketch after this list):
1. Located in the bin/ directory
2. Made executable (chmod +x bin/myScript.R)
3. Checked into version control (git add bin; git commit -m "scripts"; git push)
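Note that git records the executable bit, so the chmod needs to happen before the commit (or be fixed afterwards with git update-index --chmod=+x bin/myScript.R).

For reference, here is a minimal sketch of how such a script gets called inside a process. The process name, input, and the arguments passed to gghist.R are hypothetical; the key point is that the script is invoked by name alone, with no path, because Nextflow adds the project's bin/ directory to the $PATH of every task:

```nextflow
// Hypothetical process that calls the accessory script from bin/.
// gghist.R is referenced by name only; Nextflow puts bin/ on the PATH.
process MAKE_HISTOGRAM {
    input:
    path counts

    output:
    path 'histogram.png'

    script:
    """
    gghist.R ${counts} histogram.png
    """
}
```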
Of course, Wayne had correctly performed all of these steps, but when resuming the runs, the gghist.R script was still not on the $PATH.
Wayne was using the Seqera Platform (seqera.io) to submit these runs to an AWS Batch Compute Environment. When resuming a run on the Seqera Platform, the default behaviour is to resume using the same revision as the parent run.
My understanding, Wayne, is that you were resuming from a version of the workflow from before the commit that added the gghist.R script, so the script was not available to the run. The solution is to relaunch using the latest revision of the workflow.
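For command-line launches, the equivalent fix is the -latest flag, which tells Nextflow to pull the newest commit of the remote repository before running (the repository name below is a placeholder):

```bash
# -r pins a branch or tag, -latest forces Nextflow to update its local
# clone of the repo to the newest commit, -resume reuses cached results.
nextflow run myorg/my-pipeline -r main -latest -resume
```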
Example
Resuming a run will render a launch page that shows the pipeline revision used for the run; check that it points at the commit you expect:

[screenshot of the resume launch form showing the pipeline revision]