I need to run Sarek on a large number of genomes (more than 10,000). Our HPC is set up with Apptainer and works pretty well. I see two ways to go:
- Run Sarek as I usually do, and let it handle all the particulars of process management (memory, CPUs, etc.) itself.
- Submit a bash script with a fixed allocation of memory and resources, and run Sarek inside it without specifying a cluster profile.
My concern is that (1) might overwhelm our cluster, though I am not sure. I think (2) might be friendlier for the sysadmins. I was also thinking of making the loop that starts the processes sleep after every 100 genomes, to give the system a chance to catch up. Do you have any suggestions for handling this kind of situation? Essentially: how do I run things without creating problems for the HPC and/or the sysadmins?
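For concreteness, here is a rough sketch of the throttled loop I have in mind for option (2). Everything in it is a placeholder (the samplesheet list, the pause length, the commented-out Sarek invocation), so please treat it as an illustration of the batching idea, not a working launcher:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: launch Sarek once per samplesheet and pause after
# every 100 launches so the scheduler and shared filesystem can catch up.
# All names, paths, and flags below are placeholders.

BATCH_SIZE=100     # launches between pauses
PAUSE_SECONDS=1    # e.g. 600 on a real cluster; shortened here for illustration

count=0
pauses=0
for samplesheet in $(seq 1 250); do   # stand-in for samplesheets/*.csv
    # The real invocation would be something like:
    #   nextflow run nf-core/sarek -profile apptainer \
    #       --input "$samplesheet" --outdir "results/run_${count}" &
    :   # no-op placeholder for the launch

    count=$((count + 1))
    if (( count % BATCH_SIZE == 0 )); then
        pauses=$((pauses + 1))
        sleep "$PAUSE_SECONDS"        # let the system catch up
    fi
done
```

With 250 stand-in samples this pauses twice (after 100 and after 200); the open question is whether this kind of manual throttling is the right approach at all, or whether the pipeline's own scheduler integration handles it better.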