Avoiding redundant calculations

Hi,

I’m building a pipeline with AlphaFold.
In essence, I want to avoid recalculating structures that were already computed earlier, by doing a lookup in a certain folder.

The steps I would take:

  • get the sequence
  • calculate a checksum of the sequence and use it as the name for that sequence
  • calculate the 3D structure using AlphaFold
  • copy the structure to a pdb folder, named checksum.pdb

The next time I run the AlphaFold pipeline, I would like the component to first check the pdb folder for a structure matching that checksum. If one is there, it just copies it; otherwise it starts the calculation.
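To make the idea concrete, here is a rough Python sketch of the lookup, not a working AlphaFold integration. The names `sequence_checksum`, `get_structure` and the `predict` callable are hypothetical placeholders I made up for illustration; `predict` stands in for whatever actually runs AlphaFold. The choice of SHA-256 and the whitespace/case normalization are assumptions too.

```python
import hashlib
import shutil
from pathlib import Path


def sequence_checksum(sequence: str) -> str:
    """Return a SHA-256 hex digest of the sequence.

    Assumption: whitespace and letter case should not change the checksum,
    so the sequence is normalized before hashing.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def get_structure(sequence, pdb_dir, out_dir, predict):
    """Copy a stored structure if one exists; otherwise compute and store it.

    `predict` is a hypothetical callable (sequence, output_path) -> None
    that stands in for the real AlphaFold run.
    """
    pdb_dir = Path(pdb_dir)
    out_dir = Path(out_dir)
    pdb_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    name = sequence_checksum(sequence) + ".pdb"
    stored = pdb_dir / name

    if not stored.exists():
        # Write to a temporary name first, so an interrupted run does not
        # leave a half-written checksum.pdb that later looks like a hit.
        tmp = stored.with_name(name + ".tmp")
        predict(sequence, tmp)
        tmp.replace(stored)

    # Whether freshly computed or found in the folder, copy it to the run's output.
    target = out_dir / name
    shutil.copy2(stored, target)
    return target
```

Calling `get_structure` twice with the same sequence should only invoke `predict` the first time; the second call just copies the stored file.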

Does this make sense?

I have never used AlphaFold, so it’s hard for me to say for certain, but Nextflow caches the tasks it has run before and reuses those results when you launch the pipeline with the -resume option. If each calculation is done as a task (an instance of a Nextflow process), it won’t be recalculated as long as its inputs are unchanged. If you’re doing something different, the cache won’t kick in. You can read more about the cache here.