Is splitFastq deterministic?

This partly relates to GitHub issue #4798, which was closed with the following comment:

Closing since we don’t plan to extend these operators any further. The best practice in this case is to use the flatMap operator and call the splitFastq function within the operator on each file. This approach is much more flexible.

That on its own isn’t an issue, but is splitFastq deterministic in how it splits files? I assume it is (otherwise Ben wouldn’t have made that suggestion), but nothing in the docs mentions this. Like every bioinformatician, I really don’t want to see a paired set of FASTQs jumbled up by a non-deterministic split. Can the devs give some assurance on this?

I think the splitFastq operator has some extra magic for handling paired-end reads. With the splitFastq function you have to be more explicit, but you can do it easily without jumbling the chunks. Calling splitFastq separately on each file will give you two separate lists which you can then combine as needed.

For example:

ch_reads.flatMap { id, fastq_1, fastq_2 ->
  def chunks_1 = fastq_1.splitFastq()
  def chunks_2 = fastq_2.splitFastq()

  // assume chunks_1.size() == chunks_2.size()
  def n = chunks_1.size()

  // exclusive range: 0..n would also visit index n, which is past the last chunk
  (0..<n).collect { i ->
    tuple(id, chunks_1[i], chunks_2[i])
  }
}
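
If you would rather not rely on the size assumption in the comment, the pairing step can be pulled into a small helper that fails loudly on a mismatch. This is only a sketch in plain Groovy: pairChunks is a hypothetical name, not part of Nextflow, and it assumes both FASTQs list their reads in the same order and were split with the same options, so that chunk i of one file corresponds to chunk i of the other.

```groovy
// Hypothetical helper (not a Nextflow API): pair chunk i of the first list
// with chunk i of the second, and refuse to continue if the counts differ.
def pairChunks(id, List chunks1, List chunks2) {
    if (chunks1.size() != chunks2.size()) {
        throw new IllegalStateException(
            "Chunk count mismatch for ${id}: ${chunks1.size()} vs ${chunks2.size()}")
    }
    // transpose() zips the two lists by index: [[a1, b1], [a2, b2], ...]
    [chunks1, chunks2].transpose().collect { c1, c2 -> [id, c1, c2] }
}
```

Inside the flatMap closure above, the body would then reduce to pairChunks(id, fastq_1.splitFastq(), fastq_2.splitFastq()). The check does not prove that the split is deterministic, but it does turn a silent mispairing caused by unequal chunk counts into an immediate error.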