Pipeline PR issues with unstable llms-full.txt and multiqc.parquet

My PR #42 (Add variant calls and methylation features, update modules, compliance with `nf-core/tools` template 3.5.1 by chaochaowong · Pull Request #42 · nf-core/pacvar · GitHub) for the pacvar pipeline fails with mismatched snapshots caused by the MultiQC AI summary files, llms-full.txt and multiqc.parquet. The nf-test errors from the GitHub CI run are below. What is the best practice for dealing with these non-deterministic files?

@@ -30,8 +30,8 @@
           "trgt/sample1_C9ORF72_motifs.png"
       ],
       [
-          "llms-full.txt:md5,5129ab3f35c5a087e770d872837b40c8",
+          "llms-full.txt:md5,6b82fdd1ecc2c13fe91ac3644561fe2b",
-          "multiqc.parquet:md5,1f4fe164d94ba51529a6f456bdd7f668",
+          "multiqc.parquet:md5,8672f2fc6f8d9a34429d1d221844d366",
           "multiqc_citations.txt:md5,4c806e63a283ec1b7e78cdae3a923d4f"
       ]
   ]

    FAILED (68.787s)


Seqera AI suggests (below) that I can edit test/nextflow.config to disable the AI features for CI testing. Does that make sense? I don’t see any other pipelines doing this.

process {
    withName: 'NFCORE_PACVAR:PACVAR:MULTIQC' {
        ext.args = '--no-ai'  // Completely disable AI features
    }
}

For the end-to-end pipeline nf-test, listing these two files in the .nftignore file means the test only checks that they exist, not their md5sums (as seen in the runner with the error). So you’ll probably just have to update the snapshot. Since the diff you pasted above still contains md5sums, it may come from a version where .nftignore had not yet been updated.
The other failing Docker test is unrelated. It is the test “Provide fail bam via samplesheet (optional) and merge in case of repeat workflow”, where a file’s md5sum changes, and you’ll have to account for that as well.
Let me know if this helps!


That works, thank you. I was initially puzzled because Seqera AI indicated that --no-ai would stop MultiQC from generating llms-full.txt and multiqc.parquet. However, after testing, I found that MultiQC still produces these two files.

Thanks for the helpful suggestions. I’ve added llms-full.txt, multiqc.parquet, and sample1.merged.bam to .nftignore, and updated all the snapshots. pacvar PR #41 passed all the tests!!!
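
For anyone hitting the same mismatch, the .nftignore additions might look roughly like the sketch below. The directory prefixes are assumptions, not copied from the pacvar repository: the entries should match wherever the pipeline actually publishes these files, relative to the output directory. The `**/` glob for the BAM file is likewise a guess at the pattern syntax, so replace it with the explicit path if it does not match.

```text
multiqc/multiqc_data/llms-full.txt
multiqc/multiqc_data/multiqc.parquet
**/sample1.merged.bam
```

After editing .nftignore, the stored snapshots still contain the old md5sums, so they need to be regenerated (for example with nf-test's `--update-snapshot` option) and committed.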
