.bai files not found in follow-up process

sam_bam script:

#!/usr/bin/env nextflow
process bbRubble {
    debug true

    tag "${sample_id}_bam"

    container 'https://depot.galaxyproject.org/singularity/samtools:1.21--h96c455f_1'

    publishDir "results/alignment/${project_num}/bam", mode: 'copy'

    input:
    val(sample_id)
    val(project_num)

    path(aligned_sample_read1)

    output:
    val(sample_id), emit: sample_id

    tuple path("${sample_id}.sorted.bam"), path("${sample_id}.sorted.bam.bai"), emit: sorted_bam_pair

    script:
    """
    samtools view -b -S "${aligned_sample_read1}" | samtools sort -o "${sample_id}.sorted.bam"

    samtools index "${sample_id}.sorted.bam" -o "${sample_id}.sorted.bam.bai"
    """

}

umi_dedup:

#!/usr/bin/env nextflow
process umiGeddon {
    debug true

    tag "${sample_id}_umiC"

    conda 'umi_tools'

    publishDir "results/umi/${project_num}/dedup"

    input:
    val(sample_id)
    val(project_num)

    tuple path(bam), path(bai)

    output:
    val(sample_id), emit: sample_id

    path('*.bam'),  emit: dedup_bam
    path('*.tsv'),  emit: dedup_stats

    script:
    """
    umi_tools dedup -I "${bam}" --output-stats="${sample_id}_dedup" -S "${sample_id}_dedup.bam"
    """
}

main:

    //convert alignment files into bam
    bbRubble_out = bbRubble(chiro_mirna.out.sample_id, project_ch, chiro_mirna.out.aligned_sample_read1)

    //collect bam files
    bam_input_ch = bbRubble_out.sorted_bam_pair

    //deduplicate aligned reads based on UMI
    umiGeddon(bbRubble.out.sample_id, project_ch, bam_input_ch)

Issue: when I run umi_tools on the BAM, it is supposed to find the matching .bai index, but for some reason it doesn't. I've tried various ways to "capture" the output but have failed. Any help would be appreciated.

script Command executed:

  echo "BAM: 124988-8.sorted.bam"
  echo "BAI: 124988-8.sorted.bam.bai"
  ls -lh .
  file "124988-8.sorted.bam"
  file "124988-8.sorted.bam.bai"
  
  umi_tools dedup -I "124988-8.sorted.bam" --output-stats="124988-8_dedup" -S "124988-8_dedup.bam"

 ValueError: fetch called on bamfile without index

Work dir:
  /home/biouser/sRNA/work/e2/4f41918cc1bdee19a0b78eaad7a095

So I think I found the issue. Since I'm still building the pipeline, I use -resume a lot. I ran nextflow clean -f a couple of times, reran the process, and it worked. Looking at the cache system, Nextflow by default builds its cache keys from each input file's name, size, and last-updated timestamp. Since I had resumed this particular step several times while editing the pipeline, I suspect some of that cached information had gone stale, so umi_tools could not find the index when it looked for it. Clearing the cache back to the last process that ran successfully helped. Looking at the cache directive and its options could probably help too: 'lenient', which keys on name and size only, seems particularly useful on file systems where timestamps are inconsistent.
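For anyone landing here later, this is a minimal sketch of how the 'lenient' cache mode could be switched on for just the dedup step. It assumes a nextflow.config sitting next to the pipeline script and reuses the process name umiGeddon from the post above. I have not confirmed that it would have prevented the error in this thread, so treat it as something to try rather than a verified fix.

```groovy
// nextflow.config (assumed location: next to the main pipeline script)
process {
    // Apply the setting only to the dedup process from this thread.
    withName: 'umiGeddon' {
        // 'lenient' builds cache keys from input file name and size only,
        // ignoring the last-modified timestamp used by the default mode.
        cache = 'lenient'
    }
}
```

The same setting can also be written as a directive inside the process body (cache 'lenient'). Scoping it with withName keeps the default caching behavior for every other process.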
