Bambu extended GTF file?
Seongwoo Han
Hello there, I would like to know what "extended_annotations.gtf", one of Bambu's main outputs, contains. It seems to be the entire reference annotation plus all newly discovered transcripts, and its size is about 200 MB. What I am trying to get is something like "transcript_models.gtf", which contains only the constructed transcripts (both known and novel) and not the entire reference annotation. To my knowledge, such a file is around 90 to 100 MB. Is there a way to obtain that filtered GTF from the command line?

I am using ONT cDNA and PacBio cDNA datasets. Below are the commands I used to convert the ONT cDNA .fastq file into a BAM file, in case I missed something.

./minimap2 -t 8 -ax splice /home/seong/R/x86_64-pc-linux-gnu-library/4.1/bambu/extdata/hg38.fa /data/long_read/ENCBS944CBA/ENCFF263YFG.fastq -o /data/long_read/ENCBS944CBA/ENCFF263YFG.sam

samtools view -@ 8 -Sb -o /data/long_read/ENCBS944CBA/ENCFF563QZR.bam /data/long_read/ENCBS944CBA/ENCFF563QZR.sam
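The two steps above can also be run as a single pipe that writes a coordinate-sorted, indexed BAM directly, which skips the intermediate SAM file (the commands above produce an unsorted BAM). The function below is only a sketch: align_cdna is a hypothetical name, the ONT preset is the one used in the command above, and the PacBio preset (-ax splice:hq -uf) is the one the minimap2 documentation suggests for Iso-Seq and other high-quality cDNA reads.

```shell
# Sketch, not from the Bambu docs: align long-read cDNA with minimap2 and
# stream the alignments straight into a coordinate-sorted, indexed BAM,
# skipping the intermediate SAM file.
# Usage: align_cdna <ref.fa> <reads.fastq> <out.bam> [ont|pacbio] [threads]
align_cdna() {
  local ref="$1" reads="$2" out="$3" platform="${4:-ont}" threads="${5:-8}"
  local preset
  case "$platform" in
    ont)    preset="-ax splice" ;;         # ONT cDNA preset, as in the command above
    pacbio) preset="-ax splice:hq -uf" ;;  # minimap2 preset for PacBio Iso-Seq / high-quality cDNA
    *) echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  # $preset is intentionally unquoted so it splits into separate options
  minimap2 -t "$threads" $preset "$ref" "$reads" \
    | samtools sort -@ "$threads" -o "$out" - \
    && samtools index "$out"
}
```

For example: align_cdna hg38.fa ENCFF263YFG.fastq ENCFF263YFG.bam ont 8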

One more question: does Bambu detect intron retention? Please let me know about both questions. Thanks a lot!

Hello Seongwoo, were you able to run Bambu successfully? I am facing technical problems, so it would be great to get some help.

Andre
Hi Seongwoo,

I think I addressed this in the GitHub issue, but I will repeat the answer here for the sake of other users who might find this thread.

The code below keeps only the transcripts supported by at least one full-length read; you can then write the output as usual with writeBambuOutput.

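library(SummarizedExperiment)  # provides assays(); se is the object returned by bambu()
# keep transcripts with at least one full-length read in any sample
constructedAnnotations = se[rowSums(assays(se)$fullLengthCounts) > 0, ]
writeBambuOutput(constructedAnnotations, path = "./YOUR_PATH_HERE/")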
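If you would rather stay on the command line, a rough alternative is to filter the written GTF against a count table. The shell function below is a hypothetical helper (filter_gtf_by_counts is not part of Bambu). It assumes a tab-separated table laid out like the counts_transcript.txt that writeBambuOutput writes (header line, transcript ID in column 1, gene ID in column 2, one count column per sample after that) and keeps every GTF line whose transcript_id has a nonzero count in at least one sample.

```shell
# Hypothetical helper, not part of Bambu: keep GTF lines whose transcript_id
# has a nonzero count in at least one sample column of a count table.
# Assumed table layout: tab-separated, header on line 1, transcript ID in
# column 1, gene ID in column 2, one count column per sample from column 3.
# Usage: filter_gtf_by_counts <counts.txt> <annotations.gtf> > filtered.gtf
filter_gtf_by_counts() {
  awk -F'\t' '
    NR == FNR {                            # first file: the count table
      if (FNR == 1) next                   # skip the header line
      for (i = 3; i <= NF; i++)
        if ($i + 0 > 0) { keep[$1] = 1; break }
      next
    }
    /^#/ { print; next }                   # keep GTF header/comment lines
    match($9, /transcript_id "[^"]+"/) {   # lines without transcript_id are dropped
      id = substr($9, RSTART + 15, RLENGTH - 16)
      if (id in keep) print
    }
  ' "$1" "$2"
}
```

For example: filter_gtf_by_counts counts_transcript.txt extended_annotations.gtf > supported_transcripts.gtf. The counts assay is not the same as fullLengthCounts, so this selection can differ from the R filter; lines without a transcript_id (such as gene records) are not kept.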

