1
0
Entering edit mode
Qingwei • 0
@c0b77451
Last seen 20 days ago
Japan

Hi I have a bulk pair-end 150bp RNA-seq data from seawater samples (Microorgamsim). I am trying to use DupRadar to assess PCR bias. I know there are two files like marked_bam_file and GTF are necessary for runing DupRadar. I used Trinity to assemle RNA-seq into contigs, and mapped RNA-seq into assembled contigs to get bam file. And then using picard to marked sorted bam file. About GTF, I just very confuse about this file, there is no available GTF files in current database. So, I just generate a fake GTF file, using SAM_to_gtf.pl attached in PASA tools. And got the GTF file like this:

TRINITY_DN18951_c0_g1_i2        genome  cDNA_match      319     468     100.0   +       .       ID=A00609:344:HK33KDSX2:4:1101:1217:1031.p1;Target=A00609:344:HK33KDSX2:4:1101:1217:1031 1 150

TRINITY_DN18951_c0_g1_i2        genome  cDNA_match      427     576     100.0   -       .       ID=A00609:344:HK33KDSX2:4:1101:1217:1031.p2;Target=A00609:344:HK33KDSX2:4:1101:1217:1031 1 150

TRINITY_DN10413_c0_g1_i1        genome  cDNA_match      1161    1310    100.0   +       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p1;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150

TRINITY_DN10413_c0_g1_i1        genome  cDNA_match      1220    1369    100.0   -       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p2;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150

TRINITY_DN10413_c0_g1_i2        genome  cDNA_match      1161    1310    100.0   +       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p3;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150

TRINITY_DN10413_c0_g1_i2        genome  cDNA_match      1220    1369    100.0   -       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p4;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150
.
.
.


I am not sure this is ok ay for DupRadar. I just try the command like this:

bamDuprm <- "marked_duplicates.bam"
gtf <- "m153s.gtf"
stranded <- 0
paired   <- TRUE
# Duplication rate analysis


In the beginning, the command runs well, but around 5 min, the command halted, without any error message, just show: execution halted. I have no ideas about this error. Is it the GTF file unsuitable? Thank you in advance for your help.

0
Entering edit mode
@sergisayolspuig-8816
Last seen 21 days ago
Germany

Hi Qingwei,

dupRadar is using Rsubread::featureCounts() under the hood to count reads on features of the GTF. By default, Rsubread::featureCounts() will use the coordinates of the exons of your GTF (third field) and group exons by gene_id (ninth field). For more information regarding the GTF supported format, see here.

I'm not familiar with the SAM_to_gtf.pl script included in PASA tools, but I understand your exon entries in the GTF are called cDNA_match and the gene_id is called ID. Therefore, you should call dupRadar::analyzeDuprates() with some extra parameters to tell featureCounts() how these fields are named in your GTF.

bamDuprm <- "marked_duplicates.bam"
gtf <- "m153s.gtf"
stranded <- 0
paired   <- TRUE
# Duplication rate analysis


This should potentially mitigate the issue. Otherwise, it'd be great if you could provide some further information like: the number of features in the GTF file, number of reads in the bam files, are the bam files sorted, etc.

cheers, Sergi

0
Entering edit mode

Dear Sergi, Yeah, it works.

dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads, GTF.featureType="cDNA_match", GTF.attrType="ID")


Thank you so much. And thank you for your great tools. Qingwei