Fail to run RupRadar
1
0
Entering edit mode
Qingwei • 0
@c0b77451
Last seen 21 months ago
Japan

Hi I have a bulk pair-end 150bp RNA-seq data from seawater samples (Microorgamsim). I am trying to use DupRadar to assess PCR bias. I know there are two files like marked_bam_file and GTF are necessary for runing DupRadar. I used Trinity to assemle RNA-seq into contigs, and mapped RNA-seq into assembled contigs to get bam file. And then using picard to marked sorted bam file. About GTF, I just very confuse about this file, there is no available GTF files in current database. So, I just generate a fake GTF file, using SAM_to_gtf.pl attached in PASA tools. And got the GTF file like this:

TRINITY_DN18951_c0_g1_i2        genome  cDNA_match      319     468     100.0   +       .       ID=A00609:344:HK33KDSX2:4:1101:1217:1031.p1;Target=A00609:344:HK33KDSX2:4:1101:1217:1031 1 150

TRINITY_DN18951_c0_g1_i2        genome  cDNA_match      427     576     100.0   -       .       ID=A00609:344:HK33KDSX2:4:1101:1217:1031.p2;Target=A00609:344:HK33KDSX2:4:1101:1217:1031 1 150

TRINITY_DN10413_c0_g1_i1        genome  cDNA_match      1161    1310    100.0   +       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p1;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150

TRINITY_DN10413_c0_g1_i1        genome  cDNA_match      1220    1369    100.0   -       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p2;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150

TRINITY_DN10413_c0_g1_i2        genome  cDNA_match      1161    1310    100.0   +       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p3;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150

TRINITY_DN10413_c0_g1_i2        genome  cDNA_match      1220    1369    100.0   -       .       ID=A00609:344:HK33KDSX2:4:1101:2374:1031.p4;Target=A00609:344:HK33KDSX2:4:1101:2374:1031 1 150
.
.
.

I am not sure this is ok ay for DupRadar. I just try the command like this:

bamDuprm <- "marked_duplicates.bam"
gtf <- "m153s.gtf"
stranded <- 0
paired   <- TRUE
threads  <- 4 
# Duplication rate analysis
dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads)

In the beginning, the command runs well, but around 5 min, the command halted, without any error message, just show: execution halted. I have no ideas about this error. Is it the GTF file unsuitable? Thank you in advance for your help.

dupRadar • 700 views
ADD COMMENT
0
Entering edit mode
@sergisayolspuig-8816
Last seen 21 months ago
Germany

Hi Qingwei,

dupRadar is using Rsubread::featureCounts() under the hood to count reads on features of the GTF. By default, Rsubread::featureCounts() will use the coordinates of the exons of your GTF (third field) and group exons by gene_id (ninth field). For more information regarding the GTF supported format, see here.

I'm not familiar with the SAM_to_gtf.pl script included in PASA tools, but I understand your exon entries in the GTF are called cDNA_match and the gene_id is called ID. Therefore, you should call dupRadar::analyzeDuprates() with some extra parameters to tell featureCounts() how these fields are named in your GTF.

bamDuprm <- "marked_duplicates.bam"
gtf <- "m153s.gtf"
stranded <- 0
paired   <- TRUE
threads  <- 4 
# Duplication rate analysis
dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads, GTF.featureType="cDNA_match", GTF.attrType="ID")

This should potentially mitigate the issue. Otherwise, it'd be great if you could provide some further information like: the number of features in the GTF file, number of reads in the bam files, are the bam files sorted, etc.

cheers, Sergi

ADD COMMENT
0
Entering edit mode

Dear Sergi, Yeah, it works.

dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads, GTF.featureType="cDNA_match", GTF.attrType="ID")

Thank you so much. And thank you for your great tools. Qingwei

ADD REPLY

Login before adding your answer.

Traffic: 711 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6