Question

Should I use "assigned reads" or total reads (assigned + unassigned) to calculate the RPKM value?

gustavoborin01 • 0

@gustavoborin01-6892

Dear all,

I'm recalculating RPKM values for an RNA-seq dataset using the featureCounts function in Rsubread, and I'd like to know whether I should use only the "Assigned" reads or the total reads, including the unassigned categories (ambiguity, multi-mapping, etc.; see below), as the library size in the RPKM formula. Searching forums and Mortazavi et al. (2008), I have only found that "N is the total number of mappable reads in the experiment". Could anybody please help in this regard?

RPKM = N/(L*T)

where:

N: number of reads assigned to a gene

L: length of the gene (kb)

T: total mapped reads (Millions)

T_reesei_F24.1_GGCTAC_L008_R1_001.cleanreads.fastq.gz_tophat2.F24h.1_accepted_hits.bam
Assigned	32270962
Unassigned_Ambiguity	6896
Unassigned_MultiMapping	116803
Unassigned_NoFeatures	10751746
Unassigned_Unmapped	0
Unassigned_MappingQuality	0
Unassigned_FragementLength	0
Unassigned_Chimera	0

Thanks in advance!

rnaseq R rsubread rpkm • 1.6k views

ADD COMMENT • link updated 9.5 years ago by James W. MacDonald 65k • written 9.5 years ago by gustavoborin01 • 0

score 2 · Answer 1 · 2014-10-28

James W. MacDonald 65k

@james-w-macdonald-5106

United States

You should use the assigned reads only. For your purposes, the library size consists of all the reads that you will be using to infer transcript abundance, not the total number of reads that you generated.
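To make the difference concrete, here is a minimal sketch in plain Python (not Rsubread itself) that applies the RPKM formula from the question with both candidate library sizes. The summary counts are taken from the featureCounts output above; the gene's read count and length are hypothetical values chosen only for illustration, and the `rpkm` helper is an assumed name, not a library function.

```python
# featureCounts summary from the question (non-zero categories only)
summary = {
    "Assigned": 32270962,
    "Unassigned_Ambiguity": 6896,
    "Unassigned_MultiMapping": 116803,
    "Unassigned_NoFeatures": 10751746,
}


def rpkm(gene_reads, gene_length_bp, library_size):
    """RPKM = N / (L * T), with L in kilobases and T in millions of reads."""
    length_kb = gene_length_bp / 1e3
    lib_millions = library_size / 1e6
    return gene_reads / (length_kb * lib_millions)


# Hypothetical gene, for illustration only (not from the original data)
gene_reads = 5000
gene_length_bp = 2000

assigned = summary["Assigned"]   # library size recommended in the answer
total = sum(summary.values())    # assigned + all unassigned categories

rpkm_assigned = rpkm(gene_reads, gene_length_bp, assigned)
rpkm_total = rpkm(gene_reads, gene_length_bp, total)

print(f"library size (assigned only): {assigned}")
print(f"library size (all reads):     {total}")
print(f"RPKM with assigned reads: {rpkm_assigned:.2f}")
print(f"RPKM with all reads:      {rpkm_total:.2f}")
```

Because the unassigned reads only enlarge the denominator, including them lowers every gene's RPKM by the same factor within a sample. That factor can differ between samples, which is one reason to choose one definition of library size and apply it to all samples.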
