DESeq2 - transcript length correction necessary when quantifying reads from ends of transcripts?

Hi all,

I'm dealing with an unusual case of DGE in which I want to quantify reads that come from particular regions at transcript ends rather than from the whole transcript. I screen standard Illumina RNA-Seq reads for these regions and quantify them separately from all other reads (background reads). I then run a gene-based DGE analysis, but I wonder whether it might be affected by differences in isoform usage between treatment groups.

What if one treatment group uses longer isoforms than the other? In standard DGE, transcript length matters because longer transcripts produce more reads, but I can't quite get my head around whether transcript length affects the number of reads I get at transcript ends. My intuition is that the number of reads at a transcript end depends only on the length of the nucleotide sequence I'm screening for (say, 30 bp): if one treatment group switches to longer isoforms, I will see more background reads but the same number of reads in the 30 bp window. That would mean I don't need length correction for my counts. Is this correct, or should I be doing transcript-level expression analysis that accounts for transcript length?

Thank you very much, Marius

Transcriptomics DESeq2 • 125 views
ATpoint ★ 1.4k

You need no correction for end-tagged data. See the relevant section of the tximport vignette:

If you have 3’ tagged RNA-seq data, then correcting the counts for gene length will induce a bias in your analysis, because the counts do not have length bias. Instead of using the default full-transcript-length pipeline, we recommend to use the original counts, e.g. txi$counts as a counts matrix, e.g. providing to DESeqDataSetFromMatrix or to the edgeR or limma functions without calculating an offset and without using countsFromAbundance.


Yes, if you are just counting in a fixed length window, you don't need length correction for those counts.

You may want to do some QC to make sure that you have good 3' coverage; coverage metrics can be generated with, e.g., RNA-SeQC.
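
To make the fixed-window point concrete, here is a toy calculation. It is not from anyone in this thread, only a minimal sketch under simplifying assumptions: reads are assumed to start uniformly along each transcript, and the function names, the 100 bp read length, the 30 bp window, the isoform lengths, and the `reads_per_start` rate are all illustrative choices. It ignores fragmentation effects and 3' coverage bias, which is one reason the coverage QC is worth doing.

```python
def read_start_counts(transcript_length, read_length=100, window=30):
    """Return (total_starts, window_starts) for one transcript.

    total_starts:  number of positions where a full-length read can start.
    window_starts: how many of those reads overlap the last `window` bases.
    All parameter values are illustrative assumptions.
    """
    if transcript_length < read_length + window:
        raise ValueError("transcript must be at least read_length + window long")
    total_starts = transcript_length - read_length + 1
    # A read starting at s covers [s, s + read_length). It overlaps the end
    # window [transcript_length - window, transcript_length) when
    # s + read_length > transcript_length - window.
    first_overlapping_start = transcript_length - window - read_length + 1
    window_starts = total_starts - first_overlapping_start
    return total_starts, window_starts


def expected_reads(n_molecules, transcript_length, reads_per_start=0.01):
    """Expected (total, in_window) reads under a uniform start-position model."""
    total_starts, window_starts = read_start_counts(transcript_length)
    total = n_molecules * reads_per_start * total_starts
    in_window = n_molecules * reads_per_start * window_starts
    return total, in_window


# Same number of transcript molecules, short vs. long isoform (hypothetical lengths).
short_total, short_window = expected_reads(500, 1000)
long_total, long_window = expected_reads(500, 3000)

print("short isoform: total =", short_total, "window =", short_window)
print("long isoform:  total =", long_total, "window =", long_window)
```

Under this model the number of start positions that overlap the window equals the window length whatever the transcript length, so the window count tracks the number of molecules only, while the total (and hence background) count grows with isoform length.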


Thank you, Michael and ATpoint. I can see the parallels between 3'-tagged data and what I'm doing, but wasn't sure if they are equivalent. It's great news that my counts are not affected by changes in isoform length. Thank you for your input!

