DESeq2 - transcript length correction necessary when quantifying reads from ends of transcripts?
@84e705e7

Hi all,

I'm dealing with an unusual case of DGE, where I'm interested in quantifying reads that come from particular regions at transcript ends instead of the whole transcript. I screen standard Illumina RNA-Seq reads for these regions and quantify them separately from all other reads (background reads). I then run a gene-based DGE analysis, but I am wondering whether this analysis might be affected by differences in isoform usage between treatment groups.

What if one treatment group uses longer isoforms than the other group? In standard DGE, transcript length matters because longer transcripts produce more reads, but I can't quite get my head around whether transcript length affects the number of reads I get at transcript ends. My intuition is that the number of reads at a transcript end depends only on the length of the nucleotide sequence I'm screening for (say, 30 bp). If one treatment group switches to longer isoforms, I will see more background reads but the same number of reads covering those 30 bp at the transcript end. That would mean I don't need length correction for my counts. Is this correct, or should I be doing a transcript-level expression analysis that accounts for transcript length?

Thank you very much, Marius

Transcriptomics DESeq2 • 961 views
ATpoint ★ 4.0k
@atpoint-13662

You need no length correction for end-tagged data; see the 3' tagged RNA-seq section of the tximport vignette:

https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#3%E2%80%99_tagged_RNA-seq

"If you have 3’ tagged RNA-seq data, then correcting the counts for gene length will induce a bias in your analysis, because the counts do not have length bias. Instead of using the default full-transcript-length pipeline, we recommend to use the original counts, e.g. txi$counts as a counts matrix, e.g. providing to DESeqDataSetFromMatrix or to the edgeR or limma functions without calculating an offset and without using countsFromAbundance."


Yes, if you are just counting in a fixed length window, you don't need length correction for those counts.

You may want to do some QC to make sure that you have good 3' coverage; coverage metrics can be generated with e.g. RNA-SeQC.
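To make the fixed-window argument concrete, here is a minimal toy sketch in Python. It is not part of DESeq2 or tximport, and the function name `count_reads` and all parameter values are hypothetical. It assumes an idealised library: every read lies fully inside the transcript, read start positions are uniform, and each possible start position yields the same number of reads per molecule. Real data will deviate from this (3' bias, fragmentation, sequencing depth), so treat it as an illustration of the reasoning only.

```python
def count_reads(transcript_len, read_len=50, window=30, reads_per_start=1):
    """Toy model: count reads over a whole transcript and over its terminal window.

    Assumes one read of length `read_len` per valid start position
    (scaled by `reads_per_start`); all values are illustrative.
    """
    total = 0
    in_window = 0
    window_start = transcript_len - window  # first base of the terminal window
    # Valid starts are those where the read fits fully inside the transcript.
    for start in range(transcript_len - read_len + 1):
        total += reads_per_start
        # The read covers [start, start + read_len); it overlaps the window
        # if it extends past the window's first base.
        if start + read_len > window_start:
            in_window += reads_per_start
    return total, in_window


short_total, short_window = count_reads(1000)  # short isoform
long_total, long_window = count_reads(3000)    # long isoform

print("short isoform:", short_total, "reads in total,", short_window, "in the end window")
print("long isoform: ", long_total, "reads in total,", long_window, "in the end window")
```

Under these assumptions the whole-transcript count grows with isoform length, while the end-window count depends only on the window size, which is why no length offset is needed for the window counts.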


Thank you, Michael and ATpoint. I can see the parallels between 3'-tagged data and what I'm doing, but wasn't sure if they are equivalent. It's great news that my counts are not affected by changes in isoform length. Thank you for your input!
