Normalizing for both gene size and library composition
urjaswita ▴ 40
@urjaswita-13128

Hi,

I want to compare expression levels of genes within the same samples and across species (where gene lengths differ). So I need a way to normalize both within a sample (similar to TPM) and between samples (similar to DESeq2/edgeR). Ideally I would like to be able to test for significance of both inter-sample and intra-sample differences. I came across this thread: https://support.bioconductor.org/p/108442/ which looks like exactly what I want, but I was wondering:

  1. If I do what's recommended in the thread above,

assays(dds)[["avgTxLength"]] <- length.mat
dds <- DESeq(dds)

can I then compare genes within the same sample, as I could with Transcripts Per Million? For example, would a high count for a gene mean high expression relative to other genes in the same sample?

  2. I know transcript abundance tools can, to some extent, normalize for both length and library composition, but I do not want any multi-mapping reads in my analysis. Is there any way to align to a transcriptome but compute normalized counts based only on uniquely mapped reads?

Thank you!

RNA-seq normalization deseq2 • 651 views
@mikelove

(1) Yes, the average transcript length mechanism allows you to compare counts while normalizing for differences in length, which are treated as a nuisance. Whether the length differences come from DTU (differential transcript usage) or any other source, this procedure controls for them through a GLM offset.

(2) Aligning to the transcriptome is not an efficient way to obtain reads that map uniquely to the genome, and I don't know how you would do that in practice either.
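To make the procedure in (1) concrete, here is a minimal sketch on simulated data. It assumes DESeq2 is installed; the dataset comes from `makeExampleDESeqDataSet`, and the length matrix (`length.mat`, with one set of gene lengths per group standing in for two species) is entirely hypothetical.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset: 200 genes, 6 samples (3 per condition)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)

# Hypothetical gene lengths: one set for samples 1-3, another for samples 4-6
len_a <- runif(200, min = 500, max = 5000)
len_b <- runif(200, min = 500, max = 5000)
length.mat <- cbind(matrix(rep(len_a, 3), ncol = 3),
                    matrix(rep(len_b, 3), ncol = 3))
dimnames(length.mat) <- dimnames(counts(dds))

# Store the lengths in the assay that DESeq2 looks for, then run the pipeline
assays(dds)[["avgTxLength"]] <- length.mat
dds <- DESeq(dds)

# Gene-by-sample normalization factors and the resulting normalized counts
nf <- normalizationFactors(dds)
norm.counts <- counts(dds, normalized = TRUE)
```

With real data, `length.mat` would hold the actual per-gene lengths for each sample's species, in the same row and column order as the count matrix.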


Thanks Michael. I am a bit confused about the difference between within-sample and cross-sample length normalization. The question in the link above concerns cross-sample length normalization, because gene length differs between species. Could you confirm that this approach also normalizes within the same sample? And could you point me to a source describing how it's done? Thank you again!


Within-sample normalization is performed by DESeq when estimateSizeFactors is called. You will see the message "estimating size factors", and that is when the within-sample normalization (or, more precisely, the estimation of its parameters) takes place. It is done by running the standard size factor estimation on the matrix obtained after dividing the counts by the pre-computed normalization factors (here, the average transcript lengths). This all happens behind the scenes when DESeq is run.
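The step described above can be sketched in base R on toy numbers. This is only an illustration, not DESeq2's actual code: `median_of_ratios` is a hypothetical helper that mimics the standard size factor estimate, all counts and lengths are made up, and the final per-gene centering of the factors is my assumption about how the package keeps normalized values on the count scale.

```r
# Toy data: 4 genes x 3 samples (all values hypothetical)
counts <- matrix(c(100, 200, 150,
                    50, 120,  80,
                   300, 500, 420,
                    10,  30,  20),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
tx_len <- matrix(c(1000, 2000, 1000,
                    500,  500, 1000,
                   3000, 3000, 3000,
                    800, 1600,  800),
                 nrow = 4, byrow = TRUE, dimnames = dimnames(counts))

# Hypothetical helper: median-of-ratios size factors for a positive matrix
median_of_ratios <- function(mat) {
  log_geo_means <- rowMeans(log(mat))
  apply(mat, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median((log(col) - log_geo_means)[keep]))
  })
}

# 1. Divide out the pre-computed factors (lengths), then estimate size factors
sf <- median_of_ratios(counts / tx_len)

# 2. Combine lengths and size factors into gene-by-sample normalization factors
norm_factors <- sweep(tx_len, 2, sf, `*`)

# 3. Assumed step: center each gene's factors to a geometric mean of 1
norm_factors <- norm_factors / exp(rowMeans(log(norm_factors)))

normalized <- counts / norm_factors
```

Here `sf` captures sequencing depth after length has been accounted for, and `norm_factors` plays the role of the gene-by-sample offset used in the model.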
