Question

Normalizing for both gene size and library composition

urjaswita (@urjaswita-13128)

Hi,

I want to compare expression levels of genes within the same sample and across species (which have different gene lengths). So I need a way to normalize both within samples (similar to TPM) and between samples (similar to DESeq2/edgeR). Ideally, I would like to be able to test the significance of both inter- and intra-sample differences. I came across this thread: https://support.bioconductor.org/p/108442/ which looks like exactly what I want, but I was wondering:

If I do what's recommended in the thread above,

assays(dds)[["avgTxLength"]] <- length.mat
dds <- DESeq(dds)

can I then compare genes within the same sample, as I could with transcripts per million (TPM)? For example, would a high count for a gene then mean high expression compared with other genes in the same sample?
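For comparison, TPM is a purely within-sample measure: each gene's count is divided by its length, and each sample's rates are then rescaled to sum to one million, so genes in the same sample can be ranked against each other. A minimal base-R sketch, assuming gene-by-sample matrices of counts and effective lengths in bp; the `tpm()` helper and the toy numbers are hypothetical:

```r
# Hypothetical helper: TPM from a gene-by-sample count matrix and a
# matching gene-by-sample matrix of effective lengths (in bp).
tpm <- function(counts, lengths) {
  stopifnot(identical(dim(counts), dim(lengths)))
  rate <- counts / lengths                          # reads per base, per gene and sample
  sweep(rate, 2, colSums(rate), "/") * 1e6          # each column now sums to 1e6
}

# Toy data: columns are filled gene by gene (s1 = 100, 200, 300; s2 = 50, 400, 150).
counts  <- matrix(c(100, 200, 300,
                     50, 400, 150), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
lengths <- matrix(c(1000, 2000, 3000,
                    1000, 2000, 3000), nrow = 3, dimnames = dimnames(counts))

x <- tpm(counts, lengths)
x
```

In s1 the three genes have different raw counts but identical TPM, because their counts scale exactly with their lengths.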

I know transcript abundance tools can, to some extent, handle both length and library size/composition normalization, but I do not want any multi-mapping reads in my analysis. Is there any way to align to a transcriptome but compute normalized counts based only on uniquely mapped reads?

Thank you!

RNA-seq normalization deseq2

Written 5.8 years ago by urjaswita; updated 5.8 years ago by Michael Love.

Answer 1 · 2020-01-27

Michael Love (@mikelove)

(1) Yes, the average transcript length mechanism lets you compare counts while normalizing for differences in length, which are treated as a nuisance. Whether the length differences come from differential transcript usage (DTU) or any other source, this procedure controls for them as an offset in the GLM.
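To illustrate the offset idea for the cross-species case, the base-R sketch below builds a gene-by-sample length matrix from hypothetical per-species gene lengths, turns it into per-gene normalization factors by dividing each row by its geometric mean, and fits a GLM for one gene with `log(normalization factor)` as the offset. This is a simplified stand-in, not DESeq2's code: DESeq2 fits a negative binomial GLM and also folds library size factors into the offset, whereas here a Poisson GLM is used, and the row-centring convention, object names, and toy counts are assumptions.

```r
# Hypothetical per-species gene lengths in bp (rows = genes, columns = species).
gene_len <- cbind(sp1 = c(geneA = 1000, geneB = 3000),
                  sp2 = c(geneA = 2000, geneB = 1500))

# Species of each sample (hypothetical design: two samples per species).
species <- factor(c("sp1", "sp1", "sp2", "sp2"))
samples <- paste0("s", 1:4)

# Gene-by-sample length matrix with the same shape as the count matrix:
# each sample's column holds the gene lengths of its own species.
length.mat <- gene_len[, as.character(species)]
colnames(length.mat) <- samples

# Per-gene normalization factors: divide each row by its geometric mean,
# so only the relative length differences between samples remain.
norm_factors <- length.mat / exp(rowMeans(log(length.mat)))

# Toy counts for geneA: expression per base is the same in both species,
# so the counts simply double along with the doubled gene length.
y <- c(100, 100, 200, 200)

# Poisson GLM with the log normalization factor as an offset, versus none.
fit_offset <- glm(y ~ species, family = poisson,
                  offset = log(norm_factors["geneA", ]))
fit_plain  <- glm(y ~ species, family = poisson)

coef(fit_offset)[["speciessp2"]]  # close to 0: the length difference is absorbed
coef(fit_plain)[["speciessp2"]]   # log(2), about 0.693: length looks like a fold change
```

Without the offset, the longer geneA model in sp2 shows up as an apparent two-fold change; with it, the species coefficient is essentially zero.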

(2) Transcriptome alignment is not an efficient way to get reads that map uniquely to the genome. I also don't know how you would do that in practice.

Posted 5.8 years ago by Michael Love

Thanks, Michael. I am a bit confused about within-sample versus cross-sample length normalization. The question linked above concerns cross-sample length normalization, because gene lengths differ between species. Could you confirm that this approach also normalizes within the same sample? And could you point me to a source showing how it is done? Thank you again!

Reply posted 5.7 years ago by urjaswita

Within-sample normalization is performed by DESeq() when it calls estimateSizeFactors(). When you see the message "estimating size factors", that is when the within-sample normalization (or, more precisely, the estimation of the parameters) takes place. It is done by performing standard size factor estimation on the count matrix obtained after dividing out the pre-computed normalization factors. All of this happens behind the scenes when DESeq() is run.
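The step described above (standard size factor estimation on the counts after dividing out pre-computed normalization factors) can be sketched in base R with the median-of-ratios estimator. This is an illustration under assumptions, not DESeq2's actual implementation: the function names and toy data are hypothetical, and details such as how DESeq2 combines and recentres the final factors or handles zero counts may differ.

```r
# Median-of-ratios size factors (the standard DESeq-style estimator),
# using only genes with no zero counts for the per-gene geometric means.
median_ratio_sf <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)                 # drop genes with any zero count
  apply(counts[keep, , drop = FALSE], 2, function(col)
    exp(median(log(col) - log_geo_means[keep])))
}

# Hypothetical helper: divide out length-based normalization factors first,
# then estimate size factors on the adjusted matrix.
size_factors_after_norm <- function(counts, length.mat) {
  nf <- length.mat / exp(rowMeans(log(length.mat)))  # rows centred at geometric mean 1
  median_ratio_sf(counts / nf)
}

# Toy data: s1, s2 from species 1; s3, s4 from species 2 (longer gene models).
# Simulated depths are 1, 2, 1, 0.5 and per-base expression is shared.
counts <- matrix(c(100, 400,  50,    # s1
                   200, 800, 100,    # s2
                   200, 400, 100,    # s3
                   100, 200,  50),   # s4
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))
length.mat <- matrix(c(1000, 2000,  500,
                       1000, 2000,  500,
                       2000, 2000, 1000,
                       2000, 2000, 1000),
                     nrow = 3, dimnames = dimnames(counts))

sf_adj <- size_factors_after_norm(counts, length.mat)
sf_raw <- median_ratio_sf(counts)
sf_adj
sf_raw
```

Here the adjusted estimate recovers the simulated depths (1, 2, 1, 0.5), while applying the estimator to the raw counts lets the longer species-2 gene models inflate the size factor of s3.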

Reply posted 5.7 years ago by Michael Love