Hi,
I want to compare expression levels of genes within the same samples and across species (different gene lengths). So I need a way to normalize both within a sample (similar to TPM) and between samples (similar to DESeq2/edgeR). Ideally I would like to be able to test for significance of both inter- and intra-sample differences. I came across this thread: https://support.bioconductor.org/p/108442/ which looks like exactly what I want, but I was wondering:
- If I do what's recommended in the thread above,
assays(dds)[["avgTxLength"]] <- length.mat
dds <- DESeq(dds)
can I then compare genes within the same sample, as I could with Transcripts Per Million? E.g. would a high count for a gene mean high expression compared to other genes in the same sample?
- I know transcript abundance tools can sort of do both length and library abundance normalization, but I do not want any multi-mapping reads in my analysis. Is there any way to align to a transcriptome but compute normalized counts based only on uniquely mapped reads?
Thank you!
Thanks Michael. I am a bit confused about the difference between within-sample and cross-sample length normalization. The question in the thread linked above concerns cross-sample length normalization, because gene length would be different in different species. Could you confirm that this approach will also normalize within the same sample? And could you point me to the source showing how it's done? Thank you again!
Within-sample normalization is performed by DESeq when estimateSizeFactors is called. You will see the message "estimating size factors" and that is when the within-sample normalization (better, estimation of the parameters) takes place. It is done by performing standard size factor estimation on the matrix obtained after dividing out the pre-computed normalization factors. This all happens behind the scenes when DESeq is run.
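
To make the step described above concrete, here is a minimal base-R sketch of the arithmetic. It is not DESeq2's own code: the toy counts and lengths are made up, the helper names (geo_mean_rows, median_ratio_size_factors) are hypothetical, and the centering of each gene's lengths and factors on a geometric mean of 1 is my reading of how the package treats avgTxLength, so please check the DESeq2 source before relying on it.

```r
# Toy inputs (hypothetical values, for illustration only): genes x samples.
counts <- matrix(c(100,  200,  400,
                   300,  600, 1200,
                    50,  100,  200,
                   800, 1600, 3200),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Per-gene, per-sample average transcript lengths (plays the role of length.mat).
length.mat <- matrix(c(1000, 1000, 2000,
                       1500, 1500, 1500,
                        500,  500, 1000,
                       2000, 2000, 2000),
                     nrow = 4, byrow = TRUE, dimnames = dimnames(counts))

# Geometric mean of each row (gene).
geo_mean_rows <- function(m) exp(rowMeans(log(m)))

# Standard median-of-ratios size factors for a genes x samples matrix.
median_ratio_size_factors <- function(m) {
  log_geo <- rowMeans(log(m))
  apply(m, 2, function(col) {
    exp(median((log(col) - log_geo)[is.finite(log_geo) & col > 0]))
  })
}

# 1. Assumption: center each gene's lengths so only relative,
#    sample-to-sample length differences remain.
norm_mat <- length.mat / geo_mean_rows(length.mat)

# 2. Divide out the pre-computed factors, then estimate size factors
#    on the resulting matrix.
sf <- median_ratio_size_factors(counts / norm_mat)

# 3. Combine both into gene- and sample-specific normalization factors,
#    again centered per gene (assumption, as above).
nf <- t(t(norm_mat) * sf)
nf <- nf / geo_mean_rows(nf)

# Counts on a common scale across samples.
normalized <- counts / nf
```

Note that in this sketch each gene's factors are centered around 1, so a length difference that is constant across all samples for a gene is not divided out. If DESeq2 does behave this way, the resulting values would be comparable across samples for a given gene, but not across genes within a sample in the way TPM values are.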