I'm working on a whole-tissue (brain) RNA-seq study in mice in which there is substantial neuronal death over time, and we have samples at multiple ages. I'd like to normalize to brain weight and would appreciate feedback to make sure I'm not doing anything that would violate DESeq2's modeling assumptions.
I perform the following steps:
- Run conditional quantile normalization (CQN) for GC content and transcript length
- Extract the offsets
- Divide exp(offsets) by each sample's brain weight (in grams)
- Divide each gene's factors by their geometric mean across samples
My questions:
- Is there anything about this approach that will disturb DESeq2's internal modeling?
- Do you have any other suggestions?
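In formula form, the normalization steps listed above amount to the following. This is only a sketch of my understanding: the symbols are my own shorthand, and it assumes the CQN offsets are on the natural-log scale.

```latex
% o_{ij}: CQN offset for gene i in sample j (assumed natural-log scale)
% w_j:    brain weight of sample j, in grams
% m:      number of samples
f_{ij} = \frac{\exp(o_{ij})}{w_j},
\qquad
\mathrm{nf}_{ij} = \frac{f_{ij}}{\left(\prod_{k=1}^{m} f_{ik}\right)^{1/m}}
```

If I have this right, the normalized count for gene i in sample j is the raw count divided by nf_ij, so a larger brain weight gives a smaller factor and scales that sample's counts up relative to lighter brains.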
Here is my code:
    library(cqn)
    library(DESeq2)

    # Read in the saved length and GC content
    mmu.len.gc <- read.delim("mus_musculus_length_and_gc_content.txt", header=TRUE)
    mmu.len.gc <- mmu.len.gc[!is.na(mmu.len.gc$length) & !is.na(mmu.len.gc$gc),]

    pre.dds <- estimateSizeFactors(pre.dds)
    common_transcripts <- intersect(rownames(counts.all), rownames(mmu.len.gc))
    counts.common <- counts.all[common_transcripts,]

    # Perform conditional quantile normalization for GC and length. This will also
    # account for library size.
    cqn.obj <- cqn(counts=counts.common,
                   x=mmu.len.gc[common_transcripts,]$gc,
                   lengths=mmu.len.gc[common_transcripts,]$length,
                   sizeFactors = sizeFactors(pre.dds))

    # Extract offsets (genes x samples matrix)
    cqnOffset.bw <- cqn.obj$glm.offset

    # Normalize to brain weight (converting from milligrams to grams).
    # sweep() divides each sample (column) by its own brain weight; a plain "/"
    # with a vector would recycle the weights down the rows (genes) instead.
    # Assumes the rows of sample.sheet are in the same order as the count columns.
    cqnNormFactors.bw <- sweep(exp(cqnOffset.bw), 2, sample.sheet$brain.weight/1000, "/")

    # Divide each gene's factors by their geometric mean across samples
    normFactors.bw <- cqnNormFactors.bw / exp(rowMeans(log(cqnNormFactors.bw)))

    dds <- DESeqDataSetFromMatrix(countData = counts.common,
                                  colData = colData,
                                  design = ~age + sex + genotype)
    normalizationFactors(dds) <- normFactors.bw
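One thing I want to be sure of is that the brain-weight division is applied per sample (per column) and not per gene. Here is a minimal base-R sketch with made-up numbers (not my data; `offsets` and `weight.g` are hypothetical stand-ins) comparing a plain division by a vector with `sweep()`:

```r
# Toy offsets: 3 genes x 2 samples, all zero so exp(offsets) is a matrix of ones
offsets <- matrix(0, nrow = 3, ncol = 2,
                  dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# Hypothetical brain weights in grams, one per sample (column)
weight.g <- c(s1 = 0.4, s2 = 0.5)

# Plain "/" recycles the weight vector in column-major order, so the weights
# get matched to genes (rows) rather than to samples (columns)
recycled <- exp(offsets) / weight.g

# sweep() over margin 2 divides every column by that sample's own weight
per.sample <- sweep(exp(offsets), 2, weight.g, "/")

# Row-wise geometric-mean centering; here recycling a per-gene vector down the
# rows is the intended behavior
norm <- per.sample / exp(rowMeans(log(per.sample)))

print(recycled)
print(per.sample)
print(norm)
```

In the `sweep()` version every gene in a sample is divided by the same weight, which is what I intend; in the plain-division version the values within a column differ even though the offsets are identical.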
Really appreciate your help.