DESeq2 Normalisation Factor
MAHESH • 0
@MAHESH-24550
India

Hi, this question is about DESeq2 normalization. First, my experimental design: I have two groups, Group A and Group B. Group A has 39 biological replicates and Group B has 77. The samples are human blood samples, collected and processed in random order for both groups.

RNA sequencing was carried out in 3 batches. I made sure that samples from both groups were included in each batch, so I can include batch in the DESeq2 design during analysis.

I ran a DESeq2 differential expression analysis and found around 6700 differentially expressed genes (padj < 0.05), of which around 80% are downregulated. That surprised me. To explore possible reasons, I looked at the size factors (normalization factors) calculated by DESeq2. In a box plot of the size factors for both groups, the values for Group B (treatment) differ significantly from those for Group A. I ran t.test and wilcox.test, and the p-value is 0.00025. I am attaching an image of the box plot for illustration.

Could you tell me whether my analysis is sound or whether something is wrong?

Thank you.

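library(DESeq2)
library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

dds <- DESeqDataSetFromMatrix(countData = rsem.gene.count.134, colData = design.m, design = ~ Batch + class)

dds <- estimateSizeFactors(dds)

# One row per sample: its size factor and its group label
sf.df <- data.frame(size.factor = sizeFactors(dds), class = dds$class)

p <- ggplot(sf.df, aes(x = class, y = size.factor, color = class)) + geom_boxplot()

# Add the Wilcoxon test p-value to the box plot
p + stat_compare_means(method = "wilcox.test")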
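
For readers wondering what the size factors above measure, here is a minimal base-R sketch of the median-of-ratios idea behind estimateSizeFactors(), as I understand its default behaviour. The function name size_factors_mor and the toy matrix are made up for illustration; this is not DESeq2's own code.

```r
# Sketch of median-of-ratios size factors (hypothetical helper, not the DESeq2 source).
size_factors_mor <- function(cts) {
  # Per-gene log geometric mean across samples; -Inf if any sample has a zero count
  log_geo_mean <- rowMeans(log(cts))
  use <- is.finite(log_geo_mean)
  # Per sample: median over usable genes of log(count / geometric mean)
  apply(cts, 2, function(col) {
    exp(median(log(col[use]) - log_geo_mean[use]))
  })
}

# Toy counts: s2 is sequenced twice as deeply as s1, and s3 four times as deeply
cts <- matrix(c( 10,  20,  40,
                100, 200, 400,
                  5,  10,  20,
                  0,   3,   7),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

sf <- size_factors_mor(cts)
print(sf)
```

In this toy case the size factors simply track sequencing depth, which is why a depth difference between groups would show up directly in a box plot like the one above.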


@mikelove
United States

Is your sequencing depth confounded with the condition? That would be an issue in the experimental design, not in the analysis. E.g. compare colSums(counts(dds)) across conditions.
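
A minimal sketch of that check, using a toy count matrix so it runs on its own. The objects cts and coldata are stand-ins for counts(dds) and colData(dds), and the simulated depth pattern is an assumption for illustration only.

```r
# Toy stand-ins for counts(dds) and colData(dds); all values are simulated.
set.seed(1)
coldata <- data.frame(
  class = factor(c("A", "A", "A", "B", "B", "B")),
  Batch = factor(c("1", "2", "3", "1", "2", "3"))
)
# Assume batch 3 was sequenced about 1.6x deeper than the other batches
depth_scale <- c(1, 1, 1.6, 1, 1, 1.6)
cts <- sapply(depth_scale, function(s) rpois(200, lambda = 50 * s))

# Library size per sample, then summarised by condition and by batch
lib_size <- colSums(cts)
depth_by_class <- tapply(lib_size, coldata$class, median)
depth_by_batch <- tapply(lib_size, coldata$Batch, median)
print(depth_by_class)
print(depth_by_batch)

# How the groups are distributed over batches
print(table(coldata$Batch, coldata$class))
```

A clear difference in depth_by_class, or a deep batch dominated by one group in the table, would suggest that depth is confounded with condition.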

If so, I would recommend stricter count filtering, e.g. requiring a minimal count (10) in the majority of samples.

keep <- rowSums(counts(dds) >= 10) >= X
dds <- dds[keep,] # before DESeq()
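
To illustrate the filter above on data that runs without DESeq2, here is a small sketch with a simulated matrix. Setting X to just over half the samples is my reading of "the majority of samples", not a value given in this thread.

```r
# Simulated counts: 5 well-expressed genes followed by 5 low-count genes.
set.seed(7)
n_samples <- 10
cts <- rbind(
  matrix(rpois(5 * n_samples, lambda = 100), nrow = 5),
  matrix(rpois(5 * n_samples, lambda = 1),   nrow = 5)
)

# Assumption: "majority" means more than half of the samples
X <- floor(n_samples / 2) + 1

# Keep genes with a count of at least 10 in at least X samples
keep <- rowSums(cts >= 10) >= X
cts_filtered <- cts[keep, , drop = FALSE]
print(dim(cts_filtered))
```

With a real DESeqDataSet the same logical vector would be built from counts(dds) and applied as dds[keep, ] before calling DESeq().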


Thank you for suggesting possible confounding factors. I checked the samples' sequencing depth, and yes, there is one batch where the sequencing depth is higher than in another batch (say, 50 million versus 80 million), and in this batch Group B samples outnumber Group A samples (3:1). I tried independent filtering but got the same result. However, I am including batch as a covariate in the model matrix, so I think it should not be a concern. Thanks for the suggestion.
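
One way to see whether the group difference in size factors is explained by the unbalanced batch is to compare the groups with and without adjusting for batch. The sketch below uses simulated size factors in which only batch affects depth. All numbers and object names are assumptions for illustration, and it says nothing about the real data.

```r
# Simulated design: batch 3 is deeper and holds Group B at 3:1 (6 B vs 2 A).
set.seed(42)
coldata <- data.frame(
  Batch = factor(rep(c("1", "2", "3"), each = 8)),
  class = factor(c(rep(c("A", "B"), each = 4),
                   rep(c("A", "B"), each = 4),
                   rep("A", 2), rep("B", 6)))
)
# Assumption: size factors depend on batch only, plus a little noise
batch_effect <- c("1" = 1.0, "2" = 1.0, "3" = 1.6)
coldata$sf <- unname(batch_effect[as.character(coldata$Batch)]) *
  exp(rnorm(nrow(coldata), sd = 0.01))

# Group effect on log size factor, ignoring batch and adjusting for batch
fit_marginal <- lm(log(sf) ~ class, data = coldata)
fit_adjusted <- lm(log(sf) ~ Batch + class, data = coldata)
print(coef(fit_marginal)["classB"])
print(coef(fit_adjusted)["classB"])
```

In this simulation the apparent Group B effect largely disappears once batch is in the model. If the same happened with the real size factors, the box plot difference would mostly reflect batch imbalance rather than the groups themselves.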