Question

DESeq2 Normalisation Factor


MAHESH (@MAHESH-24550) · India

Hi, this question is about DESeq2 normalization. First, my experimental design: I have two groups, Group A and Group B. Group A has 39 biological replicates and Group B has 77. The samples are human blood samples, collected and processed in random order for both groups.

RNA sequencing was carried out in 3 batches. However, I made sure that samples from both groups were included in each batch, so that I can account for batch in the DESeq2 design during the analysis.
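Whether every batch contains both groups can be checked by cross-tabulating batch against group. A minimal base-R sketch, assuming a sample table with hypothetical columns `batch` and `group` (the poster's `design.m` uses `Batch` and `class`), and invented sample assignments:

```r
# Hypothetical sample sheet: column names and assignments are illustrative only,
# not taken from the poster's design.m
samples <- data.frame(
  batch = factor(c("B1", "B1", "B1", "B2", "B2", "B2", "B3", "B3")),
  group = factor(c("A",  "B",  "B",  "A",  "B",  "B",  "A",  "B"))
)

# Cross-tabulate batch against group
tab <- table(samples$batch, samples$group)
print(tab)

# TRUE only if every batch has at least one sample from every group
all_groups_in_every_batch <- all(tab > 0)
print(all_groups_in_every_batch)
```

When every cell is non-zero, batch and group are crossed, which is what allows a design such as `~ Batch + class` to estimate both effects. Note that a table can have no empty cells and still be unbalanced (e.g. 3:1 in one batch).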

I ran a DESeq2 differential expression analysis and found around 6,700 differentially expressed genes (padj < 0.05), about 80% of which are downregulated. That surprised me. To explore possible reasons, I looked at the normalization (size) factors calculated by DESeq2. In a box plot of the size factors for the two groups, the size factors for Group B (treatment) were significantly different from those for Group A: I ran both t.test and wilcox.test, and the p-value is 0.00025. I am attaching an image of the box plot for illustration.
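For context, DESeq2's default size factors come from the median-of-ratios method: each sample's counts are divided by a per-gene geometric mean across samples, and the median of those ratios is that sample's size factor. The base-R function below is a simplified re-implementation for illustration, not the package's own code, and the count matrix is invented:

```r
# Toy count matrix (genes x samples); values are made up for illustration
counts <- matrix(
  c( 10,  20,  40,
      5,  10,  20,
    100, 200, 400,
      0,   3,   6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

median_of_ratios <- function(counts) {
  # Log of each gene's geometric mean across samples
  log_geo_means <- rowMeans(log(counts))
  # Genes with a zero in any sample get -Inf and are excluded
  use <- is.finite(log_geo_means)
  # log(count / geometric mean) for the usable genes, per sample
  log_ratios <- log(counts[use, , drop = FALSE]) - log_geo_means[use]
  # Size factor = median ratio in each sample (column)
  exp(apply(log_ratios, 2, median))
}

sf <- median_of_ratios(counts)
print(sf)
```

Here sample `s3` has roughly four times the counts of `s1`, so its size factor is four times larger. Because size factors largely track library size (and composition), a systematic size-factor difference between groups often points to a depth difference between them.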

Could you please advise whether my analysis is alright or whether something is wrong?

Thanking you.

library(DESeq2)
library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

dds <- DESeqDataSetFromMatrix(countData = rsem.gene.count.134, colData = design.m, design = ~ Batch + class)
dds <- estimateSizeFactors(dds)

# Size factors with their group labels (named 'sf' to avoid masking base::factor)
sf <- data.frame(size_factor = sizeFactors(dds), class = dds$class)

p <- ggplot(sf, aes(x = class, y = size_factor, color = class)) + geom_boxplot()
p + stat_compare_means(method = "wilcox.test")



score 1 · Answer 1 · 2022-09-10


Michael Love (@mikelove) · United States

Is your sequencing depth confounded with the condition? That would be an issue with the experimental design, not with the analysis. You can check by looking at, e.g., colSums(counts(dds)) across conditions.
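A minimal base-R sketch of that check, using a small invented count matrix in place of `counts(dds)` and a hypothetical `condition` factor:

```r
# Toy counts: 3 genes x 6 samples; numbers are invented for illustration
counts <- matrix(c(
  100, 120, 110, 300, 280, 320,
   50,  60,  55, 150, 140, 160,
   10,  12,  11,  30,  28,  32),
  nrow = 3, byrow = TRUE)
condition <- factor(c("A", "A", "A", "B", "B", "B"))

# Library size = total counts per sample (the analogue of colSums(counts(dds)))
lib_size <- colSums(counts)

# Median library size within each condition
med_lib <- tapply(lib_size, condition, median)
print(med_lib)
```

In this toy example condition B has about 2.7 times the median library size of A, so its size factors would be correspondingly larger: depth and condition are confounded.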

If so, I would recommend more stringent count filtering, e.g. requiring a minimum count of 10 in the majority of samples:

keep <- rowSums(counts(dds) >= 10) >= X  # X = minimum number of samples, e.g. a majority
dds <- dds[keep, ]  # filter before running DESeq()
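To make the filter concrete, here is a base-R sketch on an invented count matrix (4 genes by 5 samples). The value of `X` used here, a simple majority (3 of 5 samples), is a hypothetical choice for illustration:

```r
# Toy counts: 4 genes x 5 samples (invented values)
counts <- matrix(c(
  15, 20, 12, 30, 11,   # >= 10 in all 5 samples
   2, 11,  0, 14,  3,   # >= 10 in 2 samples
  10, 10,  9, 25,  1,   # >= 10 in 3 samples
   0,  0,  1,  0,  2),  # never >= 10
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), NULL))

# Hypothetical X: a simple majority of the samples
X <- floor(ncol(counts) / 2) + 1   # 3 when there are 5 samples

# Count, per gene, how many samples reach 10, then compare with X
keep <- rowSums(counts >= 10) >= X
print(keep)

filtered <- counts[keep, , drop = FALSE]
```

Only gene1 and gene3 survive. Raising the threshold or `X` removes more low-count genes, whose fold changes are the most sensitive to depth differences.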


Thank you for suggesting possible confounding factors. I checked the samples' sequencing depth, and yes, one batch has higher sequencing depth than another (roughly 50 million versus 80 million), and in this batch there are more Group B samples than Group A samples (3:1). I have tried independent filtering but get the same result. However, I am including batch as a covariate in the model matrix, so I think it should not be a cause for worry. Thanks for the suggestion.
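The arithmetic behind this kind of imbalance can be sketched in base R. The layout below is hypothetical (batch names, sample counts and depths are illustrative, loosely following the 50 versus 80 million and 3:1 figures above): within each batch both groups have the same depth, yet averaged over all samples Group B comes out deeper because it is over-represented in the deeper batch.

```r
# Hypothetical layout: two standard batches (~50M) and one deeper batch (~80M)
samples <- data.frame(
  batch = c("std1", "std1", "std2", "std2", "deep", "deep", "deep", "deep"),
  group = c("A",    "B",    "A",    "B",    "A",    "B",    "B",    "B"),
  depth = c(50, 50, 50, 50, 80, 80, 80, 80)  # millions of reads, illustrative
)

# Mean depth per group, ignoring batch
mean_depth <- tapply(samples$depth, samples$group, mean)
print(mean_depth)
```

Here Group A averages 60 and Group B 68, a group-level difference that arises purely from batch composition and could show up as a difference in size factors between the groups.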

MAHESH