Question

DESeq2 shows comparison nonsignificant, unclear why

0

Entering edit mode

daniella • 0

@daniella-24096

Last seen 12 months ago

United States

Hi,

I am performing differential expression on gene expression between different time points- time point 0 vs. time points 0.5, 1 and 3.

For a specific gene, I have the following expression profile (this is normalized reads): Normalized reads +- SE

It seems in the plot like there is a significant difference between time points 0 and 1. Actually, this is a gene which I know is induced in this paradigm from many past experiments. However, DESeq does not find the comparison significant:

              baseMean log2FoldChange    lfcSE     stat    pvalue      padj
 GeneX 1230.466       1.277551 1.164463 1.097116 0.2725906 0.8389563

The individual values of time point 0 are:

tp0.1 tp0.2 tp0.3

1622.129 1260.146 1831.165

and for 1:

tp1.1 tp1.2 tp1.3

2878.453 4137.590 4406.023

If I perform a simple t-test on the two time points, I get p-value = 0.0304.

My question is- what might be the reason that this comparison is coming up as insignificant in the DESeq analysis? Thanks a lot!

DESeq2 RNASeq DifferentialExpression • 1.2k views

ADD COMMENT • link updated 19 months ago by Michael Love 41k • written 2.7 years ago by daniella • 0

score 0 · Answer 1 · 2021-08-23

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

Provide code, including your colData and design.

ADD COMMENT • link 2.7 years ago Michael Love 41k

0

Entering edit mode

Hey Michael, thanks so much for your reply. I've uploaded the raw reads and colData to:

https://drive.google.com/file/d/1ryg73rZbsvVjIwp4drhF2iONVTHWDeAY/view?usp=sharing

https://drive.google.com/file/d/1JsnOpEP2D7JkTBsN7kcAKUkyq-q1Hth_/view?usp=sharing

The code is:

dds.dd = DESeqDataSetFromMatrix(countData = raw.reads.dd,
                                   colData = samp.data.dd,
                                   design = ~ group) 
rld.dd = rlogTransformation(dds.dd)


design(dds.dd)  # the design is according to the group, which is a combination of the cell type and time point
dds.dd = DESeq(dds.dd)


#------- Compare Cell Type B time point1  vs. 0 timepoint, look at result of GeneX

res.1.dd = results(dds.dd, contrast=c("group","CellTypeB.1","CellTypeB.0"))
res.1.dd.df = as.data.frame(res.1.dd)
res.1.dd.df[rownames(res.1.dd.df)=="GeneX",]  #turns out nonsignificant

ADD REPLY • link 2.6 years ago daniella • 0

0

Entering edit mode

Two recommendations, the two cell types are very different, for a gene like GeneX, it's not even expressed in the other cell type. So one is to split these within cell type comparisons by making two dds objects. The second is to remove thousands of unexpressed genes once you split into the two datasets (either way I would recommend to do this for speed).

You're better off modeling like this:

> dds2 <- dds[,grep("CellTypeB", dds$group)]
> dds2$group <- droplevels(dds2$group)
> keep <- rowSums(counts(dds2) >= 10) >= 3
> table(keep)
keep
FALSE  TRUE
39828 15507
> dds2 <- dds2[keep,]
> dds2 <- DESeq(dds2)
> res <- results(dds2, contrast=c("group","CellTypeB.1","CellTypeB.0"))
> res["GeneX",]
log2 fold change (MLE): group CellTypeB.1 vs CellTypeB.0
Wald test p-value: group CellTypeB.1 vs CellTypeB.0
DataFrame with 1 row and 6 columns
       baseMean log2FoldChange     lfcSE      stat      pvalue        padj
      <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
GeneX   2439.44        1.27855   0.23307   5.48566 4.11919e-08 7.33678e-05

ADD REPLY • link 2.6 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

A follow up to the above question: I've performed the comparison for each cell type in a different dds, as you suggested. Next, I wish to plot the timecourse of expression for both cell types on the same graph. However, now the normalized counts are no longer from a normalization of all samples together (rather, each dds of each cell types was normalized on its own). What is the correct way to compare and plot the data in such a case?

Thanks again!

ADD REPLY • link 19 months ago daniella • 0

0

Entering edit mode

I'd recommend to compute vst of the entire dataset and use those values for plotting. These are ~ log2 counts for large counts but stabilized as the counts approach 0.

ADD REPLY • link 19 months ago Michael Love 41k