DESEQ2 with 3 biological replicates for control and 1 biological replicate for treatment
1
0
Entering edit mode
wd ▴ 30
@wd-7410
Last seen 15 months ago
Germany

Hi

I want to use the DESEQ package between a control (3 biological replicates) and treatment (1 biological replicate).

IN DESeq I herefore used the following code, and got 266 genes with padj < 0.05:

row.names(table) <- table$Feature_ID count_table <- table[, -1] conds <- c("ctrl", "ctrl", "ctrl", "treatment") cds <- newCountDataSet(count_table, conds) cds <- estimateSizeFactors(cds) cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only") results <- nbinomTest(cds, "ctrl", "treatment") In DESeq2 I used the follwing command, but got > 10000 genes with padj < 0.05: table <- read.delim("test.txt") row.names(table) <- table$Feature_ID
count_table <- table[, -1]
colData <- DataFrame(condition=factor(c("ctrl", "ctrl", "ctrl", "treatment")))
dds <- DESeqDataSetFromMatrix(count_table, colData, formula(~ condition))
results <- DESeq(dds, minReplicatesForReplace=Inf)

So probably I need to add extra parameters to the DESEQ2 analysis but for now I can't figure out how?

Thank you for helping

Wannes

DESEQ deseq2 deseq replicates biologicalreplicates • 3.4k views
0
Entering edit mode

What is your basis for assuming that you need to add extra parameters to your DESeq2 pipeline?

0
Entering edit mode
Bernd Klaus ▴ 590
@bernd-klaus-6281
Last seen 2.7 years ago
Germany

Hi Wannes,

sounds like DESeq2 might underestimate  the dispersions in your example. Can you produce a plot of the dispersion fit via

plotDispEsts(dds)

and a histogram of the p-values via

hist(results\$pvalue, col = "tan2") ?

The DESeq is known to be a bit "conservative" but the difference is quite extreme ...

The plots should help to diagnose things ...

Best regards,

Bernd

1
Entering edit mode

It's not necessarily underestimation, but that the previous code was certainly overestimating:

"blind - Ignore the sample labels and compute a gene’s empirical dispersion value as if all samples were replicates of a single condition. This can be done even if there are no biological replicates. This method can lead to loss of power; see the vignette for details. The single estimated dispersion condition is called "blind" and used for all samples."

more comparable (and recommended by Simon I think) would have been pooled or pooled-CR for DESeq.

1
Entering edit mode

another comment: this sounds like an experiment where control samples are very similar, and the treatment sample (no replicates) is very different. You could use plotPCA to investigate the inter-sample relationships.

The result from null hypothesis testing is not very interesting for experiments like this.

0
Entering edit mode

Dear Michael and Bernd

Please find attached the links to the requested plots for the DESEQ2 analysis.

histogram: http://pasteboard.co/2AastiqQ.png

dispersion fit: http://pasteboard.co/2AazY7KX.png

I also ran the DESEQ analysis using the "pooled" method and this time also around 10000 (out of 18000) genes (similar to DESEQ2) had a padj < 0.05.

Question is which analysis is the most 'correct' one.

Wannes

0
Entering edit mode
The blind method is not appropriate. PC1 / 90% of the variance of the top genes is the difference between control and treatment. There are many, large differences between control and treatment.
0
Entering edit mode

Hi Mike,

sorry, I overlooked the "blind" option and agree: DESeq is most probably rather overestimating the dispersions and the blind option is not appropriate.

@Wannes: the DESeq2 results function has an option lfcThreshold, which is set to 0 by

default. It might be worthwile to look at the fold changes and test against a threshold higher than 0. After all, you have only one treatment sample, so you don't really know how variable the treatment samples are. You essentially assume that the variability in the control group is the same as in the treatment group.  Thus it might be better to require a minim FC different from zero to be more "conservative".

Looking at the FC of genes that should not be DE according to the biology can be helpful to set such a threshold.

Bernd

0
Entering edit mode

Hi Mike,

sorry, I overlooked the "blind" option and agree: DESeq is most probably rather overestimating the dispersions and the blind option is not appropriate.

@Wannes: the DESeq2 results function has an option lfcThreshold, which is set to 0 by

default. It might be worthwile to look at the fold changes and test against a threshold higher than 0. After all, you have only one treatment sample, so you don't really know how variable the treatment samples are. You essentially assume that the variability in the control group is the same as in the treatment group.  Thus it might be better to require a minim FC different from zero to be more "conservative".

Looking at the FC of genes that should not be DE according to the biology can be helpful to set such a threshold.

Bernd

0
Entering edit mode

Hi

Thank you for the advice! As a test, I ran the analysis with a lfcthreshold of 2 and padj < 0.01, and now I obtained 2000 DEGs out of 18000 genes, which seems more reasonable.

Wannes

0
Entering edit mode

I just have to add, "reasonable" is relative to your precise question. If you want to know, how many genes are not likely to have a fold change of 0, then 10,000+ is reasonable. If you want to know, which genes have very large fold change (>4), then you get a smaller number. In other words, you can make the number as small or large as you like.