Question: How do use disperions estimated from my within group estimates to calculate differential expression between groups?
0
3.4 years ago by
rna seq90
rna seq90 wrote:

Hello,

I have 8 samples. 4 control and 4 treated. I can obtain a within sample dispersion estimate if I make a dds object using only the control samples.

condition <- factor(c(rep("Control1", 2), rep("Control2", 2)))

And obtain dispersion estimate from this WITHIN group comparison.

I would like to use the dispersion estimates obtained from this WITHIN group comparison and apply it to the BETWEEN  group comparison

condition <- factor(c(rep("Control", 4), rep("Treatment", 4)))

in order to perform differential expression analysis on the BETWEEN  group comparison using the WITHIN group dispersion estimate.

Can someone tell me how to accomplish this.

TIA.

deseq2 estimatedispersions • 645 views
modified 3.4 years ago • written 3.4 years ago by rna seq90
Answer: How do use disperions estimated from my within group estimates to calculate diff
0
3.4 years ago by
Michael Love26k
United States
Michael Love26k wrote:
Can you explain why you want to do this, rather than using the standard dispersion estimates of DESeq2 which takes into account all of the samples and their expected counts given the design? You may end up with some spurious results if you ignore the variability of the treated samples.
Answer: How do use disperions estimated from my within group estimates to calculate diff
0
3.4 years ago by
rna seq90
rna seq90 wrote:

Hi Michael,

The majority of my genes are differentially expressed between conditions.

If I plot the estimated dispersion WITHIN conditions either treated or control, my dispersions are much less then when I include all samples both WITHIN and BETWEEN groups.

Perhaps I am missing something here/not understanding the model, but it seems reasonable to estimate the biological variance of a gene using only the within group variance of a gene especially if you have enough samples to perform this estimate?

hi,

So you haven't posted your code, but typically, when you include a design formula that specifies which samples are in which group then the dispersion estimates are based off of the within group variance.

You should post all your code and your sessionInfo() so I and others can determine if there is an error.

Hi,

Here is my code:

#b_frame<-na.omit(b_frame)
condition <- factor(c(rep("A", 3), rep("B", 3)))
##create column data
coldata <- data.frame(row.names=colnames(b_frame), condition)

##make DESeqDataSetFromMatrix
dds<-DESeqDataSetFromMatrix(countData=b_frame, colData=coldata, design=~condition)
dds<-estimateSizeFactors(dds)
dds <- DESeq(dds)
res <- results(dds)
table(res$padj<0.05) ## Order by adjusted p-value res <- res[order(res$padj), ]
## Merge with normalized count data

resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
names(resdata)[1] <- "Gene"
write.csv(resdata, file="A1_Jenning_DESeq2.csv")

##AND sessionInfo()

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

l

Previously you said you had 8 samples, 4 control and 4 treated, and within control there was some grouping of 2 vs 2. Here you have 3 control and 3 treated.

The point is, it matters which dispersion is estimated. If there are differences among the control and treated you can account for that with a batch term in the design. But I'm still not sure exactly how many samples you have and what is the difference between control1 and control2, so I can't give concrete advice.

Let us say I have 8 samples. The first 4 are control and the second 4 are treatment.

#b_frame<-na.omit(b_frame)
condition <- factor(c(rep("control", 4), rep("treatment", 4)))
##create column data
coldata <- data.frame(row.names=colnames(b_frame), condition)

##make DESeqDataSetFromMatrix
dds1<-DESeqDataSetFromMatrix(countData=b_frame, colData=coldata, design=~condition)
dds1<-estimateSizeFactors(dds1)
dds1 <- DESeq(dds1)

"I am not sure if I am accounting for the differences with a batch term here, I am assuming the condition term did this automatically?"

From the dds1 object I get the estimated dispersions.

I was curious to see what the estimated dispersions were from this file so I split the control samples in half so control1 is the first 2 samples of the original control and control2 is the second 2 samples of the original control and performed the same analysis

c_frame<-b_frame[,1:4]
condition <- factor(c(rep("control1", 2), rep("control2", 2)))
##create column data
coldata <- data.frame(row.names=colnames(b_frame), condition)

##make DESeqDataSetFromMatrix
dds1<-DESeqDataSetFromMatrix(countData=c_frame, colData=coldata, design=~condition)
dds2<-estimateSizeFactors(dds2)
dds2 <- DESeq(dds2)

When I compare the estimatedDispersions from dds2 to dds1 the estimatedDispersions are much smaller in dds2.

Thanks

I'm sorry, I'm having a hard time following this question with the hypotheticals. The dispersion estimation depends on which samples you provide. If you provide the software with control and treated samples, it estimates using the within group variance of those samples. If you only provide it the control samples, and create an artificial division among those, it will use the within-(artificial)-group variance of those samples. If the control samples have less variability than the treatment samples, then the second estimate will be lower (although this is not the proper way to estimate control sample variability, because you've made an artificial division among these samples). Still, any lower estimate you derive from the control samples alone shouldn't be used to compare control and treated, because, as I've said previously, you want the model to use the variability of all the samples in the comparison, not a subset.

WIll the following setup estimate within group variances for the control and treatment?

library(DESeq2)

condition <- factor(c(rep("control", 4), rep("treatment", 4)))

coldata <- data.frame(row.names=colnames(c_frame), condition)

dds<-DESeqDataSetFromMatrix(countData=c_frame, colData=coldata, design=~condition)

dds<-estimateSizeFactors(dds)

dds <- DESeq(dds)

res <- results(dds)

Also, if I have a dataset the has a greater fold change dynamic range than RNAseq but also has a larger variance, is there a specific estimateDispersions parameter to use? Is there an appropriate test to use with the results functions?

Thanks for your patience and help.

"WIll the following setup estimate within group variances for the control and treatment?"

yes.

"Also, if I have a dataset the has a greater fold change dynamic range than RNAseq but also has a larger variance, is there a specific estimateDispersions parameter to use? Is there an appropriate test to use with the results functions?"

You don't have to do anything special. The DESeq() functions will handle estimation regardless of the range of fold changes and within-group variance.

Answer: How do use disperions estimated from my within group estimates to calculate diff
0
3.4 years ago by
rna seq90
rna seq90 wrote: