Estimating dispersions - deseq2 or other
3
0
Entering edit mode
@greenrebeccam-9713
Last seen 8.2 years ago

Hi All, I am working with an RNAseq data set of 20 samples across 4 geneotypes with increasing severity of the phenotype across the genotypes. One of the questions we are trying to answer is does the variance of gene expression (level of disregulation) change across the groups. I know that DeSeq2 calculates a gene-wise diserpersion estimate. Does anyone have any ideas to use this to calculate a sample dispersion level and to compare that to other samples? - Thanks!!!!

deseq2 • 2.5k views
ADD COMMENT
0
Entering edit mode

So would this be a correct set of code? I have run my original anaylsis in Deseq2. counts should have been generated from Deseq2 summarizeOverlaps. Correct? I see a lot of variance, but it isn't correlated with my genotypes and I am trying to make sure i didn't screw something up. 

Then run this through limma

#generate overlaps
se <- summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union", singleEnd=TRUE, ignore.strand=FALSE, preprocess.reads=complementStrand)
meta <- read.csv(“sample_meta_litter.tab")
(colData(se) <- DataFrame(meta))

dds<- DESeq(dds)

design <- model.matrix(se$Category)
voom <- voomWithQualityWeights(counts, design=design, plot=TRUE)
summary (lm(voom$weight ~ se$Category))

ADD REPLY
0
Entering edit mode

No, the code isn't correct. Lots of issues. The model.matrix() command will give an error, and the last line probably will also. What are you trying to achieve by regressing voom weights on Category? That doesn't seem a sensible thing to do, and the y and x vectors are different lengths in your command. Where does 'counts' come from? What is the DESeq() command for -- it doesn't seem to do anything here because you don't use the output? If you want to model different dispersions for different groups, you need to specify the var.design argument to voomWithQualityWeights.

I can't quite tell what you're trying to do. Why don't you just follow the usual sort of voom RNA-seq analysis?

ADD REPLY
0
Entering edit mode

I think this might have been intended as a reply to my answer, where I mentioned that I plotted and regressed the weights against covariates to see whether the heteroskedasticity seems related to groups/batches or whether it is more random.

ADD REPLY
0
Entering edit mode

Oh I see. I think this would be a better way to visualize it:

logCPM <- cpm(counts,prior.count=3)
v <- voomaByGroup(logCPM, group=Category, plot=TRUE)
ADD REPLY
0
Entering edit mode

Is voomaByGroup different than using voomWithQualityWeights with the var.design argument?

ADD REPLY
0
Entering edit mode

The main difference in this context is that it will plot a separate voom curve for every category value, so you can see how different the curves are.

voomWithQualityWeights() can't plot different curves because it handles cases that a more general and can't be summarized as distinct curves. There are other differences, but they are perhaps minor in this context.

ADD REPLY
0
Entering edit mode

sorry, but with is the function cpm?

ADD REPLY
0
Entering edit mode

never mind, its part of edgeR... sorrt

ADD REPLY
0
Entering edit mode

Yes - I was trying to see if the heteroskedasticity had anything to do with groups. You are correct that I am not using the deseq here for this output, its used other places in my code. I am however, using the count table generated by the summarizeOverlaps. 

You are right on the model.matrix, that was a type and missing the ~, but the matrix looks as expected.

ADD REPLY
2
Entering edit mode
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA

Dispersion has no meaning for a single sample, but you can calculate the dispersion for each group and compare them. Gene-wise dispersion estimates will probably not be sufficiently stable to compare between groups, but you can get a stable measure of the average dispersion of all genes in each group. edgeR will give you this as dge$common.dispersion, and I believe you can get a similar quantity out of DESeq2 by telling it not to use a trend for the prior.

Another potential option is to use limma's voomWithQualityWeights function on your data and then examine the relationship between the computed samples weights and your groups or other covariates. I generally do this with every dataset to identify any possible group-specific quality issues. Here is an example of the plots I produce using this method: http://mneme.homenet.org/~ryan/resume/examples/Salomon/450k/sample-weights.pdf (The p-values are obtained by doing something like summary(lm(weight ~ covariate)) -- essentially a t-test.)

Keep in mind that any analysis of this type necessarily assumes that the technical noise is consistent across all samples, so that any differences in the amount of variation are attributable to the biology of the system.

ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 5 hours ago
United States

We don't focus on differential variability across groups. If I wanted to test this, I would want to gather many samples (~10-20) for each biological condition. The typical 3-5 samples per biological condition can be sufficient for testing differences in mean but the variance requires more sample size. Take a look at BaySeq, I believe this have the option for estimating different variances/dispersion for each condition.

ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

See voomWithQualityWeights() in the limma package, which will compute either sample specific or group specific dispersion factors. The method is decribed here:

  http://www.ncbi.nlm.nih.gov/pubmed/25925576

PS. I see that Ryan Thompson has already suggested this.

ADD COMMENT

Login before adding your answer.

Traffic: 776 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6