Question

Estimating dispersions - deseq2 or other

0

Entering edit mode

green.rebeccam • 0

@greenrebeccam-9713

Last seen 8.2 years ago

Hi All, I am working with an RNAseq data set of 20 samples across 4 geneotypes with increasing severity of the phenotype across the genotypes. One of the questions we are trying to answer is does the variance of gene expression (level of disregulation) change across the groups. I know that DeSeq2 calculates a gene-wise diserpersion estimate. Does anyone have any ideas to use this to calculate a sample dispersion level and to compare that to other samples? - Thanks!!!!

deseq2 • 2.5k views

ADD COMMENT • link updated 8.2 years ago by Gordon Smyth 50k • written 8.2 years ago by green.rebeccam • 0

0

Entering edit mode

So would this be a correct set of code? I have run my original anaylsis in Deseq2. counts should have been generated from Deseq2 summarizeOverlaps. Correct? I see a lot of variance, but it isn't correlated with my genotypes and I am trying to make sure i didn't screw something up.

Then run this through limma

#generate overlaps
se <- summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union", singleEnd=TRUE, ignore.strand=FALSE, preprocess.reads=complementStrand)
meta <- read.csv(“sample_meta_litter.tab")
(colData(se) <- DataFrame(meta))

dds<- DESeq(dds)

design <- model.matrix(se$Category)
voom <- voomWithQualityWeights(counts, design=design, plot=TRUE)
summary (lm(voom$weight ~ se$Category))

ADD REPLY • link 8.2 years ago green.rebeccam • 0

0

Entering edit mode

No, the code isn't correct. Lots of issues. The model.matrix() command will give an error, and the last line probably will also. What are you trying to achieve by regressing voom weights on Category? That doesn't seem a sensible thing to do, and the y and x vectors are different lengths in your command. Where does 'counts' come from? What is the DESeq() command for -- it doesn't seem to do anything here because you don't use the output? If you want to model different dispersions for different groups, you need to specify the var.design argument to voomWithQualityWeights.

I can't quite tell what you're trying to do. Why don't you just follow the usual sort of voom RNA-seq analysis?

ADD REPLY • link 8.2 years ago Gordon Smyth 50k

0

Entering edit mode

I think this might have been intended as a reply to my answer, where I mentioned that I plotted and regressed the weights against covariates to see whether the heteroskedasticity seems related to groups/batches or whether it is more random.

ADD REPLY • link 8.2 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Oh I see. I think this would be a better way to visualize it:

logCPM <- cpm(counts,prior.count=3)
v <- voomaByGroup(logCPM, group=Category, plot=TRUE)

ADD REPLY • link 8.2 years ago Gordon Smyth 50k

0

Entering edit mode

Is voomaByGroup different than using voomWithQualityWeights with the var.design argument?

ADD REPLY • link 8.2 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

The main difference in this context is that it will plot a separate voom curve for every category value, so you can see how different the curves are.

voomWithQualityWeights() can't plot different curves because it handles cases that a more general and can't be summarized as distinct curves. There are other differences, but they are perhaps minor in this context.

ADD REPLY • link 8.2 years ago Gordon Smyth 50k

0

Entering edit mode

sorry, but with is the function cpm?

ADD REPLY • link 8.2 years ago green.rebeccam • 0

0

Entering edit mode

never mind, its part of edgeR... sorrt

ADD REPLY • link 8.2 years ago green.rebeccam • 0

0

Entering edit mode

Yes - I was trying to see if the heteroskedasticity had anything to do with groups. You are correct that I am not using the deseq here for this output, its used other places in my code. I am however, using the count table generated by the summarizeOverlaps.

You are right on the model.matrix, that was a type and missing the ~, but the matrix looks as expected.

ADD REPLY • link updated 8.2 years ago by Gordon Smyth 50k • written 8.2 years ago by green.rebeccam • 0

score 2 · Answer 1 · 2016-02-12

Dispersion has no meaning for a single sample, but you can calculate the dispersion for each group and compare them. Gene-wise dispersion estimates will probably not be sufficiently stable to compare between groups, but you can get a stable measure of the average dispersion of all genes in each group. edgeR will give you this as dge$common.dispersion, and I believe you can get a similar quantity out of DESeq2 by telling it not to use a trend for the prior.

Another potential option is to use limma's voomWithQualityWeights function on your data and then examine the relationship between the computed samples weights and your groups or other covariates. I generally do this with every dataset to identify any possible group-specific quality issues. Here is an example of the plots I produce using this method: http://mneme.homenet.org/~ryan/resume/examples/Salomon/450k/sample-weights.pdf (The p-values are obtained by doing something like summary(lm(weight ~ covariate)) -- essentially a t-test.)

Keep in mind that any analysis of this type necessarily assumes that the technical noise is consistent across all samples, so that any differences in the amount of variation are attributable to the biology of the system.

score 0 · Answer 2 · 2016-02-13

We don't focus on differential variability across groups. If I wanted to test this, I would want to gather many samples (~10-20) for each biological condition. The typical 3-5 samples per biological condition can be sufficient for testing differences in mean but the variance requires more sample size. Take a look at BaySeq, I believe this have the option for estimating different variances/dispersion for each condition.

score 0 · Answer 3 · 2016-02-14

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

See voomWithQualityWeights() in the limma package, which will compute either sample specific or group specific dispersion factors. The method is decribed here:

http://www.ncbi.nlm.nih.gov/pubmed/25925576

PS. I see that Ryan Thompson has already suggested this.

ADD COMMENT • link 8.2 years ago Gordon Smyth 50k