Question: modeling heteroscedasticity in limma-voom for RNA-Seq data analysis
0
2.9 years ago by
Yanzhu Lin120
Yanzhu Lin120 wrote:

Dear All,

How would you model a factor that's heteroscedastic using limma-voom for RNA-Seq data?

Best,

Yanzhu

limma • 1.2k views
modified 2.9 years ago • written 2.9 years ago by Yanzhu Lin120

Hi Gordon,

I never assume that the variances between genes are the same. What we are interested is to model the heteroscedasticity within gene, for example, in the RNA-Seq experiment, there is one factor with 5 levels and each level has several biosamples, if the variance among these 5 levels are different, does the limma-voom take care of this concern? This is what we want to double check from the authors!

Thanks,

Yanzhu

ADD REPLYlink modified 2.9 years ago by Gordon Smyth38k • written 2.9 years ago by Yanzhu Lin120

In addition, in RNAseq data analysis, we think the negative binomial model, which is used by DESeq/edgeR, can also take care of heteroscedasticity, the reason is that since variance = mean + dispersion * squared of mean, if the mean is different, then the variance will also be different. Are we correct? We need to confirm the reviewer, but we can't find any paper clearly mentions this point.

Also, for DESeq/edgeR, both packages model dispersion depends on the mean, so I think when the mean is different, the dispersion is also different, am I correct?

Thanks,

Yanzhu

ADD REPLYlink modified 2.9 years ago by Gordon Smyth38k • written 2.9 years ago by Yanzhu Lin120

edgeR and DESeq2 model a specific kind of heteroskedasticity, in which the variance of a gene depends on the mean abundance. This is modeling a known property of RNA-seq counts in general. Limma also also models the same relationship using voom. However, neither edgeR or DESeq2, nor voom by itself, can properly model the case where some samples are inherently more variable than others regardless of abundance. As Gordon has said, limma can model this case using voomWithQualityWeights. In this case, voom models the heteroskedasticity related to abundance, and the quality weights method models heteroskedasticity between samples. There is a publication for voomWithQualityWeights that you can cite to satisfy the reviewer that you mention: http://nar.oxfordjournals.org/content/early/2015/04/28/nar.gkv412.full

Hi Thompson,

We understand that DESeq/edgeR/limma-voom model the variance depending on the mean values. In addition, according to the voom paper, we also know that limma-voom can model the heteroscedasticity between sample within the same gene, which is what we want. But I was not clear which function in limma package I should use to achieve this goal, this is why posted the question here and I am very sorry that I didn't describe it clear. After Gordon mentioned voomWithQualityWeights(), I am clear about how to model the heterscedasticity between samples within the same gene using limma-voom.

Thanks,

Yanzhu

ADD REPLYlink modified 2.9 years ago by Gordon Smyth38k • written 2.9 years ago by Yanzhu Lin120

Hi Thompson,

I have taken a look at the paper from the link you provided, in the paper, it mentions that: "This approach can also be used when there is a logical grouping of samples based on experimental conditions, and the different conditions are expected to have different variabilities."

this is exactly what we are looking for! Thank you so much!

Best,

Yanzhu

Hi Gordon,

Thank you very much for your comments. All the RNA-seq methods -- edgeR, DESeq2 and voom -- allow the variance to depend on the mean for each gene, this is exactly what I know.

I will try voomWithQualityWeights() next.

Thanks,

Yanzhu

ADD REPLYlink modified 2.9 years ago by Gordon Smyth38k • written 2.9 years ago by Yanzhu Lin120
Answer: modeling heteroscedasticity in limma-voom for RNA-Seq data analysis
4
2.9 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

limma never assumes that any factor or variable has the same variance between genes. In other words, everything is assumed to be heteroscedastic. Normally we don't think about variances of factors anyway, only about that of expression values.

I am somewhat puzzled why you would ask this question and why you think that something special needs to be done to model heteroscedasticity. I assume there is something more to your question that you haven't told us.

Later:

All the RNA-seq methods -- edgeR, DESeq2 and voom -- allow the variance to depend on the mean for each gene. There is no substantial differences between the methods in that respect. voom is slightly more flexible in the terms of the mean-variance relationship than the others (when the library sizes are inconsistent), as discussed in the Discussion section of the voom paper in Genome Biology.

If you want to allow variances to depend on an experimental factor, this can be done using voomWithQualityWeights(). See the article cited in the help page for details.

Hi Gordon,

One more question about the design matrix used for option "var.design" in the voomWithQualityWeights(). I am not sure whether I used this function in the correct way and would like to double check with you. For example, if I have three factors: A (four levels), B (three levels), C ( two levels), and there are 12 biosamples in total. I would like to test the main effects and interaction among these three factors, but assume the variability of samples among the different levels of factor A are different. Refer to the supplemental materials for the paper mentioned by Thompson above (http://bioinf.wehi.edu.au/voomWithQualityWeights/simulation4vs3.R), I do the coding for my model in the following way:

design<-A+B+C+A:B+A:C+B:C+A:B:C

Z<-cbind((1,1,1,0,0,0,0,0,0,0,0,0),(0,0,0,1,1,1,0,0,0,0,0,0),(0,0,0,0,0,0,1,1,1,0,0,0),(0,0,0,0,0,0,0,0,0,1,1,1))           ####the samples in different levels of factor A are expected to have different variabilities.

v<-voomWithQualityWeights(count, design,normalization="none", var.design=Z, plot=F)

vfit<-lmFit(v)
vfit<-eBayes(vfit)

A_maineffect<-topTable(vfit,coef=c(2,3,4),number=Inf)

B_maineffect<-topTable(vfit,coef=c(5,6),number=Inf)

C_maineffect<-topTable(vfit,coef=7,number=Inf)

and so on for the interaction terms effect.

Best,

Yanzhu

ADD REPLYlink modified 2.8 years ago by Gordon Smyth38k • written 2.9 years ago by Yanzhu Lin120

That looks correct, but you can simply create the var.design matrix using model.matrix(~A) instead of constructing it manually. However, I believe it's generally recommended to run voomWithQualityWeights without any var.design, which will compute a weight for each sample individually. Run it with plot=TRUE to see the distribution of sample weights. If the sample weights on the plot do not noticeably cluster by group, then using your chosen var.design is unlikely to have much effect.

Hi Thompson,

I would like to do some analysis using the sample weights generated by  voomWithQualityWeights(), is there a way that I could export the sample weights from  voomWithQualityWeights()?

I took a look at supplemental material (http://bioinf.wehi.edu.au/voomWithQualityWeights/mixture/mixtureExperiment-knitr.R) of the paper you mentioned above, I found that vdodgyb$sample was used in barplot, is this vdodgyb$sample the sample weight? I tried vfit$sample in my case, however, it reports NULL! Do you know how I can get sample weight? Thanks, Yanzhu ADD REPLYlink modified 2.8 years ago by Gordon Smyth38k • written 2.8 years ago by Yanzhu Lin120 Hi Thompson, Is$sample the way to get the sample weight?

Thanks,

Yanzhu

ADD REPLYlink modified 2.8 years ago by Gordon Smyth38k • written 2.8 years ago by Yanzhu Lin120

Well, to get the sample weights produced from voomWithQualityWeights, you would naturally need to use the output produced by that function (which you have stored as v) rather than the output from a different function (such as vfit). So you need

v\$sample.weights