Dear All,
How would you model a factor that's heteroscedastic using limma-voom for RNA-Seq data?
Best,
Yanzhu
Dear All,
How would you model a factor that's heteroscedastic using limma-voom for RNA-Seq data?
Best,
Yanzhu
limma never assumes that any factor or variable has the same variance between genes. In other words, everything is assumed to be heteroscedastic. Normally we don't think about variances of factors anyway, only about that of expression values.
I am somewhat puzzled why you would ask this question and why you think that something special needs to be done to model heteroscedasticity. I assume there is something more to your question that you haven't told us.
Later:
All the RNA-seq methods -- edgeR, DESeq2 and voom -- allow the variance to depend on the mean for each gene. There is no substantial differences between the methods in that respect. voom is slightly more flexible in the terms of the mean-variance relationship than the others (when the library sizes are inconsistent), as discussed in the Discussion section of the voom paper in Genome Biology.
If you want to allow variances to depend on an experimental factor, this can be done using voomWithQualityWeights(). See the article cited in the help page for details.
Hi Gordon,
One more question about the design matrix used for option "var.design" in the voomWithQualityWeights(). I am not sure whether I used this function in the correct way and would like to double check with you. For example, if I have three factors: A (four levels), B (three levels), C ( two levels), and there are 12 biosamples in total. I would like to test the main effects and interaction among these three factors, but assume the variability of samples among the different levels of factor A are different. Refer to the supplemental materials for the paper mentioned by Thompson above (http://bioinf.wehi.edu.au/voomWithQualityWeights/simulation4vs3.R), I do the coding for my model in the following way:
design<-A+B+C+A:B+A:C+B:C+A:B:C
Z<-cbind((1,1,1,0,0,0,0,0,0,0,0,0),(0,0,0,1,1,1,0,0,0,0,0,0),(0,0,0,0,0,0,1,1,1,0,0,0),(0,0,0,0,0,0,0,0,0,1,1,1)) ####the samples in different levels of factor A are expected to have different variabilities.
v<-voomWithQualityWeights(count, design,normalization="none", var.design=Z, plot=F)
vfit<-lmFit(v)
vfit<-eBayes(vfit)
A_maineffect<-topTable(vfit,coef=c(2,3,4),number=Inf)
B_maineffect<-topTable(vfit,coef=c(5,6),number=Inf)
C_maineffect<-topTable(vfit,coef=7,number=Inf)
and so on for the interaction terms effect.
Best,
Yanzhu
That looks correct, but you can simply create the var.design matrix using model.matrix(~A) instead of constructing it manually. However, I believe it's generally recommended to run voomWithQualityWeights without any var.design, which will compute a weight for each sample individually. Run it with plot=TRUE to see the distribution of sample weights. If the sample weights on the plot do not noticeably cluster by group, then using your chosen var.design is unlikely to have much effect.
Hi Thompson,
I would like to do some analysis using the sample weights generated by voomWithQualityWeights(), is there a way that I could export the sample weights from voomWithQualityWeights()?
I took a look at supplemental material (http://bioinf.wehi.edu.au/voomWithQualityWeights/mixture/mixtureExperiment-knitr.R) of the paper you mentioned above, I found that vdodgyb$sample was used in barplot, is this vdodgyb$sample the sample weight? I tried vfit$sample in my case, however, it reports NULL! Do you know how I can get sample weight?
Thanks,
Yanzhu
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hi Gordon,
I never assume that the variances between genes are the same. What we are interested is to model the heteroscedasticity within gene, for example, in the RNA-Seq experiment, there is one factor with 5 levels and each level has several biosamples, if the variance among these 5 levels are different, does the limma-voom take care of this concern? This is what we want to double check from the authors!
Thanks,
Yanzhu
In addition, in RNAseq data analysis, we think the negative binomial model, which is used by DESeq/edgeR, can also take care of heteroscedasticity, the reason is that since variance = mean + dispersion * squared of mean, if the mean is different, then the variance will also be different. Are we correct? We need to confirm the reviewer, but we can't find any paper clearly mentions this point.
Also, for DESeq/edgeR, both packages model dispersion depends on the mean, so I think when the mean is different, the dispersion is also different, am I correct?
Thanks,
Yanzhu
edgeR and DESeq2 model a specific kind of heteroskedasticity, in which the variance of a gene depends on the mean abundance. This is modeling a known property of RNA-seq counts in general. Limma also also models the same relationship using voom. However, neither edgeR or DESeq2, nor voom by itself, can properly model the case where some samples are inherently more variable than others regardless of abundance. As Gordon has said, limma can model this case using voomWithQualityWeights. In this case, voom models the heteroskedasticity related to abundance, and the quality weights method models heteroskedasticity between samples. There is a publication for voomWithQualityWeights that you can cite to satisfy the reviewer that you mention: http://nar.oxfordjournals.org/content/early/2015/04/28/nar.gkv412.full
Hi Thompson,
Thank you very much for your reply and your reply is really helpful for us. I really appreciate it.
We understand that DESeq/edgeR/limma-voom model the variance depending on the mean values. In addition, according to the voom paper, we also know that limma-voom can model the heteroscedasticity between sample within the same gene, which is what we want. But I was not clear which function in limma package I should use to achieve this goal, this is why posted the question here and I am very sorry that I didn't describe it clear. After Gordon mentioned voomWithQualityWeights(), I am clear about how to model the heterscedasticity between samples within the same gene using limma-voom.
Thanks,
Yanzhu
Hi Thompson,
I have taken a look at the paper from the link you provided, in the paper, it mentions that: "This approach can also be used when there is a logical grouping of samples based on experimental conditions, and the different conditions are expected to have different variabilities."
this is exactly what we are looking for! Thank you so much!
Best,
Yanzhu
Hi Gordon,
Thank you very much for your comments. All the RNA-seq methods -- edgeR, DESeq2 and voom -- allow the variance to depend on the mean for each gene, this is exactly what I know.
I will try voomWithQualityWeights() next.
Thanks,
Yanzhu