I am trying to add a contrasts matrix to a analysis to compare the average of the means of two groups in the primary factor of interest. Just to make things interesting my design matrix has a bunch of unnamed latent variables churned out by SVA. I have noticed one annoying issue could be called a bug, is that all the manuals seem to specify contrasts.fit with out explicitly naming the contrasts argument, but somehow contrasts.fit uses my contrast.matrix as the coefficients argument when I do this, which I figured out by stepping through eBayes and realizing my coefficients look disturbingly like my contrasts matrix. Furthermore, when I don't specify a contrast matrix, I end up with as many coefficients as variables, but when I specify a contrast matrix I end up with as many coefficients as contrasts, it doesn't seem like these can both be right since in one case I am comparing fewer things but have more coefficients. I've tried a number of different contrast matrix including building using cbind instead of makeContrasts as was recommended in the SVA manual.
Anyhow this is my best shot at getting eBayes to generate pvalues with contrasts.
colnames(mods)[8:35]<-make.names(8:35) #makeContrasts doesn't like SVA's degenerate variable names contrast.matrix<-makeContrasts( B="(Intercept)-FacA", S="(Intercept)-FacB", M="(Intercept)-FacC", SB="(Intercept)-(FacA+FacB)/2", levels=mods) #still gives me a waring about the naming of (Intercept). colnames(mods) <- rownames(contrast.matrix) #again with the SVA degeneracy fitS<-lmFit(eset, mods) fit<-eBayes(fitS) fitC <- contrasts.fit(fitS,contrasts= contrast.matrix) efitC<-eBayes(fitC)
Is there some issues in my code I am unaware of that would be causing the zero pvalues?
There are almost certainly issues with your code, but it's not clear from what you have provided what those issues might be.
Some advice. You have said a bunch of stuff, but most of it isn't actually on point, and you are rambling a bit. In addition, you provide some code, but not enough to show exactly what you have done. In order for anybody to help, you will have to clarify what exactly you are trying to do, and exactly what you have done. You will probably do better to limit the verbiage and increase the code.
Also, note that if you have an intercept term, then this is very likely to represent a 'baseline' factor level, and your other coefficients will represent changes from that baseline. In this situation the intercept is not likely to be interesting, and your contrasts are already inherent in your coefficients. In other words, your parameterization probably doesn't require a call to contrasts.fit(), unless you want to make comparisons other than those inherent in your existing coefficients.
Also note that the factors returned by SVA are intended to be 'nuisance' variables, which capture some technical or batch variability, but are themselves not interesting. So the contrasts you appear to be interested in are likely to be a comparison of the most uninteresting aspects of your experiment. You are also unlikely to have 27 (!) such variables. The sva package has a function num.sv() which is intended to detect a reasonable number of surrogate variables for your data. You might want to check that as well.
Maybe it would help if I focused attention on the code I did provide, since I have a decent grasp of SVA as is.
I generated two variables.
fit and efitC
fit is good(enough for now) uses SVA's output just fine.
efitC is bad every p-value is zero.
The difference between these two variables was one used the default contrasts, I believe it is generated by a function "contr.treatment " the other used my provided contrast matrix and the contrasts.fit function.
I also provided some information demonstrating my motivation to solve the problem and some sub problems that I solved. Mainly: the differences in the forms of the coefficients using the two methods.
One thing I am considering is trying to modify the way I call model.matrix to add a contrasts.arg argument, though I haven't seen anyone use that in any vignettes.
OK. Let's assume that you have fit a factor effects model, so the intercept is the baseline level, and facA is one of the (non-SVA) coefficients. If you have no continuous variables then this is a valid assumption. In that case, facA is the difference between the facA group and the baseline. One of your contrasts is
Intercept - facA
which algebraically is
Intercept - (facA - Intercept) = -facA
So that contrast is estimating the (negative) mean of the facA samples. And the hypothesis you are testing is whether or not this is equal to zero. Which even if you haven't filtered out low expressing genes is likely to be rejected for every gene, hence the all-zero p-values.
Steve and I are trying to be helpful, and part of that is to help you accurately and concisely show what the problem is. I can't speak for Steve, but I am not concerned with your motivation, nor am I concerned with your interpretation of why things aren't working. Both are orthogonal to the issue at hand, which is why you aren't getting what you expect.
This is why I asked you to clarify what you are trying to do, and to show the code you used. Unless we know exactly what you are trying to do, and how you are trying to do it, we cannot possibly help. All we can do is make suggestions that may or may not be reasonable.
That was the problem with the simplified problem(See Steve's suggestion). I will try to generalize it to SVA's output myself, thank you. I still get a warning about the "(Intercept)", even though I didn't use the column name or row name anywhere.
So now I want to compare, facA to facB.
if use
Intercept + facA - facB it doesn't seem to work, maybe it is a problem with using the intercept coding?
the contrast matrix has which sums to one.
1
1
-1