Question

Interpretation of fold.changes for covariates in a linear model using voom

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 35 minutes ago

WEHI, Melbourne, Australia

Dear Francesco, I think that you may be mis-judging the nature of a linear model with multiple factors and covariates. Limma computes log-fold-changes by fitting a linear model and extracting the estimated coefficients. The interpretation of each coefficient is that it represents the change in the reponse (log-expression in this case) for each unit change in the covariate, assuming that all other covariates in the model can be held constant. For example, the coefficient for alpha in your hypothetical model is the change in the log-expression of the gene that would occur if alpha was changed (from 0 to 1) but all other factors in the model were held constant. This is not the same as fitting a simple model with just alpha and no other factors. The coefficient that you would get from the simple model with just alpha will not be the same as the coefficient you would get for alpha from the full model. In the simple model, the other factors are not held constant. It would easily be that the full model could give a negative coefficient but the simple model gives a positive coefficient. This occurs when there are other factors in the model that are positively correlated with alpha but have the opposite effect on expression. When you make a simple summary of log-expression values for wild-type KRAS vs. mutated, you are looking at the marginal effect of KRAS when all other factors are allowed to vary as well. In other words, you are examining the simple model with just KRAS as a factor. There is no reason that this should be the same as the logFC for KRAS when all other factors have been adjusted for. The simple effect could easily be driven by factors other than KRAS. You might find it helpful to consult a textbook on multiple regression that discusses the subtleties of unbalanced anova or correlated covariates. Best wishes Gordon > Date: Mon, 14 Apr 2014 06:22:45 -0700 (PDT) > From: "Francesco Gatto [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, gatto at chalmers.se > Subject: [BioC] Interpretation of fold.changes for covariates in a > linear model using voom > > I have implemented a linear model to fit RNAseq read counts for ~1000 > samples using ~200 covariates plus the intercept. The pipeline follows > the example in the user manual, i.e. v=voom(y,design) -> > fit=lmFit(v,design) -> fit2=eBayes(fit). Then I am using decideTests() > to assess the effect of each covariate on gene expression. However, the > output is less complete than topTable(), so I used this function to > obtain values on fold-changes and individual p-values. At this stage, > values become difficult to interpret. If the LM is: > > gene_i,j = intercept_i + alfa_i*Factor1_j + beta_i*Factor2_j + (~200 other factors) > > and, to simplify, let's say covariates are binary. Then what do the > fold-changes for topTable(fit2,coef="alfa") mean? > My interpretation was the up/down-regulation due to the presence of > covariate alfa, but if I boxplot or summarize the expression values for > the two groups (alfa = 1 vs alfa = 0) I get quite contradictory results. > For example this is the most DE gene when the sample has a mutation in > KRAS (the covariate, 0 if wild-type, 1 if mutated): > >> topTable(fit2,coef="metadataKRAS",number=1) > > hgnc_symbol logFC AveExpr t P.Value adj.P.Val B > GOLGA6A -1.314213 -5.9000494 -7.375175 3.800569e-13 7.204358e-09 18.80961 > > The summary of gene expression values for wild-type KRAS vs. mutated is: > >> by(E["GOLGA6A",],design[,"metadataKRAS"],summary) > > design[, "metadataKRAS"]: 0 > Min. 1st Qu. Median Mean 3rd Qu. Max. > -7.882 -6.865 -6.461 -6.021 -5.770 1.548 > ---------------------------------------------------------------- > design[, "metadataKRAS"]: 1 > Min. 1st Qu. Median Mean 3rd Qu. Max. > -7.657 -6.247 -5.487 -5.239 -4.582 -1.311 > > This is an up-regulation, and of magnitude way lower than the one > indicated by topTable. Is this because I am misusing the function > topTable? I cannot wrap my head around this. Thanks for your help! > > -- output of sessionInfo(): > > R version 3.0.2 (2013-09-25) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.9 > > loaded via a namespace (and not attached): > [1] tools_3.0.2 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

RNASeq RNASeq • 3.7k views

ADD COMMENT • link 10.0 years ago Gordon Smyth 50k