Question: sva + voom + limma
0
5.8 years ago by
Meritxell Oliva70 wrote:
Dear Bioconductor list, Dear Jeff Leek & Gordon Smith, I want to use sva() to estimate potential surrogate variables of a RNASeq derived expression dataset, as a previous step to perform differential gene expression analysis with limma(), previously using voom() to transform RNASeq to microarray-like expression data. As far as I know, SVA was originally designed to deal with ( normally distibuted ) microarray expression data, but can also be used to work with RNSeq data. Please, correct me if I am wrong here! So, I first transform the raw counts into cpm-log2 values, by using edgeR function calcNormFactors() and voom(). I apply sva() on the transformed dataset to compute the surrogate variables. Then, I build the design matrix to create a linear model with my primary variable of interest (InvGeno, a quantitative discrete variable: 0,1,2 ) and the set of surrogate variables, and I finally apply voom()+lmFit()+eBayes() to obtain DE candidates: ### y <- calcNormFactors(rawCounts_epression_dataset_matrix); mod1 <- model.matrix(~InvGeno); mod0 <- model.matrix(~1,data=InvGeno); v <- voom(counts=y, design = mod1) sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10); mod1 <- model.matrix(~InvGeno+sva.obj$sv); v <- voom(counts=y, design = mod1); fit.obj <- lmFit(v, design); fit.obj <- eBayes(fit,trend=TRUE); ### As voom needs to be fed by raw counts and performs the cpm+log2 steps internally, I am not sure about the properness of including in the linear model the sva-computed surrogate variables from cpm-log2 values, and the implications that this step may produce in the DE analysis. Could you suggest an appropriate strategy so as to achieve my purposes? Thanks a lot!!! Meritxell Oliva PhD student IBB (Biotechnology and Biomedicine Institute) Comparative and Functional Genomics group Campus Universitari - 08193 Bellaterra Cerdanyola del Vallès - Barcelona [[alternative HTML version deleted]]
rnaseq microarray sva • 3.0k views
modified 5.8 years ago by Gordon Smyth37k • written 5.8 years ago by Meritxell Oliva70
Answer: sva + voom + limma
0
5.8 years ago by
Jeff Leek550
United States
Jeff Leek550 wrote:
Hi Meritxell, SVA is designed to deal with symmetrically distributed data (like normal data -but it can also reasonably handle most approximately symmetrical distributions). It appears, from your description, that you have applied sva properly. I'll leave it to the authors of voom/limma to address the second question. The svs should be on the correct scale as long as the data transformation takes place before the linear model is fit. Best, Jeff On Mon, Sep 2, 2013 at 1:35 PM, Meritxell Oliva <meritxellop@gmail.com>wrote: > Dear Bioconductor list, > Dear Jeff Leek & Gordon Smith, > I want to use sva() to estimate potential surrogate variables of a RNASeq > derived expression dataset, as a previous step to perform differential gene > expression analysis with limma(), previously using voom() to transform > RNASeq to microarray-like expression data. > As far as I know, SVA was originally designed to deal with ( normally > distibuted ) microarray expression data, but can also be used to work with > RNSeq data. Please, correct me if I am wrong here! > So, I first transform the raw counts into cpm-log2 values, by using edgeR > function calcNormFactors() and voom(). I apply sva() on the transformed > dataset to compute the surrogate variables. Then, I build the design matrix > to create a linear model with my primary variable of interest (InvGeno, a > quantitative discrete variable: 0,1,2 ) and the set of surrogate variables, > and I finally apply voom()+lmFit()+eBayes() to obtain DE candidates: > ### > y <- calcNormFactors(rawCounts_epression_dataset_matrix); > mod1 <- model.matrix(~InvGeno); > mod0 <- model.matrix(~1,data=InvGeno); > v <- voom(counts=y, design = mod1) > sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10); > mod1 <- model.matrix(~InvGeno+sva.obj$sv); > v <- voom(counts=y, design = mod1); > fit.obj <- lmFit(v, design); > fit.obj <- eBayes(fit,trend=TRUE); > ### > As voom needs to be fed by raw counts and performs the cpm+log2 steps > internally, I am not sure about the properness of including in the linear > model the sva-computed surrogate variables from cpm-log2 values, and the > implications that this step may produce in the DE analysis. > Could you suggest an appropriate strategy so as to achieve my purposes? > Thanks a lot!!! > > Meritxell Oliva > PhD student > IBB (Biotechnology and Biomedicine Institute) > Comparative and Functional Genomics group > Campus Universitari - 08193 Bellaterra Cerdanyola del Vallès - Barcelona > > > > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
Answer: sva + voom + limma
0
5.8 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:
> Date: Mon, 2 Sep 2013 19:35:07 +0200 > From: Meritxell Oliva <meritxellop at="" gmail.com=""> > To: bioconductor at r-project.org > Cc: Mario C?ceres <mcaceres at="" icrea.cat=""> > Subject: [BioC] sva + voom + limma > > Dear Bioconductor list, > Dear Jeff Leek & Gordon Smith, > I want to use sva() to estimate potential surrogate variables of a RNASeq derived expression dataset, as a previous step to perform differential gene expression analysis with limma(), previously using voom() to transform RNASeq to microarray-like expression data. > As far as I know, SVA was originally designed to deal with ( normally distibuted ) microarray expression data, but can also be used to work with RNSeq data. Please, correct me if I am wrong here! > So, I first transform the raw counts into cpm-log2 values, by using edgeR function calcNormFactors() and voom(). I apply sva() on the transformed dataset to compute the surrogate variables. Then, I build the design matrix to create a linear model with my primary variable of interest (InvGeno, a quantitative discrete variable: 0,1,2 ) and the set of surrogate variables, and I finally apply voom()+lmFit()+eBayes() to obtain DE candidates: > ### > y <- calcNormFactors(rawCounts_epression_dataset_matrix); > mod1 <- model.matrix(~InvGeno); > mod0 <- model.matrix(~1,data=InvGeno); > v <- voom(counts=y, design = mod1) > sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10); It's a pity that sva() can't use weights, but I can't suggest anything better. > mod1 <- model.matrix(~InvGeno+sva.obj$sv); > v <- voom(counts=y, design = mod1); > fit.obj <- lmFit(v, design); I think you mean design=mod1. > fit.obj <- eBayes(fit,trend=TRUE); trend=TRUE isn't needed. voom() already handles the trend. Gordon > ### > As voom needs to be fed by raw counts and performs the cpm+log2 steps internally, I am not sure about the properness of including in the linear model the sva-computed surrogate variables from cpm-log2 values, and the implications that this step may produce in the DE analysis. > Could you suggest an appropriate strategy so as to achieve my purposes? > Thanks a lot!!! > > Meritxell Oliva > PhD student > IBB (Biotechnology and Biomedicine Institute) > Comparative and Functional Genomics group > Campus Universitari - 08193 Bellaterra Cerdanyola del Vall?s - Barcelona > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}