sva + voom + limma
@meritxell-oliva-6129
Dear Bioconductor list, Dear Jeff Leek & Gordon Smyth,

I want to use sva() to estimate potential surrogate variables in an RNA-seq expression dataset. This is a preliminary step before differential gene expression analysis with limma, where I first use voom() to transform the RNA-seq counts into microarray-like expression values. As far as I know, SVA was originally designed for (normally distributed) microarray expression data, but it can also be used with RNA-seq data. Please correct me if I am wrong here!

I first transform the raw counts into log2-CPM values using the edgeR function calcNormFactors() and voom(). I apply sva() to the transformed dataset to compute the surrogate variables. Then I build a design matrix for a linear model containing my primary variable of interest (InvGeno, a quantitative discrete variable taking the values 0, 1, 2) and the set of surrogate variables. Finally, I apply voom() + lmFit() + eBayes() to obtain DE candidates:

###
y <- calcNormFactors(rawCounts_epression_dataset_matrix);
mod1 <- model.matrix(~InvGeno);
mod0 <- model.matrix(~1,data=InvGeno);
v <- voom(counts=y, design = mod1)
sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10);
mod1 <- model.matrix(~InvGeno+sva.obj$sv);
v <- voom(counts=y, design = mod1);
fit.obj <- lmFit(v, design);
fit.obj <- eBayes(fit,trend=TRUE);
###

Since voom needs raw counts as input and performs the CPM and log2 steps internally, I am not sure whether it is appropriate to include in the linear model surrogate variables that sva computed from log2-CPM values, or what implications this may have for the DE analysis. Could you suggest an appropriate strategy to achieve this?

Thanks a lot!

Meritxell Oliva
PhD student
IBB (Biotechnology and Biomedicine Institute)
Comparative and Functional Genomics group
Campus Universitari - 08193 Bellaterra
Cerdanyola del Vallès - Barcelona
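The log2-CPM step that voom() performs internally, which produces the v$E matrix passed to sva() above, can be sketched in base R as below. To our understanding of limma's source, voom adds 0.5 to every count and 1 to every effective library size (total count times normalization factor) before computing log2 counts per million. The function name log2_cpm_voom_style and the toy counts are hypothetical, for illustration only.

```r
# Hypothetical helper mimicking the log2-CPM step inside limma::voom():
# offset each count by 0.5 and each effective library size by 1.
log2_cpm_voom_style <- function(counts, norm.factors = rep(1, ncol(counts))) {
  # Effective library size = total count per sample x normalization factor
  lib.size <- colSums(counts) * norm.factors
  # Transpose so that the division recycles lib.size across samples (columns)
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}

# Toy example: 3 genes x 2 samples, each sample with 100 reads in total
counts <- matrix(c(10, 0, 90,
                   20, 5, 75), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("S1", "S2")))
lcpm <- log2_cpm_voom_style(counts)
print(lcpm)
```

The offsets keep zero counts finite on the log scale, which is why sva() can work directly on v$E.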
RNASeq Microarray sva
Jeff Leek
@jeff-leek-5015
United States
Hi Meritxell,

SVA is designed for symmetrically distributed data (such as normally distributed data), but it can also reasonably handle most approximately symmetric distributions. From your description, it appears that you have applied sva properly. I'll leave the second question to the authors of voom/limma. The surrogate variables should be on the correct scale as long as the data transformation takes place before the linear model is fit.

Best,
Jeff
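One way to see why the surrogate variables end up on a usable scale: they are sample-level covariates (one value per sample per SV) estimated from the already-transformed data, and they simply become extra columns of the design matrix. The sketch below uses a deliberately simplified stand-in for sva(): regress the full model mod1 out of each gene and take the top right singular vectors of the residual matrix. This is not the IRW algorithm that sva(method = "irw") uses. The function estimate_svs_simple, the simulated log-expression matrix and the hidden factor are all hypothetical.

```r
# Simplified stand-in for SV estimation; NOT the IRW algorithm of sva().
estimate_svs_simple <- function(logexpr, mod, n.sv) {
  # Hat matrix of the full model (samples x samples)
  H <- mod %*% solve(crossprod(mod)) %*% t(mod)
  # Remove the part of each gene's profile (row) explained by `mod`
  resid <- logexpr - logexpr %*% H
  # Top right singular vectors: one column per SV, one row per sample
  svd(resid, nu = 0, nv = n.sv)$v
}

set.seed(1)
n <- 12; g <- 200
InvGeno <- rep(0:2, each = 4)         # primary variable, coded as in the question
hidden <- rep(c(-1, 1), times = 6)    # hypothetical unmodelled factor
logexpr <- matrix(rnorm(g * n), g, n) + outer(rnorm(g), hidden)
mod1 <- model.matrix(~InvGeno)
sv <- estimate_svs_simple(logexpr, mod1, n.sv = 2)
colnames(sv) <- c("SV1", "SV2")
design <- cbind(mod1, sv)             # design to pass to voom() and lmFit()
```

Because each SV is just a per-sample column, the count-to-log transformation never has to be applied to it; it only matters that the SVs were estimated from the same transformed scale on which the linear model is fit.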
@gordon-smyth
WEHI, Melbourne, Australia
> Date: Mon, 2 Sep 2013 19:35:07 +0200
> From: Meritxell Oliva <meritxellop at gmail.com>
> To: bioconductor at r-project.org
> Cc: Mario Cáceres <mcaceres at icrea.cat>
> Subject: [BioC] sva + voom + limma
>
> [...]
> ###
> y <- calcNormFactors(rawCounts_epression_dataset_matrix);
> mod1 <- model.matrix(~InvGeno);
> mod0 <- model.matrix(~1,data=InvGeno);
> v <- voom(counts=y, design = mod1)
> sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10);

It's a pity that sva() can't use weights, but I can't suggest anything better.

> mod1 <- model.matrix(~InvGeno+sva.obj$sv);
> v <- voom(counts=y, design = mod1);
> fit.obj <- lmFit(v, design);

I think you mean design=mod1.

> fit.obj <- eBayes(fit,trend=TRUE);

trend=TRUE isn't needed, because voom() already handles the mean-variance trend.

Gordon
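Putting both of Gordon's corrections together (design = mod1 in lmFit(), and no trend = TRUE in eBayes()), the workflow from the question might look like the sketch below. It also adjusts three details the thread does not discuss: calcNormFactors() is applied to a DGEList (given a plain matrix, it returns only the normalization factors), model.matrix() receives a data frame, and eBayes() is given fit.obj rather than the undefined fit. The helper names build_designs and run_sva_voom are hypothetical; the edgeR, limma and sva functions are called with pkg:: prefixes, so those Bioconductor packages are needed only when run_sva_voom() is actually called.

```r
# Hypothetical helper: null and full design matrices, optionally with SVs appended
build_designs <- function(InvGeno, sv = NULL) {
  pheno <- data.frame(InvGeno = InvGeno)
  mod0 <- model.matrix(~1, data = pheno)        # null model: intercept only
  mod1 <- model.matrix(~InvGeno, data = pheno)  # full model: primary variable
  if (!is.null(sv)) {
    sv <- as.matrix(sv)                         # a single SV may come back as a vector
    colnames(sv) <- paste0("SV", seq_len(ncol(sv)))
    mod1 <- cbind(mod1, sv)
  }
  list(mod0 = mod0, mod1 = mod1)
}

# Hypothetical wrapper for the corrected sva + voom + limma workflow
run_sva_voom <- function(counts, InvGeno, n.sv = 10) {
  dge <- edgeR::DGEList(counts = counts)
  dge <- edgeR::calcNormFactors(dge)            # stores norm.factors in the DGEList
  d <- build_designs(InvGeno)
  v <- limma::voom(dge, design = d$mod1)        # log2-CPM values in v$E
  sva.obj <- sva::sva(v$E, d$mod1, d$mod0, method = "irw", n.sv = n.sv)
  design <- build_designs(InvGeno, sva.obj$sv)$mod1
  v <- limma::voom(dge, design = design)        # re-run voom with SVs in the design
  fit.obj <- limma::lmFit(v, design = design)   # design = mod1, per Gordon
  fit.obj <- limma::eBayes(fit.obj)             # no trend = TRUE: voom weights handle it
  limma::topTable(fit.obj, coef = "InvGeno", number = Inf)
}
```

As Gordon notes, sva() itself ignores the voom precision weights; they are used only in the final lmFit() step.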