sva + voom + limma

Meritxell Oliva ▴ 120

@meritxell-oliva-6129

Last seen 9.6 years ago

Dear Bioconductor list,
Dear Jeff Leek & Gordon Smyth,

I want to use sva() to estimate potential surrogate variables for an RNA-seq-derived expression dataset, as a preliminary step before differential gene expression analysis with limma, using voom() to transform the RNA-seq counts into microarray-like expression data.

As far as I know, SVA was originally designed for (normally distributed) microarray expression data, but it can also be used with RNA-seq data. Please correct me if I am wrong here!

So, I first transform the raw counts into log2-CPM values using the edgeR function calcNormFactors() and voom(). I apply sva() to the transformed dataset to compute the surrogate variables. Then I build the design matrix for a linear model containing my primary variable of interest (InvGeno, a discrete quantitative variable: 0, 1, 2) and the set of surrogate variables, and finally apply voom() + lmFit() + eBayes() to obtain DE candidates:

###
y <- calcNormFactors(rawCounts_epression_dataset_matrix);
mod1 <- model.matrix(~InvGeno);
mod0 <- model.matrix(~1,data=InvGeno);
v <- voom(counts=y, design = mod1)
sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10);
mod1 <- model.matrix(~InvGeno+sva.obj$sv);
v <- voom(counts=y, design = mod1);
fit.obj <- lmFit(v, design);
fit.obj <- eBayes(fit,trend=TRUE);
###

Since voom() must be given raw counts and performs the CPM and log2 steps internally, I am not sure whether it is appropriate to include in the linear model surrogate variables that sva() computed from log2-CPM values, or what implications this may have for the DE analysis. Could you suggest an appropriate strategy for this purpose?

Thanks a lot!

Meritxell Oliva
PhD student
IBB (Biotechnology and Biomedicine Institute)
Comparative and Functional Genomics group
Campus Universitari - 08193 Bellaterra Cerdanyola del Vallès - Barcelona

RNASeq Microarray sva • 5.2k views

Updated 10.6 years ago by Gordon Smyth 50k • written 10.6 years ago by Meritxell Oliva ▴ 120

Jeff Leek ▴ 650

@jeff-leek-5015

Last seen 3.1 years ago

United States

Hi Meritxell,

SVA is designed to deal with symmetrically distributed data (such as normal data, but it can also reasonably handle most approximately symmetric distributions). It appears from your description that you have applied sva properly.

I'll leave it to the authors of voom/limma to address the second question. The SVs should be on the correct scale as long as the data transformation takes place before the linear model is fit.

Best,
Jeff
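
To make the point about scale concrete, here is a small base-R sketch of the log2-CPM transformation that voom() applies to the counts before any linear model is fit, which is the scale on which sva() sees the data when it is given v$E. The toy count matrix and the helper name log_cpm are made up for illustration. The offsets (0.5 added to each count, 1 added to each library size) are how voom() is understood to compute its E matrix. In a real analysis the library sizes would also be scaled by the normalization factors from calcNormFactors().

```r
# Hypothetical toy count matrix: 3 genes (rows) x 2 samples (columns).
counts <- matrix(c(10, 0, 90,
                   20, 5, 175),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# log2 counts-per-million with small offsets:
# 0.5 is added to each count and 1 to each library size,
# so zero counts do not produce -Inf.
# log_cpm is an illustrative helper, not a limma function.
log_cpm <- function(counts, lib.size = colSums(counts)) {
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}

E <- log_cpm(counts)
print(round(E, 2))
```

Because this transformation depends only on the counts and library sizes, not on the design matrix, the values passed to sva() and the values later modelled by lmFit() are on the same log2-CPM scale.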

Posted 10.6 years ago by Jeff Leek ▴ 650

Gordon Smyth 50k

@gordon-smyth

Last seen just now

WEHI, Melbourne, Australia

> Date: Mon, 2 Sep 2013 19:35:07 +0200
> From: Meritxell Oliva <meritxellop at gmail.com>
> To: bioconductor at r-project.org
> Cc: Mario Cáceres <mcaceres at icrea.cat>
> Subject: [BioC] sva + voom + limma
>
> y <- calcNormFactors(rawCounts_epression_dataset_matrix);
> mod1 <- model.matrix(~InvGeno);
> mod0 <- model.matrix(~1,data=InvGeno);
> v <- voom(counts=y, design = mod1)
> sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10);

It's a pity that sva() can't use weights, but I can't suggest anything better.

> mod1 <- model.matrix(~InvGeno+sva.obj$sv);
> v <- voom(counts=y, design = mod1);
> fit.obj <- lmFit(v, design);

I think you mean design=mod1.

> fit.obj <- eBayes(fit,trend=TRUE);

trend=TRUE isn't needed. voom() already handles the trend.

Gordon
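
Putting the corrections from this thread together, a minimal sketch of the workflow might look like the following. It is only an illustration under stated assumptions: the counts, the genotype vector and the hidden batch are simulated stand-ins, n.sv = 1 is chosen to match the simulation rather than the n.sv = 10 of the original post, and wrapping the counts in a DGEList before calcNormFactors() is an assumption made here so that voom() receives the counts together with the normalization factors.

```r
library(edgeR)   # DGEList(), calcNormFactors()
library(limma)   # voom(), lmFit(), eBayes(), topTable()
library(sva)     # sva()

set.seed(1)

# Simulated stand-ins (assumptions, not the poster's data):
# 500 genes x 20 samples, a 0/1/2 genotype, and one hidden batch
# that raises expression of the first 100 genes in half the samples.
n.genes <- 500
n.samples <- 20
InvGeno <- rep(c(0, 1, 2, 1), length.out = n.samples)
batch <- rep(c(0, 1), each = n.samples / 2)
mu <- matrix(100, n.genes, n.samples)
mu[1:100, batch == 1] <- 300
rawCounts <- matrix(rnbinom(n.genes * n.samples, mu = mu, size = 10),
                    n.genes, n.samples,
                    dimnames = list(paste0("g", seq_len(n.genes)),
                                    paste0("s", seq_len(n.samples))))

# Keep counts and normalization factors together in one object.
dge <- calcNormFactors(DGEList(counts = rawCounts))

# Full and null models for sva().
mod1 <- model.matrix(~ InvGeno)
mod0 <- model.matrix(~ 1, data = data.frame(InvGeno))

# First voom() pass: only the log2-CPM values (v0$E) are passed to sva(),
# since sva() cannot use the voom precision weights.
v0 <- voom(dge, design = mod1)
sva.obj <- sva(v0$E, mod1, mod0, method = "irw", n.sv = 1)

# Add the surrogate variables to the design matrix.
SV <- as.matrix(sva.obj$sv)
colnames(SV) <- paste0("SV", seq_len(ncol(SV)))
design <- cbind(mod1, SV)

# Second voom() pass with the full design, then the limma fit.
v <- voom(dge, design = design)
fit <- lmFit(v, design)   # use the design that was actually built
fit <- eBayes(fit)        # no trend=TRUE; voom() already models the trend
top <- topTable(fit, coef = "InvGeno", number = 10)
print(top)
```

The second call to voom() recomputes the precision weights under the design that includes the surrogate variables, so the weights reflect the same model that lmFit() then fits.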

Posted 10.6 years ago by Gordon Smyth 50k
