Surrogate Variable Analysis

Jerry Cholo ▴ 190

@jerry-cholo-6218

Last seen 9.5 years ago

Hello,

I would like to remove batch effects from a gene expression data set using Surrogate Variable Analysis (SVA). When I looked at the sva package (http://www.bioconductor.org/packages/release/bioc/html/sva.html) and the "bladderbatch" data, I noticed that the 57 samples are assigned to 5 different batches. Could someone tell me how to define such batches for my own data? My data sets include normal and disease samples, two different tissues, and two different chip arrays.

Thanks,
Jerry

sva sva • 2.3k views

updated 10.1 years ago by Jeff Leek ▴ 650 • written 10.1 years ago by Jerry Cholo ▴ 190

Jeff Leek ▴ 650

@jeff-leek-5015

Last seen 3.1 years ago

United States

Hi Jerry,

Batch information is often annotated in a data set. If it is not, one way to annotate batches is to find out when each sample was run and check whether the run times cluster into distinct groups, which you could then call batches. Alternatively, the surrogate variable analysis approach, via the sva() function, takes as input the normalized data matrix and the corresponding information about the primary variables you care about, and attempts to recover the batches from the microarray data themselves.

I hope that helps.

Jeff

(In reply to Jerry Cholo's message of Mon, Mar 17, 2014, 9:00 PM.)
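A minimal sketch of the two routes Jeff describes, in R. Everything in it is illustrative: the run dates, the 7-day gap used to separate batches, the `pheno` table, and the simulated `edata` matrix are assumptions, not Jerry's data. Fixing `n.sv = 1` is also an assumption chosen to match the simulation; in practice `num.sv()` from the sva package can estimate the number of surrogate variables.

```r
library(sva)
set.seed(1)

## Route 1: derive batches from run dates (hypothetical dates).
run_date <- as.Date(c("2013-01-07", "2013-01-07", "2013-01-08",
                      "2013-02-11", "2013-02-11", "2013-02-12",
                      "2013-03-18", "2013-03-19"))
# Single-linkage clustering on the day number; samples run within
# 7 days of each other (an assumed threshold) share a batch.
tree  <- hclust(dist(as.numeric(run_date)), method = "single")
batch <- factor(cutree(tree, h = 7))

## Route 2: let sva() estimate unknown batch structure.
# Hypothetical annotation: 24 samples, disease status and tissue known.
pheno <- data.frame(
  disease = factor(rep(c("normal", "disease"), times = 12),
                   levels = c("normal", "disease")),
  tissue  = factor(rep(c("tissueA", "tissueB"), each = 12))
)
n_genes   <- 1000
n_samples <- nrow(pheno)

# Simulated normalized expression: genes in rows, samples in columns.
edata <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
is_disease   <- pheno$disease == "disease"
hidden_batch <- rep(c(0, 0, 1, 1), times = 6)  # unknown to the analyst
edata[1:100, is_disease] <- edata[1:100, is_disease] + 1
edata[51:400, hidden_batch == 1] <- edata[51:400, hidden_batch == 1] + 1.5

# Full model: variable of interest plus known adjustment variable.
mod  <- model.matrix(~ disease + tissue, data = pheno)
# Null model: adjustment variable only.
mod0 <- model.matrix(~ tissue, data = pheno)

svobj  <- sva(edata, mod, mod0, n.sv = 1)
# Append the estimated surrogate variable to the design matrix.
mod_sv <- cbind(mod, svobj$sv)
```

The columns of `mod_sv` can then serve as the design matrix in a downstream linear model, so that the estimated surrogate variable is adjusted for alongside tissue.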

10.1 years ago • Jeff Leek ▴ 650

Hi Jerry,

As Jeff mentioned, "Batch information is often annotated in a data set." You mentioned 5 batches, so it seems you know which batch each sample is from. In that case, the function 'removeBatchEffect' in the limma package may be helpful. It is not intended for use prior to linear modelling, though. For linear modelling, it is better to include the batch factor in the linear model. When the number of batch levels is large (in your case it is 5, that is, >3), one way is to treat batch as a blocking factor:

    dupcor <- duplicateCorrelation(data, design, block = batch)
    dupcor$consensus.correlation
    fit <- lmFit(data, design, block = batch, correlation = dupcor$consensus.correlation)

Hope this helps.

Di

----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

(In reply to Jeff Leek's message of Wednesday, March 19, 2014, 2:13 PM.)
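For comparison, a hedged sketch of the other two options Di mentions: putting a known batch factor directly into the design matrix as a fixed effect (a different technique from the duplicateCorrelation blocking approach in the reply), and using removeBatchEffect only to produce a batch-adjusted matrix for plots or clustering. The data are simulated, and the names `data`, `batch`, and `disease` are assumptions chosen to mirror the reply.

```r
library(limma)
set.seed(2)

# Hypothetical layout: 5 batches of 4 samples, disease balanced in each.
n_genes <- 500
batch   <- factor(rep(paste0("batch", 1:5), each = 4))
disease <- factor(rep(c("normal", "disease"), times = 10),
                  levels = c("normal", "disease"))

# Simulated normalized expression: genes in rows, samples in columns.
data <- matrix(rnorm(n_genes * 20), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes),
                               paste0("s", 1:20)))
# Add a batch-specific shift to every gene, and a disease effect
# to the first 25 genes.
batch_shift <- rnorm(5, sd = 2)
data <- sweep(data, 2, batch_shift[as.integer(batch)], "+")
data[1:25, disease == "disease"] <- data[1:25, disease == "disease"] + 2

# Differential expression with batch as a fixed effect in the design.
design <- model.matrix(~ batch + disease)
fit    <- eBayes(lmFit(data, design))
top    <- topTable(fit, coef = "diseasedisease", number = 10)

# Batch-adjusted matrix for visualization only (for example PCA or
# heatmaps); the disease design is passed so that effect is preserved.
cleaned <- removeBatchEffect(data, batch = batch,
                             design = model.matrix(~ disease))
```

With only a handful of batches and a balanced design, the fixed-effect model is the simpler choice; the blocking approach in the reply instead estimates a single within-batch correlation shared across genes.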

10.1 years ago • Wu, Di ▴ 120
