Has anyone successfully used sva packages?

0

Entering edit mode

Fabrice Tourre ▴ 970

@fabrice-tourre-4394

Last seen 9.6 years ago

Dear list, Fellow the paper "Tackling the widespread and critical impact of batch effects in high-throughput data" and the code at http://rafalab.jhsph.edu/batch/, I used sva in my analysis. But I always get more than 90% of gene with p.adjust<0.05 (compared in two cell line). When I try limma package without sva, it give me about 3000 genes with p.adjust<0.05. It seems limma is reasonable. But it seems that limma can only remove tow batch each time using removeBatchEffect. removeBatchEffect(x,batch,batch2=NULL,design=matrix(1,ncol(x),1)) Does anyone have suggestions? library(preprocessCore) #cel=read.csv("celfies/cel_files.txt",header=F,skip=1) tab=read.table("output/ExpressionInput/groups.dis.txt",as.is=TRUE) mat<-read.table("gene/apt/rma-sketch.summary.txt",check.names = FALSE,header=T,row.names=1) colnames(mat)->c rownames(mat)->r merge(c,tab,by.y=1,by.x=1)->b tab<-b mat<-normalize.quantiles(as.matrix(mat)) colnames(mat)<-c rownames(mat)<-r dates=vector("character",ncol(mat)) for(i in seq(along=dates)){ tmp=affyio::read.celfile.header(file.path("celfies",tab[i,1]), info="full")$ScanDate dates[i]=strsplit(tmp,"T")[[1]][1] } dates=as.Date(dates) batch=as.numeric(dates-min(dates)) batch=as.numeric(cut(batch,c(unique(batch),1))) batch[is.na(batch)]=1 library(corpcor) library(affy) library("sva") library(limma) f<-factor(tab[,3]) mod = model.matrix(~f) mod0=model.matrix(~1,data=tab) colnames(mod)<-levels(f) n.sv = num.sv(mat,mod,method="leek") svobj = sva(mat,mod,mod0,n.sv=n.sv) modSv = cbind(mod,svobj$sv) mod0Sv = cbind(mod0,svobj$sv) pValuesSv = f.pvalue(mat,modSv,mod0Sv) qValuesSv = p.adjust(pValuesSv,method="BH") fit = lmFit(edata,modSv) contrast.matrix<- ???? How to construct this matrix? fitContrasts = contrasts.fit(fit,contrast.matrix) eb = eBayes(fitContrasts) topTableF(eb, adjust="BH") head(tab) x V2 V3 1 2A6_Hex_07Jul_09.CEL 1 dis 2 2A7_Hex_07Jul_09.CEL 1 dis 3 2A8_Hex_07Jul_09.CEL 1 dis 4 2A9_Hex_07Jul_09.CEL 1 norm 5 2AA_Hex_07Jul_09.CEL 1 norm 6 2AB_Hex_07Jul_09.CEL 1 norm

limma sva limma sva • 1.6k views

ADD COMMENT • link updated 12.4 years ago by jhua@tgen.org ▴ 60 • written 12.4 years ago by Fabrice Tourre ▴ 970

0

Entering edit mode

jhua@tgen.org ▴ 60

@jhuatgenorg-5011

Last seen 9.6 years ago

Depending on the cell lines you are using, two cell lines could be very different. Can you do some test on each batch (without any batch effect removal), and see how many differentially expressed genes? Also I assume you have similar proportion of two cell lines each every batch, i.e., fully random design. No batch effects removal is perfect, you should minimize the effects by randomization. Jianping > ------------------------------ > > Message: 10 > Date: Thu, 15 Dec 2011 15:53:47 +0800 > From: Fabrice Tourre <fabrice.ciup at="" gmail.com=""> > To: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] Has anyone successfully used sva packages? > Message-ID: > <can31xkcf+f0qxnof0hwqxlanvotpxhhc=g03jbhwdm2vjqp8nw at="" mail.gmail.com=""> > Content-Type: text/plain; charset="ISO-8859-1" > > Dear list, > > Fellow the paper "Tackling the widespread and critical impact of batch > effects in high-throughput data" and the code at > http://rafalab.jhsph.edu/batch/, I used sva in my analysis. But I > always get more than 90% of gene with p.adjust<0.05 (compared in two > cell line). When I try limma package without sva, it give me about > 3000 genes with p.adjust<0.05. It seems limma is reasonable. But it > seems that limma can only remove tow batch each time using > removeBatchEffect. > > removeBatchEffect(x,batch,batch2=NULL,design=matrix(1,ncol(x),1)) > > Does anyone have suggestions? > > library(preprocessCore) > #cel=read.csv("celfies/cel_files.txt",header=F,skip=1) > tab=read.table("output/ExpressionInput/groups.dis.txt",as.is=TRUE) > mat<-read.table("gene/apt/rma-sketch.summary.txt",check.names = > FALSE,header=T,row.names=1) > colnames(mat)->c > rownames(mat)->r > merge(c,tab,by.y=1,by.x=1)->b > tab<-b > mat<-normalize.quantiles(as.matrix(mat)) > colnames(mat)<-c > rownames(mat)<-r > dates=vector("character",ncol(mat)) > for(i in seq(along=dates)){ > tmp=affyio::read.celfile.header(file.path("celfies",tab[i,1]), info="full")$ScanDate > dates[i]=strsplit(tmp,"T")[[1]][1] > } > dates=as.Date(dates) > batch=as.numeric(dates-min(dates)) > batch=as.numeric(cut(batch,c(unique(batch),1))) > batch[is.na(batch)]=1 > > library(corpcor) > library(affy) > library("sva") > library(limma) > f<-factor(tab[,3]) > mod = model.matrix(~f) > mod0=model.matrix(~1,data=tab) > colnames(mod)<-levels(f) > n.sv = num.sv(mat,mod,method="leek") > svobj = sva(mat,mod,mod0,n.sv=n.sv) > modSv = cbind(mod,svobj$sv) > mod0Sv = cbind(mod0,svobj$sv) > pValuesSv = f.pvalue(mat,modSv,mod0Sv) > qValuesSv = p.adjust(pValuesSv,method="BH") > > fit = lmFit(edata,modSv) > contrast.matrix<- ???? How to construct this matrix? > fitContrasts = contrasts.fit(fit,contrast.matrix) > eb = eBayes(fitContrasts) > topTableF(eb, adjust="BH") > > > > > head(tab) > x V2 V3 > 1 2A6_Hex_07Jul_09.CEL 1 dis > 2 2A7_Hex_07Jul_09.CEL 1 dis > 3 2A8_Hex_07Jul_09.CEL 1 dis > 4 2A9_Hex_07Jul_09.CEL 1 norm > 5 2AA_Hex_07Jul_09.CEL 1 norm > 6 2AB_Hex_07Jul_09.CEL 1 norm > > > > ------------------------------ > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > End of Bioconductor Digest, Vol 106, Issue 14 > *********************************************

ADD COMMENT • link 12.4 years ago jhua@tgen.org ▴ 60

Login before adding your answer.