TCGA_HumanMethylation27_upload
1
0
Entering edit mode
@magdalena-wozniak-5004
Last seen 9.6 years ago
Hi, I am trying to upload TCGA level1 methylation data from 27k. I've tried to follow instructions for Glioblastoma example as well but I had a problem with installation of TCGAGBM package so I was not able see the format of the files of the example data. I've tried different versions of the following script but at some point (I think from the point I indicated) it does not work properly. I would be very grateful for help. Best wishes, Magdalena homedir="D:/TCGA Methylation Analysis/DATA/KIRC/LEVEL 1/Pairs_HumanMethylation27" dirs=list.files(homedir) batchNames=gsub("jhu-usc.edu_KIRC.HumanMethylation27.","",dirs) tmp=sapply(strsplit(batchNames,"\\."),function(x) as.numeric(x[1:2])) files=vector("list",length(dirs)) ngenes=27578 for(i in seq(along=dirs)){ tmpPath=file.path(homedir,dirs[i]) fns=list.files(tmpPath) files[[i]]=fns[grep("lvl-1",fns)] } nfiles=sapply(files,length) starts=c(0,cumsum(nfiles[-length(nfiles)])) * ### FROM HERE SOMETHING DOESN'T WORK* # b = tmp$Methylated_Signal_Intensity..M./ (tmp$Methylated_Signal_Intensity..M. + tmp$Un.Methylated_Signal_Intensity..U.+100) # betas=matrix(b,nrow=ngenes,ncol=sum(nfiles)) betas=matrix(NA,nrow=ngenes,ncol=sum(nfiles)) cnames=vector("character",sum(nfiles)) batch=factor(rep(seq(along=files),nfiles)) for(i in seq(along=dirs)){ cat(i) tmpPath=file.path(homedir,dirs[i]) fns=files[[i]] for(j in seq(along=fns)){ cat(".") ###Get the sample name tmp=strsplit(readLines(file.path(tmpPath,fns[j]),n=1),"\t") cnames[ j+starts[i] ]=tmp[[1]][2] ##Get the Beta values tmp=read.delim(file.path(tmpPath,fns[j]),skip=1,check.names=FALSE,as.i s =TRUE) b= betaIndex=grep("[bB]eta",names(tmp)) ##the column name change!! betas[,j+starts[i]]=tmp[,betaIndex] ###Check "gene" names in same order if(i==1 & j==1) gnames=tmp[,"CompositeElement REF"] else if(!identical(gnames,tmp[,"CompositeElement REF"])) stop("Genes not in order") } cat("\n") } colnames(betas)=cnames rownames(betas)=gnames save(betas,batch,batchNames,file="betas.rda") [[alternative HTML version deleted]]
• 947 views
ADD COMMENT
0
Entering edit mode
@sean-davis-490
Last seen 3 months ago
United States
Hi, Magdalena. You may want to take a look at using the methylumi package rather than your script below. Sean On Tue, Dec 20, 2011 at 12:25 PM, Magdalena Wozniak < wozniak.magdalena@gmail.com> wrote: > Hi, > > I am trying to upload TCGA level1 methylation data from 27k. I've tried to > follow instructions for Glioblastoma example as well but I had a problem > with installation of TCGAGBM package so I was not able see the format of > the files of the example data. I've tried different versions of the > following script but at some point (I think from the point I indicated) it > does not work properly. I would be very grateful for help. > > Best wishes, > Magdalena > > > homedir="D:/TCGA Methylation Analysis/DATA/KIRC/LEVEL > 1/Pairs_HumanMethylation27" > dirs=list.files(homedir) > batchNames=gsub("jhu-usc.edu_KIRC.HumanMethylation27.","",dirs) > tmp=sapply(strsplit(batchNames,"\\."),function(x) as.numeric(x[1:2])) > > files=vector("list",length(dirs)) > ngenes=27578 > for(i in seq(along=dirs)){ > tmpPath=file.path(homedir,dirs[i]) > fns=list.files(tmpPath) > files[[i]]=fns[grep("lvl-1",fns)] > } > > nfiles=sapply(files,length) > starts=c(0,cumsum(nfiles[-length(nfiles)])) > * > ### FROM HERE SOMETHING DOESN'T WORK* > # b = tmp$Methylated_Signal_Intensity..M./ > (tmp$Methylated_Signal_Intensity..M. > + tmp$Un.Methylated_Signal_Intensity..U.+100) > # betas=matrix(b,nrow=ngenes,ncol=sum(nfiles)) > betas=matrix(NA,nrow=ngenes,ncol=sum(nfiles)) > cnames=vector("character",sum(nfiles)) > batch=factor(rep(seq(along=files),nfiles)) > for(i in seq(along=dirs)){ > cat(i) > tmpPath=file.path(homedir,dirs[i]) > fns=files[[i]] > for(j in seq(along=fns)){ > cat(".") > > ###Get the sample name > tmp=strsplit(readLines(file.path(tmpPath,fns[j]),n=1),"\t") > cnames[ j+starts[i] ]=tmp[[1]][2] > > ##Get the Beta values > tmp=read.delim(file.path(tmpPath,fns[j]),skip=1,check.names=FALSE,as .is > =TRUE) > b= > betaIndex=grep("[bB]eta",names(tmp)) ##the column name change!! > betas[,j+starts[i]]=tmp[,betaIndex] > > ###Check "gene" names in same order > if(i==1 & j==1) gnames=tmp[,"CompositeElement REF"] else > if(!identical(gnames,tmp[,"CompositeElement REF"])) stop("Genes not in > order") > } > cat("\n") > } > colnames(betas)=cnames > rownames(betas)=gnames > > save(betas,batch,batchNames,file="betas.rda") > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 1066 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6