Cannot read in CEL files with XPS
@christopher-n-barnes-3179
All,

I am new to xps and am having trouble reading in CEL files.

I got the 3 correct files from Affymetrix and created a scheme after removing the first 12 lines from the annotation file (fix 1).

I then read in my scheme:

hgu133plus2 <- root.scheme(paste(.path.package("xps"), "schemes/hgu133plus2.root", sep="/"))

and then try to read in the CEL files:

celdir2 <- "C:/McMasters/test"
data.test3 <- import.data(hgu133plus2, "tmp2", celdir=celdir2, verbose=FALSE)

This worked once and now causes R to crash. I am trying to read in 40 CEL files (50,000+ genes) on a 4 GB machine.

Does anyone have suggestions for another method to read a large number of CEL files? If I try ReadAffy() instead, I cannot allocate enough memory.

Thanks for the help,
Chris Barnes
PhD student
University of Louisville
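Before calling import.data() it may help to confirm that the scheme file and the CEL directory are really where the code expects them. The following is only a minimal sketch in base R; check_cel_inputs() is a hypothetical helper, not part of xps, and the demonstration uses empty placeholder files in a temporary directory rather than real CEL files.

```r
# Hypothetical helper (not part of xps): verify inputs before import.data().
check_cel_inputs <- function(scheme_file, celdir) {
  # Match *.cel and *.CEL alike, since Windows file names vary in case.
  cel_files <- list.files(celdir, pattern = "\\.cel$", ignore.case = TRUE)
  list(
    scheme_exists = file.exists(scheme_file),
    celdir_exists = dir.exists(celdir),
    n_cel         = length(cel_files),
    cel_files     = cel_files
  )
}

# Demonstration on a throwaway directory with empty placeholder files.
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.CEL", "b.cel", "notes.txt"))))

res <- check_cel_inputs(file.path(demo_dir, "missing.root"), demo_dir)
str(res)
```

With real data one would pass the actual scheme path and "C:/McMasters/test" instead of the demonstration values, and only proceed to the import if the scheme exists and the expected number of CEL files is found.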
cstrato ★ 3.9k
@cstrato-908
Austria
Dear Chris

This is strange. Could you please post the output of sessionInfo() and tell me which versions of xps, ROOT and R you are using, and whether you are on WinXP or Vista?

Could you also post the complete code you used to create the scheme? I am not sure it is a good idea to save the "hgu133plus2.root" file in the package directory; I would suggest creating a directory "schemes" somewhere else, e.g. "McMasters/schemes".

Furthermore, could you please set "verbose=TRUE" in the methods and start R from the Command Console? You will then see the progress messages. Could you please send me this output so that I can check the result?

Handling 40 CEL-files should not be a problem: one user of xps reported that he could successfully handle 500 CEL-files on his Windows machine.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
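The version details requested above can be gathered in one step. This is a rough sketch using base R only; the list name diag_info and the temporary "schemes" location are illustrative assumptions, and the ROOT version is not covered because it has to be read from the ROOT installation itself.

```r
# Collect the R, platform and xps version details in one place.
diag_info <- list(
  r_version = R.version.string,
  platform  = R.version$platform,
  os        = Sys.info()[["sysname"]],
  xps       = if (requireNamespace("xps", quietly = TRUE)) {
    as.character(utils::packageVersion("xps"))
  } else {
    NA_character_  # xps is not installed in this R session
  }
)
print(diag_info)
print(sessionInfo())

# Keep schemes outside the package directory. The location below is only
# an example; a real setup would use something like "C:/McMasters/schemes".
scheme_dir <- file.path(tempdir(), "schemes")
dir.create(scheme_dir, recursive = TRUE, showWarnings = FALSE)
```

Storing schemes outside the package directory also means they are not lost when the package is reinstalled or updated.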
Dear Chris,

Maybe the following information can help you solve your problems.

This is my setup: a dual-boot MacBook Pro, 2GB RAM, running Windows XP SP2, where I have installed the following binary versions:
- R-2.8.0-win32.exe
- root_v5.18.00.win32.vc80.msi
- xps_1.2.1.zip

Note that root_v5.18.00 is necessary since Bioconductor has compiled xps with this version.

You can run xps either from RGui or from Rterm. When using RGui you should set "verbose=FALSE" in all functions, since you will not see any messages anyhow. I would recommend using Rterm with "verbose=TRUE", at least initially, to get a feeling for what xps does; see the examples below.

1. Import schemes:

Since xps uses the original Affymetrix CDF, PGF and annotation files, you have to import these files first. Here is my Rterm session for doing this for HG-U133_Plus_2:

> library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

> libdir <- "C:/home/Affy/libraryfiles"
> anndir <- "C:/home/Affy/Annotation"
> scmdir <- "C:/home/Rabbitus/CRAN/Workspaces/Schemes"
> scheme.hgu133p2.na27 <- import.expr.scheme("Scheme_HGU133p2_na27",filedir=scmdir,paste(libdir,"HG-U133_Plus_2.cdf",sep="/"),paste(libdir,"HG-U133-PLUS_probe.tab",sep="/"),paste(anndir,"Version08Nov/HG-U133_Plus_2.na27.annot.csv",sep="/"))
Creating new file <C:/home/Rabbitus/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root>...
Importing <C:/home/Affy/libraryfiles/HG-U133_Plus_2.cdf> as <HG-U133_Plus_2.scm>...
   <1354896> records imported...Finished
   PM/MM statistics:
      5 cells with minimum number of PM/MM pairs: 8
      1 cells with maximum number of PM/MM pairs: 69
New dataset <HG-U133_Plus_2> is added to Content...
Importing <C:/home/Affy/libraryfiles/HG-U133-PLUS_probe.tab> as <HG-U133_Plus_2.prb>...
   Warning: The following header columns are missing: <Serial Order>
   <604258> records read...Finished
   <1354896> records imported...Finished
   probe info:
      GC content: minimum GC is <3> maximum GC is <22>
      Melting Tm: minimum Tm is <51> maximum Tm is <89>
Importing <C:/home/Affy/Annotation/Version08Nov/HG-U133_Plus_2.na27.annot.csv> as <HG-U133_Plus_2.ann>...
   Warning: The following header columns are missing: <Protein Families> <Protein Domains>
   Number of annotated transcripts is <54675>.
   Warning: Number of transcripts with ambigous annotation is <336>
   <54675> records imported...Finished
>

I would recommend importing all necessary schemes and saving them in a common system directory. You need not save this R session, since you can access every scheme in later R sessions with function root.scheme().

Note that with xps_1.2.1 it is no longer necessary to delete the first 12 lines from the annotation file. All warnings can be ignored; they are caused by changes in the Affymetrix annotation files.

2. Import CEL-files:

To show you that xps can easily handle many CEL-files, I have imported all 53 CEL-files from the Affymetrix human tissue/mix dataset. Here is the output for RGui:

> library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

> scmdir <- "E:/CRAN/Workspaces/Schemes"
> celdir <- "E:/ChipData/Exon/HuMixture"
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> scheme.u133p2 <- root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> Sys.time()
[1] "2008-12-07 14:47:20 CET"
> data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2", filedir=datdir, celdir=celdir, verbose=FALSE)
> Sys.time()
[1] "2008-12-07 14:53:45 CET"
>

As you see, importing 53 CEL-files takes about 6.5 min.
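The elapsed time can also be computed rather than read off the two Sys.time() stamps. The sketch below uses base R only: it first reproduces the difference between the two timestamps shown in the RGui session, then shows a small hypothetical wrapper, time_call(), that could be placed around any long-running call. The time zone is set to UTC only because the difference between the two stamps does not depend on it.

```r
# Elapsed time between the two timestamps printed in the RGui session.
start   <- as.POSIXct("2008-12-07 14:47:20", tz = "UTC")
end     <- as.POSIXct("2008-12-07 14:53:45", tz = "UTC")
elapsed <- difftime(end, start, units = "mins")
print(elapsed)

# Hypothetical wrapper: evaluate an expression and report how long it took.
time_call <- function(expr) {
  t0    <- Sys.time()
  value <- expr  # lazy evaluation: the expression is only run here
  list(value = value, elapsed = difftime(Sys.time(), t0, units = "secs"))
}

out <- time_call(sum(seq_len(1000)))
print(out$elapsed)
```

In a real session the expression passed to time_call() would be the import.data() call itself.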
Here is the (partial) output when using Rterm:

> library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

> scmdir <- "E:/CRAN/Workspaces/Schemes"
> celdir <- "E:/ChipData/Exon/HuMixture"
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> scheme.u133p2 <- root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2", filedir=datdir, celdir=celdir, verbose=TRUE)
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in <read> mode...
Creating new file <E:/CRAN/Workspaces/ROOTData/hutissuesu133p2_cel.root>...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_a.cel> as <u1332plus_ivt_breast_a.cel>...
   <1354896> records imported...
   hybridization statistics:
      4 cells with minimal intensity 32
      1 cells with maximal intensity 16261
New dataset <dataset> is added to Content...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_b.cel> as <u1332plus_ivt_breast_b.cel>...
   <1354896> records imported...
   hybridization statistics:
      1 cells with minimal intensity 24
      1 cells with maximal intensity 20496
...
...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_thyroid_b.cel> as <u1332plus_ivt_thyroid_b.cel>...
   <1354896> records imported...
   hybridization statistics:
      1 cells with minimal intensity 29
      1 cells with maximal intensity 47017
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_thyroid_c.cel> as <u1332plus_ivt_thyroid_c.cel>...
   <1354896> records imported...
   hybridization statistics:
      1 cells with minimal intensity 24
      2 cells with maximal intensity 65534
>

As you see, in Rterm you see the progress status and get some statistical information.

Since CEL-files often have long and strange names, I would recommend using parameter "celnames" in function import.data() to assign new names.
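Shorter names for the "celnames" parameter can be derived from the file names with base R. This is only a sketch: the three file names are copied from the Rterm output above, the prefix being stripped is specific to that dataset, and how the new names are paired with the CEL files should be checked against the import.data() help page before relying on it.

```r
# File names as they appear in the Rterm output above.
cel_files <- c("u1332plus_ivt_breast_a.cel",
               "u1332plus_ivt_breast_b.cel",
               "u1332plus_ivt_thyroid_c.cel")

# Drop the extension and the common prefix to get short sample names.
short_names <- sub("^u1332plus_ivt_", "", tools::file_path_sans_ext(cel_files))

# Guard against accidental duplicates after shortening.
short_names <- make.unique(short_names)
print(short_names)

# The resulting vector could then be supplied as the "celnames" argument:
# data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2", filedir = datdir,
#                         celdir = celdir, celnames = short_names)
```

The import call is left as a comment because it needs xps, ROOT and the scheme and directory objects from the session above.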
Once again, you need not save the R session, since you can access the data in later R sessions using function root.data().

3. RMA normalization:

RMA normalization of all 53 CEL-files takes about 1 hr. Here is the RGui session:

> library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

> scmdir <- "E:/CRAN/Workspaces/Schemes"
> scheme.u133p2 <- root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> data.u133p2 <- root.data(scheme.u133p2, paste(datdir,"HuMixAllU133P2_cel.root",sep="/"))
> Sys.time()
[1] "2008-12-07 14:59:12 CET"
> data.rma <- rma(data.u133p2,"MixAllU133P2RMA",tmpdir="",background="pmonly",normalize=TRUE,verbose=FALSE)
> Sys.time()
[1] "2008-12-07 15:55:25 CET"
>

In comparison, here is the (partial) Rterm session:

> library(xps)

Welcome to xps version 1.2.1
    an R wrapper for XPS - eXpression Profiling System
    (c) Copyright 2001-2008 by Christian Stratowa

> scmdir <- "E:/CRAN/Workspaces/Schemes"
> scheme.u133p2 <- root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> data.u133p2 <- root.data(scheme.u133p2, paste(datdir,"HuMixAllU133P2_cel.root",sep="/"))
> Sys.time()
[1] "2008-12-07 13:32:35 CET"
> data.rma <- rma(data.u133p2,"MixAllU133P2RMA",tmpdir="",background="pmonly",normalize=TRUE,verbose=TRUE)
Creating new file <E:/CRAN/Workspaces/exon/hutissues/u133p2/MixAllU133P2RMA.root>...
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in <read> mode...
Opening file <E:/CRAN/Workspaces/ROOTData/HuMixAllU133P2_cel.root> in <read> mode...
Preprocessing data using method <preprocess>...
   Background correcting raw data...
      calculating background for <u1332plus_ivt_breast_a.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         1468 cells with maximal intensity 69.3196
      calculating background for <u1332plus_ivt_breast_b.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         1334 cells with maximal intensity 68.3009
...
...
      calculating background for <u1332plus_ivt_thyroid_b.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         295 cells with maximal intensity 65.6557
      calculating background for <u1332plus_ivt_thyroid_c.cel>...
      background statistics:
         750638 cells with minimal intensity 0
         1 cells with maximal intensity 74.3142
   Normalizing raw data...
      normalizing data using method <quantile>...
      finished filling <53> arrays.
      finished filling <53> trees.
   Converting raw data to expression levels...
      summarizing with <medianpolish>...
      calculating expression for <54675> of <54684> units...Finished.
      expression statistics:
         minimal expression level is <2.65147>
         maximal expression level is <15470.9>
   preprocessing finished.
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in <read> mode...
Opening file <E:/CRAN/Workspaces/exon/hutissues/u133p2/MixAllU133P2RMA.root> in <read> mode...
Exporting data from tree <*> to file <E:/CRAN/Workspaces/exon/hutissues/u133p2/MixAllU133P2RMA.txt>...
Reading entries from <HG-U133_Plus_2.ann> ...Finished
<54675> of <54675> records exported.
> Sys.time()
[1] "2008-12-07 14:35:09 CET"
>

Once again, in Rterm you see the progress status and get some statistical information. I find it helpful to see the progress information, especially when the computation takes a long time.

I hope that this demonstration shows you how to use xps successfully and helps you solve your problems.

Best regards
Christian
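The advice on "verbose" (FALSE in RGui, TRUE in Rterm) and the later-session RMA workflow can be combined in a small wrapper. This is a sketch under stated assumptions: use_verbose() and run_rma_later() are hypothetical helpers, the check relies on base R's .Platform$GUI reporting "Rgui" for the Windows GUI, and the xps calls simply mirror the arguments used in the sessions above. The wrapper is only defined here, not run, since running it requires xps, ROOT and existing scheme and data files.

```r
# Hypothetical helper: show progress messages everywhere except in RGui,
# where the messages would not be visible anyway.
use_verbose <- function(gui = .Platform$GUI) {
  !identical(gui, "Rgui")
}

# Hypothetical wrapper around the later-session workflow shown above.
# It reopens the stored scheme and data files and then runs RMA with the
# same arguments as in the transcript.
run_rma_later <- function(scheme_file, data_file, verbose = use_verbose()) {
  stopifnot(file.exists(scheme_file), file.exists(data_file))
  scheme <- xps::root.scheme(scheme_file)
  dat    <- xps::root.data(scheme, data_file)
  xps::rma(dat, "MixAllU133P2RMA", tmpdir = "", background = "pmonly",
           normalize = TRUE, verbose = verbose)
}

print(use_verbose("Rgui"))   # FALSE
print(use_verbose("RTerm"))  # TRUE
```

Checking that both files exist before opening them gives a plain R error message if a path is wrong.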