What should be the output after processing a cel file?

Peng Yu ▴ 940

Hi,

I ran the following commands in R:

    library(oligo)
    data <- oligo::read.celfiles("some.cel")
    eset <- rma(data)
    write.exprs(eset, file="some.txt", sep="\t")

This generates the file "some.txt", but I am not sure what it means. The content of some.txt is the following:

                wt1-mth_HZ_5238_MST1_19385.cel
    10344615    7.83088386872146
    10344617    3.13300493228193
    10344619    3.00984893419684
    10344621    4.55830890064195
    10344623    7.79420011157519
    10344625    8.93864799064523
    10344626    10.2404135279143
    10344627    8.36493644804453
    10344628    10.8239110733786

I am wondering whether I processed the CEL file correctly. What does the first column mean?

Regards,
Peng

ADD COMMENT • link updated 14.8 years ago by James W. MacDonald 65k • written 14.8 years ago by Peng Yu ▴ 940

James W. MacDonald 65k

Hi Peng,

Peng Yu wrote:
> I am wondering if I processed the cel file correctly. What does the
> first column mean?

The first column is the Affymetrix probeset ID. You can map these IDs to more conventional identifiers, such as Entrez Gene or Ensembl IDs, using the appropriate .db annotation package for your chip. Had you noted what chip this is, I might have been able to point you to the correct package. You can browse this page to find it:

http://www.bioconductor.org/packages/release/data/annotation/

Best,
Jim
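For readers who want to inspect such a file themselves, here is a minimal base-R sketch. It assumes the layout shown in the question: a tab-separated table with the probeset IDs in the first column and one column of values per array. The temporary file and the three rows are copied from the question purely for illustration. RMA values are on the log2 scale, which is why the last line back-transforms them.

```r
# Rebuild a small file in the same layout as the question's "some.txt":
# rows are probeset IDs, the single column is one array (one CEL file).
m <- matrix(c(7.83088386872146, 3.13300493228193, 3.00984893419684),
            ncol = 1,
            dimnames = list(c("10344615", "10344617", "10344619"),
                            "wt1-mth_HZ_5238_MST1_19385.cel"))
path <- tempfile(fileext = ".txt")
write.table(m, file = path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back, using the first column as row names (the probeset IDs)
# and keeping the CEL file name unchanged as the column name.
tab <- read.delim(path, row.names = 1, check.names = FALSE)

rownames(tab)   # probeset IDs
colnames(tab)   # one column per array
2^tab[, 1]      # back-transform from log2 to the intensity scale
```

With several arrays, the same call returns one column per CEL file, so each row is the expression profile of one probeset across the experiment.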

ADD COMMENT • link 14.8 years ago James W. MacDonald 65k

On Fri, Jul 24, 2009 at 12:43 PM, James W. MacDonald wrote:
> The first column is the affy probeset ID. You can use the correct annotation
> package to map these IDs to more conventional IDs, such as Entrez Gene or
> Ensembl using the correct .db package. Had you noted what chip this is, I
> might have been able to point you to the correct chip.

Hi Jim,

It's the Mouse Gene 1.0 ST Array.

http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx

The following is the output of my R script. Shall I use 'mogene10stprobeset.db' and 'mogene10sttranscriptcluster.db'?

Regards,
Peng

    > library(oligo)
    Loading required package: oligoClasses
    Loading required package: Biobase

    Welcome to Bioconductor

      Vignettes contain introductory material. To view, type
      'openVignette()'. To cite Bioconductor, see
      'citation("Biobase")' and for packages 'citation(pkgname)'.

    Loading required package: preprocessCore
    Welcome to oligo version 1.8.1
    >
    > for (f in c("wt1-mth_HZ_5238_MST1_19385",
    + "wt2-mth_HZ_5238_MST1_19386",
    + "wt3-mth_HZ_5238_MST1_19387",
    + "wt4-mth_HZ_5238_MST1_19388",
    + "koA-mth_HZ_5238_MST1_19389",
    + "koB-mth_HZ_5238_MST1_19390",
    + "koC-mth_HZ_5238_MST1_19391",
    + "koD-mth_HZ_5238_MST1_19392"
    + )) {
    + data<-oligo::read.celfiles(paste(f, ".cel", sep=''))
    + eset<-rma(data)
    + write.exprs(eset, file=paste(f, ".txt", sep=''), sep="\t")
    + }
    Loading required package: pd.mogene.1.0.st.v1
    Loading required package: RSQLite
    Loading required package: DBI
    Platform design info loaded.
    Reading in : wt1-mth_HZ_5238_MST1_19385.cel
    Background correcting
    Normalizing
    Calculating Expression
    Platform design info loaded.
    Reading in : wt2-mth_HZ_5238_MST1_19386.cel
    Background correcting
    Normalizing
    Calculating Expression
    Platform design info loaded.
    Reading in : wt3-mth_HZ_5238_MST1_19387.cel
    Background correcting
    Normalizing
    Calculating Expression
    Platform design info loaded.
    Reading in : wt4-mth_HZ_5238_MST1_19388.cel
    Background correcting
    Normalizing
    Calculating Expression
    Platform design info loaded.
    Reading in : koA-mth_HZ_5238_MST1_19389.cel
    Background correcting
    Normalizing
    Calculating Expression
    Platform design info loaded.
    Reading in : koB-mth_HZ_5238_MST1_19390.cel
    Background correcting
    Normalizing
    Calculating Expression
    Platform design info loaded.
    Reading in : koC-mth_HZ_5238_MST1_19391.cel
    Background correcting
    Normalizing
    Calculating Expression
    Platform design info loaded.
    Reading in : koD-mth_HZ_5238_MST1_19392.cel
    Background correcting
    Normalizing
    Calculating Expression
    >
    > proc.time()
       user  system elapsed
    574.095  14.989 595.596

ADD REPLY • link 14.8 years ago Peng Yu ▴ 940

Hi Peng,

Based on the probe IDs you have shown us here, you want the 'mogene10stprobeset.db' package for annotations. Affymetrix changed midstream what they were going to call an ID for that platform, so we have to accommodate people both before and after those changes. This is why there are two variants of annotation packages for that platform.

Marc
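As a rough illustration of how such a lookup could be done, the sketch below checks which of the two annotation packages recognises the IDs and then maps them to gene symbols and Entrez Gene IDs. It is written against the `keys()`/`select()` interface of AnnotationDbi, which postdates this thread, and it assumes both annotation packages are installed. The three IDs are taken from the question; the variable names are illustrative only.

```r
library(AnnotationDbi)
library(mogene10stprobeset.db)
library(mogene10sttranscriptcluster.db)

ids <- c("10344615", "10344617", "10344619")  # IDs from the question

# Count how many of the IDs each package knows about.
n_probeset <- sum(ids %in% keys(mogene10stprobeset.db, keytype = "PROBEID"))
n_cluster  <- sum(ids %in% keys(mogene10sttranscriptcluster.db,
                                keytype = "PROBEID"))
stopifnot(max(n_probeset, n_cluster) > 0)  # neither package matches otherwise

# Use whichever package recognises more of the IDs.
db <- if (n_probeset >= n_cluster) {
  mogene10stprobeset.db
} else {
  mogene10sttranscriptcluster.db
}

# Map the IDs to gene symbols and Entrez Gene IDs.
ann <- AnnotationDbi::select(db, keys = ids,
                             columns = c("SYMBOL", "ENTREZID"),
                             keytype = "PROBEID")
ann
```

Checking the keys first avoids guessing between the two package variants: `select()` stops with an error when none of the supplied keys exist in the chosen package.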

ADD REPLY • link 14.8 years ago Marc Carlson ★ 7.2k

Hi Peng,

Peng Yu wrote:
> + )) {
> + data<-oligo::read.celfiles(paste(f, ".cel", sep=''))
> + eset<-rma(data)
> + write.exprs(eset, file=paste(f, ".txt", sep=''), sep="\t")
> + }

OK. Seriously. Don't do this. If you got this idea somewhere, please let us know where so we can correct that information.

The rma method is designed to work with a set of chips, not one by one. You want to do something like this:

    dat <- read.celfiles(list.celfiles())
    eset <- rma(dat)

Now use something like limma to find differentially expressed genes. Then, if you want to annotate them, you can use the mogene10stprobeset.db package.

You might seriously consider purchasing this:

http://www.bioconductor.org/pub/docs/mogr/

or finding a local statistician who is familiar with these tools to help you.

Best,
Jim
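The limma step mentioned above could look roughly like the following sketch. It assumes a simple two-group comparison (knockout against wild type) and that the group can be read from the first two characters of each column name, as in the file names in this thread. The function name `ko_vs_wt` is hypothetical, and the design is the simplest possible one, not necessarily the right model for this experiment.

```r
library(limma)

# Compare knockout (ko) against wild type (wt) for a matrix of log2
# expression values with one column per array.
ko_vs_wt <- function(expr) {
  # Assumption: the group is encoded in the first two characters of each
  # column name, as in the file names in this thread (wt1-..., koA-...).
  group <- factor(substr(colnames(expr), 1, 2), levels = c("wt", "ko"))
  stopifnot(!anyNA(group))

  design <- model.matrix(~ group)     # columns: (Intercept), groupko
  fit <- eBayes(lmFit(expr, design))  # moderated t-statistics
  topTable(fit, coef = "groupko", number = Inf)  # logFC is ko minus wt
}

# Hypothetical usage, after summarising all arrays together:
#   eset <- rma(read.celfiles(list.celfiles()))
#   results <- ko_vs_wt(exprs(eset))
```

The row names of the returned table are the probeset IDs, so the result can be passed directly to an annotation lookup.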

ADD REPLY • link 14.8 years ago James W. MacDonald 65k

On Fri, Jul 24, 2009 at 1:13 PM, James W. MacDonald wrote:
> The rma method is designed to work with a set of chips, not one by one. You
> want to do something like this:
>
> dat <- read.celfiles(list.celfiles())
> eset <- rma(dat)
>
> now use something like limma to find differentially expressed genes. Then if
> you want to annotate them, you can use the mogene10stprobeset.db package.

Hi,

Thank you for your help. I have pasted the R code and the first 10 lines of the output at the end of this message. The results are correct, right?

Would you please let me know what command I should use for annotation?

I have the book on BioC, which has a lot of information, and I need some time to absorb it all. For now, would you please let me know which parts of the book I should focus on for my application?

Regards,
Peng

    library(oligo)
    data <- read.celfiles(list.celfiles())
    eset <- rma(data)
    write.exprs(eset, file="output.txt", sep="\t")

Output (columns, in order: koA-mth_HZ_5238_MST1_19389.cel, koB-mth_HZ_5238_MST1_19390.cel, koC-mth_HZ_5238_MST1_19391.cel, koD-mth_HZ_5238_MST1_19392.cel, wt1-mth_HZ_5238_MST1_19385.cel, wt2-mth_HZ_5238_MST1_19386.cel, wt3-mth_HZ_5238_MST1_19387.cel, wt4-mth_HZ_5238_MST1_19388.cel):

    10344615  7.07210987006919  7.01089258722033  7.2642627000072   6.92980486555595  7.72857978063884  6.91124431275741  7.45776182961327  7.21025349865986
    10344617  3.02519545040591  3.08697023169755  3.03203234085828  3.09846420636071  3.12487891156704  3.10727683101607  3.0544609560487   3.03353963677405
    10344619  3.20294677833793  3.20612630466463  3.17655303153672  3.13210443165341  3.1378507207366   3.21452663497659  3.31345050224224  3.09287042099817
    10344621  4.70984671316916  4.68863215464979  4.43705857330756  4.59970839525133  4.66911715996711  4.80422412543456  4.57334787499862  4.60736276830484
    10344623  7.79927399492793  7.78057650451938  7.72710416870418  7.68525205462879  7.66271776323834  7.65761154201622  7.67860029345257  7.80684426781102
    10344625  8.43869623252839  9.23986002214653  9.01482181726202  8.8450593076064   8.59194370149885  9.08344656110017  9.07468813004613  8.92291936928794
    10344626  10.0590964382247  9.75778614016683  9.66874458340189  9.91560261746937  9.97497585580347  9.90593250683953  9.72513220186519  10.0570156812405
    10344627  7.45353674141328  7.85528510695415  7.1239938834144   7.48673272391552  8.2401362665769   7.24092300626232  7.4348487408975   7.8999935331867
    10344628  10.1181530678991  10.2050144957479  10.0821326432175  10.2014962484731  10.3549307008668  9.97359523972773  9.82152593658235  10.0714458425003
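Note that probeset 10344615 on the wt1 array comes out as about 7.83 when that array is processed alone and about 7.73 when all eight arrays are processed together. A difference of this kind is expected, because RMA's normalization and summarization steps use information from every array in the batch. The base-R sketch below illustrates only the quantile-normalization idea on toy numbers. It is a simplified stand-in, not oligo's implementation, and `quantile_normalize` is a hypothetical helper name.

```r
# Toy quantile normalization: every column (array) is forced to share the
# same distribution, namely the row means of the column-wise sorted values.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)   # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

one_array  <- matrix(c(5, 2, 3), ncol = 1)
two_arrays <- cbind(one_array, c(4, 1, 6))

quantile_normalize(one_array)   # a single array is returned unchanged
quantile_normalize(two_arrays)  # both columns now hold 1.5, 3.5 and 5.5
```

With a single array the target distribution is the array itself, so normalization does nothing. This is one reason why per-file RMA output is not comparable across arrays.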

ADD REPLY • link 14.8 years ago Peng Yu ▴ 940
