Question: Reading Affy CEL files
Guest User wrote:
I am a newbie to Affy. Thanks for your help.

I am processing CEL files through R (affy package) and am having some basic issues that I have not found satisfactory answers to (I have googled). The chip used is hugene11stv1. I am also using hugene11stprobeset.db to try to do probeset-to-symbol translation. Essentially, I want to create a file of gene expression data with a genes × samples matrix as the final result.

Code:

    setwd(wDir)
    Data <- ReadAffy()
    eset <- rma(Data)
    write.exprs(eset, file = "geneExpData.txt", sep = "\t", quote = FALSE)

When I analyze the written file, the number of columns is what I expected (the number of samples), but there are 33,297 genes (rows). Please help me understand a few fundamental aspects here:

1. I tried translating these Affy IDs to gene symbols to see if that would make my analysis easier. Here are some things I tried.

Try 1:

    symbols <- getSYMBOL(as.character(expr.matrix[,1]), "hugene11stprobeset")

This is not quite working: only ~175 of the probeset IDs are getting translated.

Try 2:

    symbs <- mget(featureNames(eset), hugene11stprobesetSYMBOL, ifnotfound = NA)
    symbs <- unlist(symbs)
    mat <- eset  # make a copy
    featureNames(mat) <- ifelse(!is.na(symbs), symbs, featureNames(mat))

This gives many NAs.

Can you please help me understand what is happening here?
-- output of sessionInfo():

    R version 2.15.3 (2013-03-01)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C LC_NAME=C
     [9] LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] hugene11stv1cdf_2.3.0 affy_1.36.1 Biobase_2.18.0
    [4] BiocGenerics_0.4.0

    loaded via a namespace (and not attached):
    [1] affyio_1.26.0 BiocInstaller_1.8.3 preprocessCore_1.20.0
    [4] tools_2.15.3 zlibbioc_1.4.0

-- Sent via the guest posting facility at bioconductor.org.
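Whichever annotation package supplies the symbols, turning an ID × sample matrix into the genes × samples matrix asked for above also means dropping IDs that have no symbol and merging IDs that share one. The following is a minimal base-R sketch, assuming `expr` is a numeric matrix with one row per array ID and `symbols` is a character vector of gene symbols aligned to its rows (NA where unmapped). `collapseBySymbol` is a hypothetical helper, and averaging duplicated IDs is only one reasonable choice; keeping the ID with the highest mean or variance is another common one.

```r
# Hypothetical helper: collapse an ID x sample matrix to a symbol x sample
# matrix. Rows without a symbol are dropped; rows sharing a symbol are
# averaged (an assumption, other summaries are equally valid).
collapseBySymbol <- function(expr, symbols) {
  stopifnot(is.matrix(expr), length(symbols) == nrow(expr))
  keep <- !is.na(symbols) & symbols != ""
  expr <- expr[keep, , drop = FALSE]
  symbols <- symbols[keep]
  # rowsum() adds up rows that share a group label (one row per symbol)
  sums <- rowsum(expr, group = symbols)
  # number of IDs per symbol, aligned to the rows of `sums` by name
  counts <- as.vector(table(symbols)[rownames(sums)])
  # dividing row i by counts[i] turns the sums into means
  sums / counts
}

# Toy usage with made-up IDs and symbols
expr <- matrix(c(5, 7,
                 6, 8,
                 9, 1), nrow = 3, byrow = TRUE,
               dimnames = list(c("id1", "id2", "id3"), c("s1", "s2")))
syms <- c(id1 = "GENEA", id2 = "GENEA", id3 = NA)
res <- collapseBySymbol(expr, syms)
res  # one row, GENEA, holding the mean of id1 and id2 per sample
```

The result could then be written out with `write.table(res, file = ..., sep = "\t", quote = FALSE)`.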
James W. MacDonald wrote:
Hi Ranjani,

On 5/31/2013 12:53 PM, Ranjani R [guest] wrote:
> Try 1:
> symbols <- getSYMBOL(as.character(expr.matrix[,1]), "hugene11stprobeset")
> Not quite working. Only ~175 of the probeset IDs are getting translated.

There are two problems here.

First, the affy package isn't designed for this array, and in fact won't let you proceed if you upgrade to the new version of Bioconductor. You should really be using either oligo or xps (both BioC packages) for the analysis of this array.

Second, the affy package is only able to summarize these arrays at the transcript level, and you are trying to annotate using a package that assumes you have summarized at the probeset level (where each probeset is only interrogating a smaller portion of the transcript, often just a single exon). If you want to annotate your transcript-level data, you need the hugene11sttranscriptcluster.db package.
Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
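A sketch of the workflow recommended in the answer (oligo for reading and summarizing, hugene11sttranscriptcluster.db for annotation) might look like the following. It assumes a current Bioconductor install with oligo, pd.hugene.1.1.st.v1, AnnotationDbi and hugene11sttranscriptcluster.db available; `processHuGene` and its arguments are hypothetical names, and the output layout (ID, symbol, then one column per sample) is just one choice.

```r
# Hypothetical helper sketching the oligo + transcript-cluster workflow.
# Assumes oligo, pd.hugene.1.1.st.v1, AnnotationDbi and
# hugene11sttranscriptcluster.db are installed (none are loaded until called).
processHuGene <- function(celDir, outFile = "geneExpData.txt") {
  # oligo (not affy) reads HuGene ST arrays; it picks the matching
  # pd.* annotation package from the CEL file header.
  celFiles <- oligo::list.celfiles(celDir, full.names = TRUE)
  raw <- oligo::read.celfiles(celFiles)

  # target = "core" summarizes at the transcript-cluster level,
  # so the row IDs are the keys used by hugene11sttranscriptcluster.db.
  eset <- oligo::rma(raw, target = "core")
  expr <- Biobase::exprs(eset)

  # Look up one symbol per transcript-cluster ID; IDs with no gene
  # assignment come back as NA.
  symbols <- AnnotationDbi::mapIds(
    hugene11sttranscriptcluster.db::hugene11sttranscriptcluster.db,
    keys = rownames(expr), column = "SYMBOL",
    keytype = "PROBEID", multiVals = "first")

  # Tab-delimited file: ID, SYMBOL, then one column per sample.
  out <- data.frame(ID = rownames(expr), SYMBOL = unname(symbols),
                    expr, check.names = FALSE)
  write.table(out, file = outFile, sep = "\t", quote = FALSE,
              row.names = FALSE)
  invisible(out)
}
```

Usage would be along the lines of `res <- processHuGene(wDir)`. Rows whose SYMBOL is NA are kept here so that nothing is silently dropped; filtering or collapsing them is a separate decision.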