Reading Affy CEL files
Guest User ★ 13k
@guest-user-4897
Last seen 10.6 years ago
I am a newbie to Affy. Thanks for your help.

I am processing CEL files through R (Affy package) and am having some basic issues that I am not finding satisfactory answers to (have googled). The chip used is hugene11stv1. I also am using the hugene11stprobeset.db to try to do probeset --> Symbol translation. Essentially, I want to create a file with gene expression data, with genes * samples as my final matrix.

Code:

    setwd(wDir);
    Data <- ReadAffy();
    eset <- rma(Data);
    write.exprs(eset, file="geneExpData.txt", sep="\t", quote = F);

When I analyze the file written, I see that the number of columns is as I expect (number samples) but there are 33,297 genes. Please help me understand a few fundamental aspects here:

1. I tried translating these Affy IDs to gene symbols to see if that would make my analysis easier. Here are some things I tried.

Try 1:

    symbols <- getSYMBOL(as.character(expr.matrix[,1]), "hugene11stprobeset");

--> Not quite working. Only ~175 of the probeset IDs are getting translated.

Try 2:

    symbs <- mget(featureNames(eset), hugene11stprobesetSYMBOL, ifnotfound = NA);
    symbs <- unlist(symbs)
    mat <- eset; # make a copy
    featureNames(mat) <- ifelse(!is.na(symbs), symbs, featureNames(mat))

Many NAs.

Can you please help me understand what is happening here.
-- output of sessionInfo():

    R version 2.15.3 (2013-03-01)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] hugene11stv1cdf_2.3.0 affy_1.36.1           Biobase_2.18.0
    [4] BiocGenerics_0.4.0

    loaded via a namespace (and not attached):
    [1] affyio_1.26.0         BiocInstaller_1.8.3   preprocessCore_1.20.0
    [4] tools_2.15.3          zlibbioc_1.4.0

--
Sent via the guest posting facility at bioconductor.org.
@james-w-macdonald-5106
Last seen 2 hours ago
United States
Hi Ranjani,

On 5/31/2013 12:53 PM, Ranjani R [guest] wrote:
> I am a newbie to Affy. Thanks for your help.
>
> I am processing CEL files through R (Affy package) and am having some basic issues that I am not finding satisfactory answers to (have googled).
> The chip used is hugene11stv1. I also am using the hugene11stprobeset.db to try to do probeset --> Symbol translation.
> Essentially, I want to create a file with gene expression data, with genes * samples as my final matrix.
>
> Code:
> setwd(wDir);
> Data<- ReadAffy();
> eset<- rma(Data);
> write.exprs(eset,file="geneExpData.txt", sep="\t", quote = F);
>
> When I analyze the file written, I see that the number of columns is as I expect(number samples) but there are 33,297 genes.
> Please help me understand a few fundamental aspects here:
>
> 1. I tried translating these Affy IDs to gene symbols to see if that would make my analysis easier.
> Here are some things I tried
>
> Try 1:
> symbols<- getSYMBOL(as.character(expr.matrix[,1]), "hugene11stprobeset"); --> Not quite working. Only ~175 of the probeset IDs are getting translated.

There are two problems here. First, the affy package isn't designed for this array, and in fact won't let you proceed if you upgrade to the new version of Bioconductor. You should really be using either oligo or xps (both BioC packages) for the analysis of this array.

Second, the affy package is only able to summarize these arrays at the transcript level, and you are trying to annotate using a package that assumes you have summarized at the probeset level (where each probeset is only interrogating a smaller portion of the transcript, often just a single exon).

If you want to annotate your transcript level data, you need the hugene11sttranscriptcluster.db package.

Best,

Jim

> Try 2:
> symbs<- mget(featureNames(eset), hugene11stprobesetSYMBOL, ifnotfound =NA);
> symbs<- unlist(symbs)
> mat<- eset; # make a copy
> featureNames(mat)<- ifelse(!is.na(symbs), symbs, featureNames(mat))
>
> Many NAs.
>
> Can you please help me understand what is happening here.

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
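For readers who want to try the suggestion above, here is a minimal sketch of what a transcript-level workflow with oligo and hugene11sttranscriptcluster.db might look like. It is not code from the thread. It assumes that oligo, its platform design package for this array (pd.hugene.1.1.st.v1) and hugene11sttranscriptcluster.db are installed, and that the CEL files sit in one directory. The function names `add_symbols` and `summarize_gene_st`, the output file name, and the toy IDs and symbol are made up for illustration.

```r
# Hypothetical helper: combine a genes x samples matrix (row names are
# transcript-cluster IDs) with a named character vector of gene symbols.
add_symbols <- function(expr, symbols) {
  ids <- rownames(expr)
  data.frame(PROBEID = ids,
             SYMBOL  = unname(symbols[ids]),  # NA where no symbol is known
             expr,
             check.names = FALSE,
             stringsAsFactors = FALSE,
             row.names = NULL)
}

# Hypothetical wrapper (assumes the Bioconductor packages named above are
# installed): read CEL files with oligo, run RMA at the transcript level,
# annotate with the transcript-cluster package, and write a tab-delimited file.
summarize_gene_st <- function(cel_dir = ".", out_file = "geneExpData.txt") {
  cel_files <- oligoClasses::list.celfiles(cel_dir, full.names = TRUE)
  raw  <- oligo::read.celfiles(cel_files)
  eset <- oligo::rma(raw, target = "core")   # "core" = transcript-level summary
  ids  <- Biobase::featureNames(eset)
  symbols <- AnnotationDbi::mapIds(
    hugene11sttranscriptcluster.db::hugene11sttranscriptcluster.db,
    keys = ids, column = "SYMBOL", keytype = "PROBEID",
    multiVals = "first")                     # keep one symbol per ID
  out <- add_symbols(Biobase::exprs(eset), symbols)
  write.table(out, file = out_file, sep = "\t", quote = FALSE,
              row.names = FALSE)
  invisible(out)
}

# Toy check of the helper with made-up IDs and a made-up symbol.
toy_expr <- matrix(c(7.1, 8.3, 5.0, 5.2), nrow = 2, byrow = TRUE,
                   dimnames = list(c("id_A", "id_B"), c("s1.CEL", "s2.CEL")))
toy_symbols <- c(id_B = "GENE_B")
toy_table <- add_symbols(toy_expr, toy_symbols)
```

With real data one would call `summarize_gene_st("path/to/cel/files")`. Some rows would likely still have `NA` in the `SYMBOL` column, since control and unannotated transcript clusters have no gene symbol. Keeping the ID column next to the symbol, rather than overwriting the feature names, avoids duplicate row names when several IDs share a symbol.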