Reading Affy CEL files

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

I am a newbie to Affy. Thanks for your help.

I am processing CEL files through R (Affy package) and am having some basic issues that I am not finding satisfactory answers to (have googled). The chip used is hugene11stv1. I also am using the hugene11stprobeset.db to try to do probeset -> Symbol translation. Essentially, I want to create a file with gene expression data, with genes * samples as my final matrix.

Code:

    setwd(wDir);
    Data <- ReadAffy();
    eset <- rma(Data);
    write.exprs(eset, file="geneExpData.txt", sep="\t", quote = F);

When I analyze the file written, I see that the number of columns is as I expect (number of samples) but there are 33,297 genes. Please help me understand a few fundamental aspects here:

1. I tried translating these Affy IDs to gene symbols to see if that would make my analysis easier. Here are some things I tried.

Try 1:

    symbols <- getSYMBOL(as.character(expr.matrix[,1]), "hugene11stprobeset");

-> Not quite working. Only ~175 of the probeset IDs are getting translated.

Try 2:

    symbs <- mget(featureNames(eset), hugene11stprobesetSYMBOL, ifnotfound = NA);
    symbs <- unlist(symbs)
    mat <- eset; # make a copy
    featureNames(mat) <- ifelse(!is.na(symbs), symbs, featureNames(mat))

Many NAs.

Can you please help me understand what is happening here.
-- output of sessionInfo():

R version 2.15.3 (2013-03-01)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] hugene11stv1cdf_2.3.0 affy_1.36.1           Biobase_2.18.0
[4] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
[1] affyio_1.26.0         BiocInstaller_1.8.3   preprocessCore_1.20.0
[4] tools_2.15.3          zlibbioc_1.4.0

--
Sent via the guest posting facility at bioconductor.org.


Updated 10.9 years ago by James W. MacDonald 65k • written 10.9 years ago by Guest User ★ 13k

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Hi Ranjani,

On 5/31/2013 12:53 PM, Ranjani R [guest] wrote:

> When I analyze the file written, I see that the number of columns is as I expect (number of samples) but there are 33,297 genes.
>
> Try 1:
> symbols <- getSYMBOL(as.character(expr.matrix[,1]), "hugene11stprobeset");
> -> Not quite working. Only ~175 of the probeset IDs are getting translated.

There are two problems here.

First, the affy package isn't designed for this array, and in fact won't let you proceed if you upgrade to the new version of Bioconductor. You should really be using either oligo or xps (both BioC packages) for the analysis of this array.

Second, the affy package is only able to summarize these arrays at the transcript level, and you are trying to annotate using a package that assumes you have summarized at the probeset level (where each probeset interrogates only a small portion of the transcript, often just a single exon). Your transcript-level IDs are therefore being looked up in a probeset-level annotation, which is why so few of them translate. If you want to annotate your transcript-level data, you need the hugene11sttranscriptcluster.db package.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
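
For readers following this thread, here is a minimal sketch of the workflow the answer describes: read the CEL files with oligo, summarize at the transcript level with RMA, and look up symbols in hugene11sttranscriptcluster.db. It is an illustration under assumptions, not code from the original posts. The helper names `first_symbol` and `run_transcript_rma`, and the arguments `cel_dir` and `out_file`, are hypothetical. It also assumes that oligo, Biobase, AnnotationDbi, hugene11sttranscriptcluster.db and the platform design package that oligo needs for this array are installed.

```r
# Hypothetical helper: take the first symbol reported for each ID and
# fall back to the ID itself when no symbol is available.
# `ann` is a data frame with PROBEID and SYMBOL columns.
first_symbol <- function(ids, ann) {
  sym <- ann$SYMBOL[match(ids, ann$PROBEID)]
  ifelse(is.na(sym), ids, sym)
}

# Hypothetical wrapper around the whole workflow. `cel_dir` is the
# directory holding the CEL files; `out_file` is the tab-delimited output.
run_transcript_rma <- function(cel_dir, out_file = "geneExpData.txt") {
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)

  # Read the raw intensities and summarize at the transcript
  # ("core") level with RMA.
  raw  <- oligo::read.celfiles(cel_files)
  eset <- oligo::rma(raw, target = "core")

  # Map transcript cluster IDs to gene symbols. select() can return
  # several rows per ID, so first_symbol() keeps only the first match.
  ids <- Biobase::featureNames(eset)
  ann <- AnnotationDbi::select(
    hugene11sttranscriptcluster.db::hugene11sttranscriptcluster.db,
    keys = ids, columns = "SYMBOL", keytype = "PROBEID")

  # Genes in rows, samples in columns, with ID and symbol up front.
  out <- data.frame(probe_id = ids,
                    symbol = first_symbol(ids, ann),
                    Biobase::exprs(eset),
                    check.names = FALSE)
  write.table(out, file = out_file, sep = "\t",
              quote = FALSE, row.names = FALSE)
  invisible(out)
}
```

A call would look like `run_transcript_rma("path/to/cel/files")`. Some IDs may still come back without a symbol (for example control or unannotated transcript clusters); the fallback in `first_symbol` keeps the original ID for those rows, much like the `ifelse()` in Try 2 of the question.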

10.9 years ago • James W. MacDonald 65k
