Annotating limma results: Affymetrix probe IDs not mapping to hugene10stprobeset.db
@stephen-turner-4916
Last seen 5.8 years ago
United States
I asked a similar question yesterday and wanted to clarify and give more information. I am using limma to analyze microarray data from Affymetrix HuGene 1.0 ST arrays. I'm reading in the CEL files using ReadAffy. Both sources of annotation confirm that I'm using the hugene1.0st array:

    > affybatch@cdfName
    [1] "HuGene-1_0-st-v1"
    > eset@annotation
    [1] "hugene10stv1"

I fit a model, and now I want to annotate the results with gene symbols rather than the probeset IDs:

    > fit <- lmFit(eset, design)
    > head(fit$genes)
           ID
    1 7892501
    2 7892502
    3 7892503
    4 7892504
    5 7892505
    6 7892506

When I try to use getSYMBOL (as per Gordon's suggestion from a previous post: https://stat.ethz.ch/pipermail/bioconductor/2011-February/037866.html), none of these IDs map to a symbol:

    > getSYMBOL(head(fit$genes$ID), "hugene10stprobeset.db")
    7892501 7892502 7892503 7892504 7892505 7892506
         NA      NA      NA      NA      NA      NA

In fact, of my 32,321 probeset IDs, only 150 match up with the IDs in the hugene10stprobeset.db package:

    > mapped_probes <- mappedkeys(hugene10stprobesetSYMBOL)
    > head(mapped_probes)
    [1] "7896741" "7896743" "7896745" "7896755" "7896757" "7896758"
    > length(fit$genes$ID)
    [1] 32321
    > length(mapped_probes)
    [1] 238111
    > sum(fit$genes$ID %in% mapped_probes)
    [1] 150

Thanks in advance for any help!

Stephen

    > sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] C/en_US.UTF-8/C/C/C/C

    attached base packages:
    [1] grid      stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] hugene10stv1probe_2.9.0    BiocInstaller_1.2.1  hugene10stv1cdf_2.9.1       hugene10stprobeset.db_8.0.1
     [5] org.Hs.eg.db_2.6.4         RSQLite_0.11.1       DBI_0.2-5                   annotate_1.32.1
     [9] AnnotationDbi_1.16.10      pvclust_1.2-2        calibrate_1.7               gplots_2.10.1
    [13] KernSmooth_2.23-7          caTools_1.12         bitops_1.0-4.1              gdata_2.8.2
    [17] gtools_2.6.2               limma_3.10.1         arrayQualityMetrics_3.10.0  affy_1.32.0
    [21] Biobase_2.14.0
Microarray Annotation annotate limma • 1.7k views
@james-w-macdonald-5106
Last seen 8 hours ago
United States
Hi Stephen

On 1/17/2012 9:58 AM, Stephen Turner wrote:
> When I try to use getSYMBOL (as per Gordon's suggestion from a previous
> post: https://stat.ethz.ch/pipermail/bioconductor/2011-February/037866.html),
> none of these symbols map:
>
>> getSYMBOL(head(fit$genes$ID), "hugene10stprobeset.db")

You want the hugene10sttranscriptcluster.db package. By default oligo summarizes at the transcript level.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
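To illustrate the suggestion above, here is a minimal sketch, not a definitive recipe. It assumes the hugene10sttranscriptcluster.db and annotate packages are installed; the helper name `match_rate` is hypothetical and introduced only for this example. It first checks what fraction of the IDs have a symbol in the transcript-cluster annotation, then looks the symbols up with getSYMBOL. IDs with no assigned gene symbol (for example control probesets, if any are present) will still come back as NA.

```r
# Hypothetical helper (not part of any package): fraction of IDs found in a key set.
match_rate <- function(ids, keys) {
  mean(as.character(ids) %in% as.character(keys))
}

# The six IDs shown in the question; in practice use as.character(fit$genes$ID).
ids <- c("7892501", "7892502", "7892503", "7892504", "7892505", "7892506")

# Run the annotation step only when the required packages are installed.
if (requireNamespace("hugene10sttranscriptcluster.db", quietly = TRUE) &&
    requireNamespace("annotate", quietly = TRUE)) {
  library(hugene10sttranscriptcluster.db)
  library(annotate)

  # Keys that carry a gene symbol in the transcript-cluster annotation.
  mapped <- mappedkeys(hugene10sttranscriptclusterSYMBOL)
  cat("fraction of IDs with a symbol:", match_rate(ids, mapped), "\n")

  # Named character vector of symbols; NA where no symbol is assigned.
  symbols <- getSYMBOL(ids, "hugene10sttranscriptcluster.db")
  print(symbols)
}
```

With a real fit object, the same lookup could be stored alongside the IDs, for instance as `fit$genes$Symbol <- getSYMBOL(as.character(fit$genes$ID), "hugene10sttranscriptcluster.db")`, so that the symbols travel with the results table.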