Annotating limma results: Affymetrix probe IDs not mapping to hugene10stprobeset.db


Stephen Turner ▴ 290

@stephen-turner-4916


United States

I asked a similar question yesterday - wanted to clarify and give more information. I am using limma to analyze microarray data from Affymetrix HuGene 1.0 ST arrays. I'm reading in the CEL files using ReadAffy. Both sources of annotation confirm that I'm using the hugene1.0st array:

    > affybatch@cdfName
    [1] "HuGene-1_0-st-v1"
    > eset@annotation
    [1] "hugene10stv1"

I fit a model, and now I want to annotate the results with gene symbols rather than the probeset IDs:

    > fit <- lmFit(eset, design)
    > head(fit$genes)
           ID
    1 7892501
    2 7892502
    3 7892503
    4 7892504
    5 7892505
    6 7892506

When I try to use getSYMBOL (as per Gordon's suggestion from a previous post: https://stat.ethz.ch/pipermail/bioconductor/2011-February/037866.html), none of these symbols map:

    > getSYMBOL(head(fit$genes$ID), "hugene10stprobeset.db")
    7892501 7892502 7892503 7892504 7892505 7892506
         NA      NA      NA      NA      NA      NA

In fact, of my 32,321 probeset IDs, only 150 match up with the IDs in the hugene10stprobeset.db package:

    > mapped_probes <- mappedkeys(hugene10stprobesetSYMBOL)
    > head(mapped_probes)
    [1] "7896741" "7896743" "7896745" "7896755" "7896757" "7896758"
    > length(fit$genes$ID)
    [1] 32321
    > length(mapped_probes)
    [1] 238111
    > sum(fit$genes$ID %in% mapped_probes)
    [1] 150

Thanks in advance for any help!

Stephen

    > sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] C/en_US.UTF-8/C/C/C/C

    attached base packages:
    [1] grid stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] hugene10stv1probe_2.9.0 BiocInstaller_1.2.1 hugene10stv1cdf_2.9.1 hugene10stprobeset.db_8.0.1
     [5] org.Hs.eg.db_2.6.4 RSQLite_0.11.1 DBI_0.2-5 annotate_1.32.1
     [9] AnnotationDbi_1.16.10 pvclust_1.2-2 calibrate_1.7 gplots_2.10.1
    [13] KernSmooth_2.23-7 caTools_1.12 bitops_1.0-4.1 gdata_2.8.2
    [17] gtools_2.6.2 limma_3.10.1 arrayQualityMetrics_3.10.0 affy_1.32.0
    [21] Biobase_2.14.0

Microarray Annotation annotate limma

updated 12.3 years ago by James W. MacDonald 65k • written 12.3 years ago by Stephen Turner ▴ 290


James W. MacDonald 65k

@james-w-macdonald-5106


United States

Hi Stephen

On 1/17/2012 9:58 AM, Stephen Turner wrote:
> I asked a similar question yesterday - wanted to clarify and give more
> information. I am using limma to analyze microarray data from Affymetrix
> HuGene 1.0 ST arrays. I'm reading in the CEL files using ReadAffy. Both
> sources of annotation confirm that I'm using the hugene1.0st array:
>
>> affybatch@cdfName
> [1] "HuGene-1_0-st-v1"
>> eset@annotation
> [1] "hugene10stv1"
>
> I fit a model, and now I want to annotate the results with gene symbols
> rather than the probeset IDs:
>
>> fit <- lmFit(eset, design)
>> head(fit$genes)
>        ID
> 1 7892501
> 2 7892502
> 3 7892503
> 4 7892504
> 5 7892505
> 6 7892506
>
> When I try to use getSYMBOL (as per Gordon's suggestion from a previous
> post: https://stat.ethz.ch/pipermail/bioconductor/2011-February/037866.html),
> none of these symbols map:
>
>> getSYMBOL(head(fit$genes$ID), "hugene10stprobeset.db")

You want the hugene10sttranscriptcluster.db package. By default oligo summarizes at the transcript level.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
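For readers following along, the suggestion to use hugene10sttranscriptcluster.db could look roughly like the sketch below. It is not code from the thread: it assumes `fit` is the object returned by `lmFit(eset, design)` in the question, that the hugene10sttranscriptcluster.db package is installed, and the `Symbol` column name is only an illustrative choice.

```r
# Sketch only: `fit` is assumed to be the MArrayLM object from the question,
# and hugene10sttranscriptcluster.db is assumed to be installed.
library(annotate)
library(hugene10sttranscriptcluster.db)

# The IDs in fit$genes are transcript-cluster IDs; use them as character keys.
ids <- as.character(fit$genes$ID)

# Same diagnostic as in the question, but against the transcript-cluster map:
# how many of the IDs have a gene symbol?
mapped <- mappedkeys(hugene10sttranscriptclusterSYMBOL)
sum(ids %in% mapped)

# Attach symbols to the fit; IDs with no symbol stay NA.
# "Symbol" is a hypothetical column name chosen for this example.
fit$genes$Symbol <- getSYMBOL(ids, "hugene10sttranscriptcluster.db")
head(fit$genes)
```

Some IDs may still come back as NA even with the right package, for example control probesets that have no gene assignment, so checking the `sum(ids %in% mapped)` count is a more useful test than looking only at the first few rows.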

12.3 years ago • James W. MacDonald 65k
