Dear all,
I'm following the limma vignette to analyze a single-channel Agilent microarray experiment, but I haven't found a corresponding annotation package for RnAgilentDesign028279, so I downloaded the annotation file from Agilent eArray. I now have the annotation as a .txt file, but I cannot use it with the mapIds function, either as a data frame or as a .db file. What function could I use at this step to get gene symbols and Entrez Gene IDs from the annotation file and match them to the probe IDs of the limma EListRaw object? Thank you in advance for your suggestions.
Giulia
> library(limma)
> x <-read.maimages(SDRF[,"Array Data File"], source ="agilent", green.only=TRUE, other.columns="gIsWellAboveBG")
> dim(x)
[1] 62976 8
> head(x$genes)
  Row Col ControlType       ProbeName  SystematicName
1   1   1           1 GE_BrightCorner GE_BrightCorner
2   1   2           1      DarkCorner      DarkCorner
3   1   3           1      DarkCorner      DarkCorner
4   1   4           0    A_44_P133119    NM_001106008
5   1   5           0   A_44_P1028743    NM_001109304
6   1   6           0    A_64_P085072    NM_001106395
> library(AnnotationDbi)
> Sys.setenv(LANG = "en")
> RnAgilentDesign028279 <-read.delim2("updated_annotation_RnAgilentDesign028279.txt")
> colnames(RnAgilentDesign028279)
[1] "ProbeName" "GeneSymbol" "EntrezID"
> x$genes$EntrezID <- mapIds(RnAgilentDesign028279, x$genes$ProbeName, keytype = "PROBEID", column = "ENTREZID")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function mapIds for signature "data.frame"
> library(RSQLite)
> RnAgilentDesign028279_db <- dbConnect(RSQLite::SQLite(), "Updated_Annotation_RnAgilentDesign028279.db")
> x$genes$EntrezID <- mapIds(RnAgilentDesign028279_db, x$genes$ProbeName, keytype = "PROBEID", column = "ENTREZID")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function mapIds for signature "SQLiteConnection"
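Both errors come from the same place: mapIds() is an AnnotationDbi generic whose methods are written for AnnotationDb-type objects (OrgDb, ChipDb and similar), so neither a plain data.frame nor a raw SQLiteConnection will dispatch. When the annotation is already a table, base R match() may be all that is needed. A minimal sketch, assuming the eArray file has the ProbeName, GeneSymbol and EntrezID columns printed above; the two data frames are small hypothetical stand-ins for x$genes and the annotation file, and the symbol/ID values are placeholders, not real annotation.

```r
# Hypothetical stand-in for x$genes (its row order must not change)
genes <- data.frame(
  ProbeName = c("GE_BrightCorner", "DarkCorner", "A_44_P133119", "A_44_P1028743"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the eArray table; values are placeholders
annot <- data.frame(
  ProbeName  = c("A_44_P1028743", "A_44_P133119"),
  GeneSymbol = c("GeneB", "GeneA"),
  EntrezID   = c("2002", "1001"),
  stringsAsFactors = FALSE
)

# For each probe in genes: its row index in annot, or NA (e.g. control spots)
idx <- match(genes$ProbeName, annot$ProbeName)

# Indexing by idx keeps the result in the row order of genes
genes$GeneSymbol <- annot$GeneSymbol[idx]
genes$EntrezID   <- annot$EntrezID[idx]
print(genes)
```

On the real objects this would become something like x$genes$GeneSymbol <- RnAgilentDesign028279$GeneSymbol[match(x$genes$ProbeName, RnAgilentDesign028279$ProbeName)]. match() is used rather than merge() because merge() reorders rows (and drops unmatched ones unless all.x = TRUE), which would break the row correspondence between x$genes and the intensity matrix. Note that match() returns the first hit if a probe appears more than once in the annotation table.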
> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.utf8 LC_CTYPE=Italian_Italy.utf8
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.utf8
time zone: Europe/Rome
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] annotate_1.80.0 AnnotationDbi_1.64.1 IRanges_2.36.0
[4] S4Vectors_0.40.2 Biobase_2.62.0 BiocGenerics_0.48.1
[7] XML_3.99-0.17 limma_3.58.1
loaded via a namespace (and not attached):
[1] crayon_1.5.3 vctrs_0.6.5
[3] httr_1.4.7 cli_3.6.3
[5] rlang_1.1.4 DBI_1.2.3
[7] png_0.1-8 xtable_1.8-4
[9] bit_4.5.0.1 statmod_1.5.0
[11] RCurl_1.98-1.16 Biostrings_2.70.3
[13] KEGGREST_1.42.0 bitops_1.0-9
[15] fastmap_1.2.0 GenomeInfoDb_1.38.8
[17] memoise_2.0.1 compiler_4.3.3
[19] RSQLite_2.3.9 blob_1.2.4
[21] XVector_0.42.0 rstudioapi_0.17.1
[23] R6_2.5.1 GenomeInfoDbData_1.2.11
[25] tools_4.3.3 bit64_4.5.2
[27] zlibbioc_1.48.2 cachem_1.1.0
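If the annotation is kept in an SQLite file instead, the SQLiteConnection has to be queried through DBI (both DBI and RSQLite appear in the sessionInfo above) rather than passed to mapIds(). A minimal sketch, assuming the .db file holds the annotation as a single table: the in-memory database and the table name annot are hypothetical stand-ins so the example runs on its own. With the real file you would connect to Updated_Annotation_RnAgilentDesign028279.db and use whatever table name dbListTables() reports.

```r
library(DBI)

# In-memory database standing in for the .db file; table name "annot" is hypothetical
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "annot", data.frame(
  ProbeName  = c("A_44_P133119", "A_44_P1028743"),
  GeneSymbol = c("GeneA", "GeneB"),  # placeholder values
  EntrezID   = c("1001", "2002"),    # placeholder values
  stringsAsFactors = FALSE
))

# With a real file, list the tables first to find the right name
print(dbListTables(con))

# Pull the table into an ordinary data.frame, then close the connection
annot <- dbReadTable(con, "annot")
dbDisconnect(con)

# The result is a plain data.frame and can be matched against probe IDs
probes  <- c("DarkCorner", "A_44_P1028743", "A_44_P133119")
symbols <- annot$GeneSymbol[match(probes, annot$ProbeName)]
print(symbols)
```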
Oh right. My bad. Good for you, though, figuring out that it's ENTREZID, not ENTREZGENE. You did a good job of reading the first error message and diagnosing the problem. Not so much with the second one, though, which is just as clear. Let's unpack it. The error you got clearly states that the keys argument has to be a character vector.
Ideally, your next step is to check the class of what you passed as keys with class(), which will return factor. The next step might take some Google work, but a simple query of 'convert factor to character R' (which auto-completes for me, so it's obviously a common query) will bring up, as the first gazillion replies, that you should use as.character(). Knowing how to get answers yourself is an invaluable skill, and a simple Google query is often all you need.
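To make the factor/character point concrete, here is a small base-R sketch with hypothetical keys. class() reports factor, as.character() recovers the labels, and for numeric-looking IDs such as Entrez IDs the tempting as.integer() shortcut returns the factor's internal level codes rather than the IDs themselves. Factors typically arise from read.delim(..., stringsAsFactors = TRUE), which was the default before R 4.0.0.

```r
# Hypothetical probe keys stored as a factor
keys <- factor(c("A_44_P133119", "A_44_P1028743", "A_44_P133119"))
print(class(keys))                 # "factor"

# as.character() gives back the labels as a character vector
keys_chr <- as.character(keys)
print(is.character(keys_chr))      # TRUE

# Pitfall with numeric-looking IDs: levels are sorted as strings ("1001", "24"),
# so as.integer() on the factor returns level codes, not the IDs
ids <- factor(c("24", "1001", "24"))
print(as.integer(ids))                  # 2 1 2
print(as.integer(as.character(ids)))    # 24 1001 24
```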