How to annotate a limma EListRaw object with an annotation file
@0099a3a4
Last seen 25 days ago
Italy

Dear all,

I'm following the limma vignette to analyze a single-channel Agilent microarray experiment, but I couldn't find an annotation package for RnAgilentDesign028279, so I downloaded the annotation file from Agilent eArray. I now have the annotation as a .txt file, but mapIds does not accept it, either as a data frame or as a .db connection. Which function could I use to get gene symbols and Entrez Gene IDs from the annotation file and match them to the probe IDs of the limma EListRaw object? Thank you in advance for your suggestions.

Giulia

> library(limma)
> x <-read.maimages(SDRF[,"Array Data File"], source ="agilent", green.only=TRUE, other.columns="gIsWellAboveBG")
> dim(x)
[1] 62976     8
> head(x$genes)
  Row Col ControlType       ProbeName  SystematicName
1   1   1           1 GE_BrightCorner GE_BrightCorner
2   1   2           1      DarkCorner      DarkCorner
3   1   3           1      DarkCorner      DarkCorner
4   1   4           0    A_44_P133119    NM_001106008
5   1   5           0   A_44_P1028743    NM_001109304
6   1   6           0    A_64_P085072    NM_001106395
> library(AnnotationDbi)
> Sys.setenv(LANG = "en")

> RnAgilentDesign028279 <-read.delim2("updated_annotation_RnAgilentDesign028279.txt")
> colnames(RnAgilentDesign028279)
[1] "ProbeName"  "GeneSymbol" "EntrezID"  
> x$genes$EntrezID <- mapIds(RnAgilentDesign028279, x$genes$ProbeName, keytype = "PROBEID", column = "ENTREZID")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function mapIds for signature "data.frame"

> library(RSQLite) 
> RnAgilentDesign028279_db <- dbConnect(RSQLite::SQLite(), "Updated_Annotation_RnAgilentDesign028279.db")
> x$genes$EntrezID <- mapIds(RnAgilentDesign028279_db, x$genes$ProbeName, keytype = "PROBEID", column = "ENTREZID")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function mapIds for signature "SQLiteConnection"


sessionInfo( )
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=Italian_Italy.utf8  LC_CTYPE=Italian_Italy.utf8   
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.utf8    

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] annotate_1.80.0      AnnotationDbi_1.64.1 IRanges_2.36.0      
[4] S4Vectors_0.40.2     Biobase_2.62.0       BiocGenerics_0.48.1 
[7] XML_3.99-0.17        limma_3.58.1        

loaded via a namespace (and not attached):
 [1] crayon_1.5.3            vctrs_0.6.5            
 [3] httr_1.4.7              cli_3.6.3              
 [5] rlang_1.1.4             DBI_1.2.3              
 [7] png_0.1-8               xtable_1.8-4           
 [9] bit_4.5.0.1             statmod_1.5.0          
[11] RCurl_1.98-1.16         Biostrings_2.70.3      
[13] KEGGREST_1.42.0         bitops_1.0-9           
[15] fastmap_1.2.0           GenomeInfoDb_1.38.8    
[17] memoise_2.0.1           compiler_4.3.3         
[19] RSQLite_2.3.9           blob_1.2.4             
[21] XVector_0.42.0          rstudioapi_0.17.1      
[23] R6_2.5.1                GenomeInfoDbData_1.2.11
[25] tools_4.3.3             bit64_4.5.2            
[27] zlibbioc_1.48.2         cachem_1.1.0           
>
@0099a3a4
Last seen 25 days ago
Italy

Good afternoon,

Thank you for the explanation and for the suggestion to annotate directly from the genome-wide annotation package for rat (org.Rn.eg.db) using the select function, but I still have some issues with the code.

Any help to continue on this path will be appreciated.

Giulia

 > moreannot <- select(org.Rn.eg.db, x$SystematicName, c("SYMBOL","ENTREZGENE"), "ACCNUM")
Error in .testForValidCols(x, cols) : 
  Invalid columns: ENTREZGENE. Please use the columns method to see a listing of valid arguments.

> columns(org.Rn.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "ONTOLOGY"    
[16] "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"         "PROSITE"     
[21] "REFSEQ"       "SYMBOL"       "UNIPROT"   

> moreannot <- select(org.Rn.eg.db, x$SystematicName, c("SYMBOL","ENTREZID"))
Error in .testForValidKeys(x, keys, keytype, fks) : 
  'keys' must be a character vector

Oh right. My bad. Good for you though, figuring out that it's ENTREZID, not ENTREZGENE. You did a good job of reading the first error message and diagnosing the problem. Not so much with the second one though, which seems to be just as clear. Let's unpack it. You got

Error in .testForValidKeys(x, keys, keytype, fks) : 
  'keys' must be a character vector

Which clearly states that the keys argument has to be a character vector. Ideally your next step is to do

class(x$SystematicName)

Which will return factor. The next step might take some Google work, but a simple query of 'convert factor to character R' (which auto-completes for me, so obviously a common enough query) will bring up as the first gazillion replies that you use as.character. Knowing how to get answers yourself is an invaluable skill, and a simple Google query is often all you need.
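
The check and the conversion can be sketched on a small mock object. This is a plain list standing in for the limma object, with values copied from the head(x$genes) output in the question. In that output SystematicName sits inside x$genes, so it is worth inspecting both x$SystematicName and x$genes$SystematicName. Whether the real column is a factor or already character may depend on how the data were read, so treat the factor below as an assumption.

```r
# Mock stand-in for the limma object: a list holding a 'genes' data frame
x <- list(genes = data.frame(
  ProbeName      = c("A_44_P133119", "A_44_P1028743"),
  SystematicName = factor(c("NM_001106008", "NM_001109304"))
))

class(x$SystematicName)        # "NULL": no top-level element has that name
class(x$genes$SystematicName)  # "factor" in this mock

# as.character() turns a factor (or a character vector) into valid keys
keys <- as.character(x$genes$SystematicName)
is.character(keys)             # TRUE
```

Passing keys built this way as the second argument to select should get past the 'keys' must be a character vector check.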

@james-w-macdonald-5106
Last seen just now
United States

Your first step when trying to use a function is to read the help page (?mapIds), which will bring this up:

Usage:

       columns(x)
       keytypes(x)
       keys(x, keytype, ...)
       select(x, keys, columns, keytype, ...)
       mapIds(x, keys, column, keytype, ..., multiVals)
       saveDb(x, file)
       loadDb(file, packageName=NA)

Arguments:

       x: the 'AnnotationDb' object. But in practice this will mean an
          object derived from an 'AnnotationDb' object such as a
          'OrgDb' or 'ChipDb' object.

Which should tell you that a data.frame won't work, because it's not an AnnotationDb object. You should also read the vignette(s) for AnnotationDbi, which are meant to help get people started. It appears that you are just trying things and hoping something will work, which is probably not the way to proceed. Reading and understanding the available help is the way to go.
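
That said, a table that has already been read into a data.frame can be used for lookups with base R's match(), with no AnnotationDb object involved. A minimal sketch, assuming the column names shown in the question (ProbeName, GeneSymbol, EntrezID). The two small data frames are made-up stand-ins and the symbols and IDs are placeholders, not real annotation.

```r
# Mock stand-in for x$genes (in practice this comes from read.maimages)
genes <- data.frame(
  ProbeName = c("GE_BrightCorner", "A_44_P133119", "A_44_P1028743"),
  stringsAsFactors = FALSE
)

# Mock stand-in for the eArray annotation table; values are placeholders
annot <- data.frame(
  ProbeName  = c("A_44_P1028743", "A_44_P133119"),
  GeneSymbol = c("GeneB", "GeneA"),
  EntrezID   = c("222", "111"),
  stringsAsFactors = FALSE
)

# For each probe on the array, match() gives its row in the annotation table
# (NA when the probe, e.g. a control spot, is not annotated)
idx <- match(genes$ProbeName, annot$ProbeName)
genes$GeneSymbol <- annot$GeneSymbol[idx]
genes$EntrezID   <- annot$EntrezID[idx]
```

With the real objects, the same three lines would use x$genes and RnAgilentDesign028279 in place of genes and annot.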

But anyway. What you have are two possibly useful columns (ProbeName and SystematicName). The ProbeName is just an internal Agilent ID that doesn't really mean anything external to Agilent arrays, but it does map the SystematicName (which if you look closely has things like NM_001106008, which is an NCBI GenBank ID) to the ProbeName. That mapping isn't super critical in this context, as read.maimages will guarantee that the order of your x$genes is identical to the order of x$E. So all you need to do is add the Gene symbols or whatever you want.

Let's say you want to include the gene symbol and NCBI Gene ID (what used to be called 'EntrezGene'). You can do that by

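library(org.Rn.eg.db)
# "ENTREZID" is the valid column name (see columns(org.Rn.eg.db)); keys must be a character vector
moreannot <- select(org.Rn.eg.db, as.character(x$genes$SystematicName), c("SYMBOL","ENTREZID"), "ACCNUM")
# Assumes one row per probe; check nrow(moreannot) == nrow(x$genes) before combining
x$genes <- data.frame(x$genes[,3:5], moreannot)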

The only tricky part to what I show here is knowing that 'ACCNUM' means 'GenBank or RefSeq' IDs. There is no 'GENBANK' column, which could be confusing (columns(org.Rn.eg.db) also lists a separate 'REFSEQ' column).

The last step adds the annotation data, and removes the row/column information which is not useful for what the 'genes' list item is used for. You can use the ControlType column for various QC steps (see the limma User's Guide), and when you use topTable, it will automatically include data from the 'genes' list item, so you get an annotated output, which is nice.
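
One thing worth checking before the data.frame() step: select can return more than one row for a key when an accession maps to several genes, in which case the rows no longer line up with x$genes. A sketch of collapsing such a result to one row per key with base R follows. The table is a made-up stand-in for select output and the accessions, symbols and IDs are placeholders.

```r
# Mock stand-in for the keys passed to select() (x$genes$SystematicName)
keys <- c("GE_BrightCorner", "ACC_1", "ACC_2", "ACC_1")

# Mock stand-in for a select() result in which ACC_2 maps to two genes
res <- data.frame(
  ACCNUM   = c("ACC_1", "ACC_2", "ACC_2"),
  SYMBOL   = c("GeneA", "GeneB", "GeneC"),
  ENTREZID = c("111", "222", "333"),
  stringsAsFactors = FALSE
)

# match() keeps the first hit per key, so the result has one row per probe,
# in the same order as the keys (NA rows for keys without a mapping)
moreannot <- res[match(keys, res$ACCNUM), c("SYMBOL", "ENTREZID")]
rownames(moreannot) <- NULL

stopifnot(nrow(moreannot) == length(keys))
```

Within AnnotationDbi, mapIds (with its multiVals argument, shown in the usage above) is the usual way to get a single value per key, called once per column.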
