Question

How to annotate a limma EListRaw object with an annotation file

giulia.gentile ▴ 10

Italy

Dear all,

I'm following the limma vignette to analyze a single-channel Agilent microarray experiment. I couldn't find an annotation package for RnAgilentDesign028279, so I downloaded the annotation file from Agilent eArray. I now have the annotation as a .txt file, but mapIds does not accept it, either as a data frame or as a .db connection. Which function should I use to get gene symbols and Entrez Gene IDs from the annotation file and match them to the probe IDs of the limma EListRaw object? Thank you in advance for your suggestions.

Giulia

> library(limma)
> x <-read.maimages(SDRF[,"Array Data File"], source ="agilent", green.only=TRUE, other.columns="gIsWellAboveBG")
> dim(x)
[1] 62976     8
> head(x$genes)
  Row Col ControlType       ProbeName  SystematicName
1   1   1           1 GE_BrightCorner GE_BrightCorner
2   1   2           1      DarkCorner      DarkCorner
3   1   3           1      DarkCorner      DarkCorner
4   1   4           0    A_44_P133119    NM_001106008
5   1   5           0   A_44_P1028743    NM_001109304
6   1   6           0    A_64_P085072    NM_001106395
> library(AnnotationDbi)
> Sys.setenv(LANG = "en")

> RnAgilentDesign028279 <-read.delim2("updated_annotation_RnAgilentDesign028279.txt")
> colnames(RnAgilentDesign028279)
[1] "ProbeName"  "GeneSymbol" "EntrezID"  
> x$genes$EntrezID <- mapIds(RnAgilentDesign028279, x$genes$ProbeName, keytype = "PROBEID", column = "ENTREZID")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function mapIds for signature "data.frame"

> library(RSQLite) 
> RnAgilentDesign028279_db <- dbConnect(RSQLite::SQLite(), "Updated_Annotation_RnAgilentDesign028279.db")
> x$genes$EntrezID <- mapIds(RnAgilentDesign028279_db, x$genes$ProbeName, keytype = "PROBEID", column = "ENTREZID")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function mapIds for signature "SQLiteConnection"


sessionInfo( )
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=Italian_Italy.utf8  LC_CTYPE=Italian_Italy.utf8   
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.utf8    

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] annotate_1.80.0      AnnotationDbi_1.64.1 IRanges_2.36.0      
[4] S4Vectors_0.40.2     Biobase_2.62.0       BiocGenerics_0.48.1 
[7] XML_3.99-0.17        limma_3.58.1        

loaded via a namespace (and not attached):
 [1] crayon_1.5.3            vctrs_0.6.5            
 [3] httr_1.4.7              cli_3.6.3              
 [5] rlang_1.1.4             DBI_1.2.3              
 [7] png_0.1-8               xtable_1.8-4           
 [9] bit_4.5.0.1             statmod_1.5.0          
[11] RCurl_1.98-1.16         Biostrings_2.70.3      
[13] KEGGREST_1.42.0         bitops_1.0-9           
[15] fastmap_1.2.0           GenomeInfoDb_1.38.8    
[17] memoise_2.0.1           compiler_4.3.3         
[19] RSQLite_2.3.9           blob_1.2.4             
[21] XVector_0.42.0          rstudioapi_0.17.1      
[23] R6_2.5.1                GenomeInfoDbData_1.2.11
[25] tools_4.3.3             bit64_4.5.2            
[27] zlibbioc_1.48.2         cachem_1.1.0           
>

limma AnnotationDbi • 1.4k views

written 3 months ago by giulia.gentile ▴ 10 • updated 3 months ago by James W. MacDonald 68k

score 1 · Answer 1 · 2024-12-05

Good afternoon,

Thank you for your explanation and for the suggestion to annotate directly from the genome-wide annotation package for rat (org.Rn.eg.db) using the select function. I still have some issues with the code, though.

Any help to continue on this path will be appreciated.

Giulia

> moreannot <- select(org.Rn.eg.db, x$SystematicName, c("SYMBOL","ENTREZGENE"), "ACCNUM")
Error in .testForValidCols(x, cols) : 
  Invalid columns: ENTREZGENE. Please use the columns method to see a listing of valid arguments.

> columns(org.Rn.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "ONTOLOGY"    
[16] "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"         "PROSITE"     
[21] "REFSEQ"       "SYMBOL"       "UNIPROT"   

> moreannot <- select(org.Rn.eg.db, x$SystematicName, c("SYMBOL","ENTREZID"))
Error in .testForValidKeys(x, keys, keytype, fks) : 
  'keys' must be a character vector

score 0 · Answer 2 · 2024-12-04

Your first step when trying to use a function is to read the help page (?mapIds), which will bring this up:

Usage:

       columns(x)
       keytypes(x)
       keys(x, keytype, ...)
       select(x, keys, columns, keytype, ...)
       mapIds(x, keys, column, keytype, ..., multiVals)
       saveDb(x, file)
       loadDb(file, packageName=NA)

Arguments:

       x: the 'AnnotationDb' object. But in practice this will mean an
          object derived from an 'AnnotationDb' object such as a
          'OrgDb' or 'ChipDb' object.

Which should tell you that a data.frame won't work, because it's not an AnnotationDb object. You should also read the vignette(s) for AnnotationDbi, which are meant to help get people started. It appears that you are just trying things and hoping something will work, which is probably not the way to proceed. Reading and understanding the available help is the way to go.
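
Since the eArray file is already a data.frame with a ProbeName column, one option that avoids mapIds altogether is base R's match(). What follows is only a minimal sketch on toy data: the objects `genes` and `annot` are hypothetical stand-ins for x$genes and RnAgilentDesign028279, and the gene symbols and Entrez IDs are made up for illustration.

```r
# Toy stand-in for x$genes (the real one comes from read.maimages)
genes <- data.frame(
  ProbeName      = c("GE_BrightCorner", "A_44_P133119", "A_64_P085072", "A_44_P133119"),
  SystematicName = c("GE_BrightCorner", "NM_001106008", "NM_001106395", "NM_001106008"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the eArray annotation table; symbols and IDs are invented
annot <- data.frame(
  ProbeName  = c("A_64_P085072", "A_44_P133119"),
  GeneSymbol = c("GeneB", "GeneA"),
  EntrezID   = c("222", "111"),
  stringsAsFactors = FALSE
)

# match() gives, for each probe on the array, its row in the annotation table
# (NA when the probe, e.g. a control spot, has no annotation)
idx <- match(genes$ProbeName, annot$ProbeName)
genes$Symbol   <- annot$GeneSymbol[idx]
genes$EntrezID <- annot$EntrezID[idx]
genes
```

With the real objects, the same idea would read x$genes$Symbol <- RnAgilentDesign028279$GeneSymbol[match(x$genes$ProbeName, RnAgilentDesign028279$ProbeName)], assuming the ProbeName values in the file are spelled exactly as on the array.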

But anyway. What you have are two possibly useful columns (ProbeName and SystematicName). The ProbeName is just an internal Agilent ID that doesn't really mean anything outside Agilent arrays, but x$genes does pair each ProbeName with its SystematicName (which, if you look closely, has things like NM_001106008, which is an NCBI RefSeq accession). You don't need to do any probe-level matching in this context, as read.maimages guarantees that the rows of x$genes are in the same order as the rows of x$E. So all you need to do is add the gene symbols or whatever else you want.

Let's say you want to include the gene symbol and NCBI Gene ID (what used to be called 'EntrezGene'). You can do that by

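library(org.Rn.eg.db)
# keys must be a character vector taken from x$genes; the valid column name is "ENTREZID"
acc <- as.character(x$genes$SystematicName)
moreannot <- select(org.Rn.eg.db, unique(acc), c("SYMBOL","ENTREZID"), "ACCNUM")
# select() may return several rows per key, so keep the first match for each probe
x$genes <- data.frame(x$genes[,3:5], moreannot[match(acc, moreannot$ACCNUM), c("SYMBOL","ENTREZID")], row.names = NULL)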

The only tricky part to what I show here is knowing that 'ACCNUM' covers both GenBank and RefSeq accessions. There is no 'GENBANK' keytype, which could be confusing ('REFSEQ' does exist, but it covers RefSeq accessions only).

The last step adds the annotation data, and removes the row/column information which is not useful for what the 'genes' list item is used for. You can use the ControlType column for various QC steps (see the limma User's Guide), and when you use topTable, it will automatically include data from the 'genes' list item, so you get an annotated output, which is nice.