Question

Using function select

I am trying to annotate feature data in R after installing affy. When I run the following, R says the select function cannot be found. Thank you.

anno<-select(tomatocdf.db, 
+ keys= (featureNames(eset)),
+ columns= c("SYMBOL", "GENENAME"),
+ keytype="PROBEID")
Error in select(tomatocdf.db, keys = (featureNames(eset)), columns = c("SYMBOL",  : 
  could not find function "select"

Updated 4.5 years ago by James W. MacDonald • written 4.5 years ago by ekl

score 2 · Answer 1 · 2019-10-28

Any time you get an error that says

could not find function "select"

it means the package that contains the function has not been loaded yet. Here select is the generic from AnnotationDbi, so that package (or an annotation package that depends on it) has to be loaded with library() first. In addition, you appear to be trying to use a non-existent package called 'tomatocdf.db', which is a concatenation of an actual package name (tomatocdf) and the .db suffix; no package by that name exists. There isn't an annotation package for that array, so you will have to decide how interested you are in getting annotations.
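As a minimal sketch of the first point, you can check whether select is visible after loading AnnotationDbi. The requireNamespace guard and the select_available variable are only there for illustration, so the snippet also runs on a machine where the package is not installed:

```r
# select() for annotation packages is a generic defined in AnnotationDbi.
# Guard the load so this check runs even where the package is missing.
if (requireNamespace("AnnotationDbi", quietly = TRUE)) {
  library(AnnotationDbi)
  # Should be TRUE once the package is attached
  select_available <- exists("select", mode = "function")
} else {
  select_available <- FALSE
  message("Install it first: BiocManager::install(\"AnnotationDbi\")")
}
select_available
```

Loading the package fixes the "could not find function" error only; select still needs a real annotation object as its first argument.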

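The cheap and easy way to do it is to leverage somebody else's work: pull a GEO series that was run on the same array, since getGEO returns the platform annotation along with the data: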

> library(GEOquery)
> z <- getGEO("GSE125476")[[1]]
Found 1 file(s)
GSE125476_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE125nnn/GSE125476/matrix/GSE125476_series_matrix.txt.gz'
Content type 'application/x-gzip' length 1247686 bytes (1.2 MB)
downloaded 1.2 MB

> z
ExpressionSet (storageMode: lockedEnvironment)
assayData: 10209 features, 30 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM3574770 GSM3574771 ... GSM3574799 (30 total)
  varLabels: title geo_accession ... tissue:ch1 (35 total)
  varMetadata: labelDescription
featureData
  featureNames: AFFX-BioB-3_at AFFX-BioB-5_at ...
    RPTR-Les-XXU09476-1_at (10209 total)
  fvarLabels: ID GB_LIST ... SPOT_ID (17 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL4741 

> fData(z)[5000:5002,c(1,2,9,10,11,12)]
                               ID    GB_LIST Representative Public ID
Les.5430.1.S1_at Les.5430.1.S1_at BT013901.1               BT013901.1
Les.5431.1.S1_at Les.5431.1.S1_at BT013903.1               BT013903.1

                                                                                                                                                                      Gene Title
Les.5430.1.S1_at                                                                                                                                    Clone 132898F, mRNA sequence
Les.5431.1.S1_at                                                                                                                                    Clone 132900F, mRNA sequence

                 Gene Symbol ENTREZ_GENE_ID
Les.5430.1.S1_at                           
Les.5431.1.S1_at

So you could extract the feature data from that GEO dataset with fData() and put it in your ExpressionSet, making sure the rows are in the same order as your featureNames (so the annotation lines up with the existing data).
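The reordering step could look something like the sketch below. It uses plain vectors and a data.frame as stand-ins for featureNames(eset) and fData(z), so it runs without Bioconductor; eset_ids, geo_annot and aligned are hypothetical names, and the NA for the AFFX probe is a placeholder rather than a real annotation value:

```r
# Stand-in for featureNames(eset): probeset IDs in your data's row order.
eset_ids <- c("Les.5431.1.S1_at", "Les.5430.1.S1_at", "AFFX-BioB-3_at")

# Stand-in for fData(z): same probesets, but in a different order.
geo_annot <- data.frame(
  ID = c("AFFX-BioB-3_at", "Les.5430.1.S1_at", "Les.5431.1.S1_at"),
  GB_LIST = c(NA, "BT013901.1", "BT013903.1"),
  row.names = c("AFFX-BioB-3_at", "Les.5430.1.S1_at", "Les.5431.1.S1_at"),
  stringsAsFactors = FALSE
)

# match() gives, for each of your probesets, its row in the GEO annotation.
idx <- match(eset_ids, rownames(geo_annot))
stopifnot(!anyNA(idx))  # every probeset must be found before reordering

aligned <- geo_annot[idx, , drop = FALSE]
stopifnot(identical(rownames(aligned), eset_ids))

# With the real objects, the last step would be: fData(eset) <- aligned
```

Checking that the row names are identical before assigning is the cheap insurance here; a silent misalignment would attach the wrong gene to every probeset.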

But those annotations are from 2006, so there might be better data out there. You could hypothetically get the annotation CSV from Affy's website and use that as well:

> library(AffyCompatible)
## You need a username and pwd for Affy to download stuff
> rsrc <- NetAffxResource("jmacdon@med.umich.edu", password)
> affxDescription(rsrc[["Tomato"]])
[1] "Annotations, CSV format"         "CDF Library File"               
[3] "CIF Library File"                "PSI Library File"               
[5] "Probe Sequences, FASTA format"   "Probe Sequences, tabular format"
[7] "TAC 4.x Configuration file"      "tac_qcc file" 

> df <- readAnnotation(rsrc, annotation = rsrc[["Tomato", "Annotations, CSV format"]], comment.char = "#")
> names(df)
 [1] "Probe.Set.ID"                     "GeneChip.Array"                  
 [3] "Species.Scientific.Name"          "Annotation.Date"                 
 [5] "Sequence.Type"                    "Sequence.Source"                 
 [7] "Transcript.ID.Array.Design."      "Target.Description"              
 [9] "Representative.Public.ID"         "Archival.UniGene.Cluster"        
[11] "UniGene.ID"                       "Genome.Version"                  
[13] "Alignments"                       "Gene.Title"                      
[15] "Gene.Symbol"                      "Chromosomal.Location"            
[17] "Unigene.Cluster.Type"             "Ensembl"                         
[19] "Entrez.Gene"                      "SwissProt"                       
[21] "EC"                               "OMIM"                            
[23] "RefSeq.Protein.ID"                "RefSeq.Transcript.ID"            
[25] "FlyBase"                          "AGI"                             
[27] "WormBase"                         "MGI.Name"                        
[29] "RGD.Name"                         "SGD.accession.number"            
<snip>
> df[5010:5012,c(1,9,19)]
         Probe.Set.ID Representative.Public.ID Entrez.Gene
5010  Les.544.1.A1_at                 BG630730   101249417
5011 Les.5440.1.S1_at               BT013926.1   101267119
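Once you have df, a select()-style lookup is just a match() on Probe.Set.ID. A minimal sketch, using a toy two-row data.frame built from the rows printed above; lookup_annotation is a hypothetical helper, not part of AffyCompatible or AnnotationDbi, and storing Entrez.Gene as character is an assumption about the real file:

```r
# Toy stand-in for the NetAffx annotation data.frame.
df_toy <- data.frame(
  Probe.Set.ID = c("Les.544.1.A1_at", "Les.5440.1.S1_at"),
  Representative.Public.ID = c("BG630730", "BT013926.1"),
  Entrez.Gene = c("101249417", "101267119"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one row per key, in key order,
# with NA in the annotation columns where a key is not found.
lookup_annotation <- function(annot, keys, columns, keycol = "Probe.Set.ID") {
  hit <- annot[match(keys, annot[[keycol]]), columns, drop = FALSE]
  out <- data.frame(keys, hit, stringsAsFactors = FALSE)
  names(out)[1] <- keycol
  rownames(out) <- NULL
  out
}

res <- lookup_annotation(
  df_toy,
  keys = c("Les.5440.1.S1_at", "not_a_probe"),
  columns = c("Representative.Public.ID", "Entrez.Gene")
)
res
```

With the real data you would pass featureNames(eset) as keys, which also gives you the annotation in the same row order as the expression data.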