Question

Getting error in collapsing the affy probs

0

Entering edit mode

AZ ▴ 30

@fereshteh-15803

Last seen 19 months ago

United Kingdom

Hello

I have a script and and R object for GDS1393[ACCN] with Reference Series: GSE2457

I want to Collapse individual probe intensities to gene-level (EntrezIDs) measurements but I get this error

>   > filt_eset <- featureFilter(eset, require.entrez = TRUE, remove.dupEntrez = TRUE)
>     Error in xj[i] : invalid subscript type 'list'
>     >

I also tried this which gives error

> anno_filt_eset2 <- AnnotationDbi::select(hgu133plus2.db, keys = (featureNames(filt_eset2)), columns = c("SYMBOL", "GENENAME", "ENTREZID"), keytype = "PROBEID")
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'PROBEID'. Please use the keys method to see a listing of valid arguments.

This is my R object

https://www.dropbox.com/s/dpydi4yjubrmwmt/affy_data.RData?dl=0

This is my R script

https://www.dropbox.com/s/k338ooac2dpifg6/GSEA_Broad.R?dl=0

Please help me to fix this error

Thank you

affy eset • 1.3k views

ADD COMMENT • link updated 3.9 years ago by James W. MacDonald 67k • written 3.9 years ago by AZ ▴ 30

0

Entering edit mode

seems to be cross-posted? - https://www.biostars.org/p/486578/

ADD REPLY • link 3.9 years ago Kevin Blighe ★ 4.0k

score 1 · Answer 1 · 2021-01-26

I would do much of this a different way. Sean Davis might chime in with some information about why one might want to use a GPL object, but I can't personally remember ever thinking that I should do that. Instead I always use the GSE object (or use getGEOSuppFiles to get the underlying CEL files, which in this case would be what I would do, as these data are MAS5.0 and I would instead use RMA). But anyway

> eset <- getGEO("GSE2457")[[1]]
Found 1 file(s)
GSE2457_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2457/matrix/GSE2457_series_matrix.txt.gz'
Content type 'application/x-gzip' length 394616 bytes (385 KB)
downloaded 385 KB


> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 15923 features, 10 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM46500 GSM46503 ... GSM46522 (10 total)
  varLabels: title geo_accession ... data_row_count (31 total)
  varMetadata: labelDescription
featureData
  featureNames: 1367452_at 1367453_at ... AFFX_ratb2/X14115_at (15923
    total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 16118269 
Annotation: GPL341

You should note that this experiment was based on penile cavernosa tissue from rats and was run on RAE230A arrays, so there is no reason to be using an HG-U133_Plus2 annotation package. But you might want to re-annotate, because in general what you get from GEO for Affy arrays is just a parsed version of their annotation data, which are not as useful for this purpose as one might think. Being the author of the package, I would tend to use affycoretools, but there are obviously other things you could do.

> library(affycoretools)
> library(rae230a.db)
> eset <- annotateEset(eset, rae230a.db)
> head(fData(eset))
              PROBEID ENTREZID SYMBOL                                  GENENAME
1367452_at 1367452_at   690244  Sumo2           small ubiquitin-like modifier 2
1367453_at 1367453_at   114562  Cdc37 cell division cycle 37, HSP90 cochaperone
1367454_at 1367454_at    60384  Copb2          COPI coat complex subunit beta 2
1367455_at 1367455_at   116643    Vcp                valosin-containing protein
1367456_at 1367456_at    81920 Ube2d3        ubiquitin-conjugating enzyme E2D 3
1367457_at 1367457_at   114558  Becn1                                  beclin 1

And now if you want to average the duplicated probes (where by 'duplicated' I mean those with the same NCBI Gene ID, which IMO isn't necessarily duplicated).

> z <- avereps(eset, ID = fData(eset)$ENTREZID)
Warning messages:
1: In rowsum.default(x, ID, reorder = FALSE, na.rm = TRUE) :
  missing values for 'group'
2: In rowsum.default(1L - is.na(x), ID, reorder = FALSE) :
  missing values for 'group'

## that warning is just telling us that there are NA values for some of the NCBI Gene IDs, so we can silence it
## by subsetting (which we should do because we don't want the average of the NA probes).

> eset2 <- eset[!is.na(fData(eset)$ENTREZID),]
> zz <- avereps(eset2, fData(eset2)$ENTREZID)

And limma is happy to use the matrix zz, or you could stick back into an ExpressionSet if you like. Do note that the row.names of the matrix are now NCBI Gene IDs, so you can use that information to annotate things downstream.