Probeset IDs to Entrez IDs for Clariom D Human Affy Array
@jaymerickman-11439
Last seen 8.3 years ago

I am having trouble figuring out how to map the probeset IDs from the Affymetrix names to Entrez IDs (heck, I would even settle for gene symbols). My session info is listed at the bottom of this post. I run into this problem whether I use the custom CDF that Affymetrix gave me or build a package with pdInfoBuilder, as described in C: Clariom D Human Microarray CDF file to package. Just as there is no publicly available CDF, there is no annotation .db package available.

I have tried the 'AffyCompatible' package to fetch the NetAffx resource; however, it returns a data frame with all the information jumbled into a single column.

I have also tried using the .db package for the predecessor array (hta20sttranscriptcluster.db), just to be able to match the Affy IDs to Entrez IDs, but that returns an error.

 

> library(hta20sttranscriptcluster.db)
Loading required package: org.Hs.eg.db

> affyid <- rownames(eset)
> egids2 <- hta20sttranscriptclusterENTREZID[affyid]
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) : 
  value for "AFFX-BkGr-GC03_st" not found

My session info.  Can you tell I have been trying a ton of different things?

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] hta20sttranscriptcluster.db_8.3.1 org.Hs.eg.db_3.3.0               
 [3] AnnotationDbi_1.35.4              AffyCompatible_1.33.0            
 [5] RCurl_1.95-4.8                    bitops_1.0-6                     
 [7] XML_3.98-1.4                      biomaRt_2.29.2                   
 [9] BiocInstaller_1.23.9              genefilter_1.55.2                
[11] pd.clariom.d.human_0.0.1          RSQLite_1.0.0                    
[13] DBI_0.5-1                         oligo_1.37.2                     
[15] Biostrings_2.41.4                 XVector_0.13.7                   
[17] IRanges_2.7.15                    S4Vectors_0.11.14                
[19] oligoClasses_1.35.0               limma_3.29.21                    
[21] Biobase_2.33.3                    BiocGenerics_0.19.2              

loaded via a namespace (and not attached):
 [1] GenomeInfoDb_1.9.10         iterators_1.0.8            
 [3] tools_3.3.1                 zlibbioc_1.19.0            
 [5] bit_1.1-12                  annotate_1.51.0            
 [7] preprocessCore_1.35.0       lattice_0.20-34            
 [9] ff_2.2-13                   Matrix_1.2-7.1             
[11] foreach_1.4.3               affxparser_1.45.0          
[13] grid_3.3.1                  survival_2.39-5            
[15] codetools_0.2-14            GenomicRanges_1.25.94      
[17] splines_3.3.1               SummarizedExperiment_1.3.82
[19] xtable_1.8-2                affyio_1.43.0              

 
microarray affy • 6.7k views
ADD COMMENT

Hey Jayme, let me know if you figured out how to annotate the Clariom D data. I tried a method mentioned here that uses the Bioconductor package "clariomdhumantranscriptcluster.db", but it doesn't seem to work for me.

Thanks

 

ADD REPLY

Did you read this full thread, including James's comments below?

"The easiest thing to do is to use annotateEset in my affycoretools package".

Some relevant points:

- you can summarize the probes at the level of known transcripts (= core; the default), or at the level of probe set regions, which are intended to measure portions of an exon or exon-exon junctions. See e.g. James' remarks for much more info on this: C: Transcript to gene in clariom d human affymetrix data.

- you can use the annotation provided by Affymetrix (as available on their NetAffx site), which is contained in the Platform Design (pd) package pd.clariom.d.human.

- you can also use the annotation that is assembled by the BioC core team using data from public repositories, either at the level of probe set regions (clariomdhumanprobeset.db) or transcripts (clariomdhumantranscriptcluster.db).

 

Some example code:

library(oligo)
library(affycoretools)
library(pd.clariom.d.human)
library(clariomdhumanprobeset.db)
library(clariomdhumantranscriptcluster.db)

dat <- read.celfiles(list.celfiles())

transcript.eset <- rma(dat, target = "core")  # default level of summarization
probeset.eset <- rma(dat, target = "probeset")

# annotating using Affymetrix-provided information (NetAffx)
transcript.eset.affy <- annotateEset(transcript.eset, pd.clariom.d.human)

# annotating using info assembled by the Bioconductor Core Team
transcript.eset <- annotateEset(transcript.eset, clariomdhumantranscriptcluster.db)
probeset.eset <- annotateEset(probeset.eset, clariomdhumanprobeset.db)

 

An excerpt of the annotated transcript-level data (notice the subtle differences...):

> fData(transcript.eset.affy)[c(50792,50803,50980),]
                            PROBEID              ID    SYMBOL                                  GENENAME
TC0600011336.hg.1 TC0600011336.hg.1       NR_001317     HCG4B HLA complex group 4B (non-protein coding)
TC0600011347.hg.1 TC0600011347.hg.1 ENST00000376797 ZNRD1-AS1                     ZNRD1 antisense RNA 1
TC0600011534.hg.1 TC0600011534.hg.1    NM_001164267     WDR46                       WD repeat domain 46
> fData(transcript.eset)[c(50792,50803,50980),]
                            PROBEID ENTREZID   SYMBOL                                              GENENAME
TC0600011336.hg.1 TC0600011336.hg.1    80868    HCG4B             HLA complex group 4B (non-protein coding)
TC0600011347.hg.1 TC0600011347.hg.1    80862 ZNRD1ASP zinc ribbon domain containing 1 antisense, pseudogene
TC0600011534.hg.1 TC0600011534.hg.1     9277    WDR46                                   WD repeat domain 46
>
ADD REPLY
@james-w-macdonald-5106
Last seen 1 day ago
United States

The easiest thing to do for now is to use annotateEset in my affycoretools package. Something like

library(oligo)
library(affycoretools)
library(pd.clariom.d.human)

dat <- read.celfiles(list.celfiles())
eset <- rma(dat)
eset <- annotateEset(eset, pd.clariom.d.human)
head(fData(eset))

 

ADD COMMENT

Hi James, when I try that method of annotation, it just keeps the Affy IDs that are provided. I am trying to get Entrez IDs or gene symbols. Furthermore, when I print the head I get NA for the additional values:

> eset <- annotateEset(celfiles.rma, pd.clariom.d.human)
> head(fData(eset))
                            PROBEID   ID SYMBOL GENENAME
AFFX-BkGr-GC03_st AFFX-BkGr-GC03_st <NA>   <NA>     <NA>
AFFX-BkGr-GC04_st AFFX-BkGr-GC04_st <NA>   <NA>     <NA>
AFFX-BkGr-GC05_st AFFX-BkGr-GC05_st <NA>   <NA>     <NA>
AFFX-BkGr-GC06_st AFFX-BkGr-GC06_st <NA>   <NA>     <NA>
AFFX-BkGr-GC07_st AFFX-BkGr-GC07_st <NA>   <NA>     <NA>
AFFX-BkGr-GC08_st AFFX-BkGr-GC08_st <NA>   <NA>     <NA>
ADD REPLY

The probesets you are showing there are the Affymetrix background probesets. By definition a background probeset won't have an Entrez Gene ID, symbol or gene name, because it's not a gene! It's a background probeset that isn't supposed to bind to any known gene. In that situation do you expect to have anything other than an NA?
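If the NA rows get in the way, one option is to drop them after annotating. This is only a rough sketch on toy data: the data frame below imitates the fData() table, the annotated IDs and symbols are copied from the transcript-level excerpt earlier in this thread, and the object names `fd` and `annotated` are made up for illustration.

```r
# Toy stand-in for fData(eset); the real table comes from annotateEset().
fd <- data.frame(
  PROBEID = c("AFFX-BkGr-GC03_st", "AFFX-BkGr-GC04_st",
              "TC0600011336.hg.1", "TC0600011534.hg.1"),
  ID      = c(NA, NA, "NR_001317", "NM_001164267"),
  SYMBOL  = c(NA, NA, "HCG4B", "WDR46"),
  stringsAsFactors = FALSE
)
rownames(fd) <- fd$PROBEID

# Keep only the rows that carry a gene symbol.
annotated <- fd[!is.na(fd$SYMBOL), ]
print(annotated)

# With a real ExpressionSet the same idea would be (Biobase loaded):
# eset.annotated <- eset[!is.na(fData(eset)$SYMBOL), ]
```

Note that this drops every unannotated probeset, not just the background ones, so whether it is appropriate depends on what you want to test.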

ADD REPLY

 

 

I see. But RMA via the oligo package is supposed to remove the background genes (and background noise).

I am not sure whether I have done something incorrectly up to this point, as everything has run without any errors. For reference, this is my general workflow so far. I am admittedly very new to this, and on my first real project I have been thrown for a loop by the laboratory using a brand-new chip.

##Load Oligo, and pd.clariom.d.human:: packages.##

library(oligo)
library(pd.clariom.d.human)

##Import Celfiles into oligo to create expression set and get probe genes##
##From Ovarian FPN
setwd("~/Documents/Internship/OvarianFPN")
celfiles <- list.files("data/", pattern = "CEL")
## from CEL file location
setwd("~/Documents/Internship/OvarianFPN/data")
rawData <- read.celfiles(celfiles, pkgname = "pd.clariom.d.human")

## normalize the data with RMA
celfiles.rma <- rma(rawData)

## Filter the Data
library(genefilter)
celfiles.f.light <- nsFilter(celfiles.rma, require.entrez = FALSE, remove.dupEntrez = FALSE, var.filter = FALSE)

#Set Experimental Design
library(limma)
design <- model.matrix(~0+as.factor(c(rep(1,3), rep(0,3))))
colnames(design) <- c("cancer", "ferroportin")
contrast.matrix <- makeContrasts(ferroportin_v_cancer=ferroportin-cancer, levels = design)

##Limma to fit the linear model
fit <- lmFit(exprs(celfiles.f.light$eset), design)
fe_fits <- contrasts.fit(fit, contrast.matrix)
fe_ebFit <- eBayes(fe_fits)

## Entrez gene IDs

## print table of top differentially expressed genes
TopDE <- topTable(fe_ebFit, number = 25)
TopDE

You will notice that the section where I was originally going to get the Entrez IDs is left blank, since I haven't figured out how to do that yet.
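One thing that may be worth double-checking in the workflow above is how the design columns get their names. as.factor() sorts its levels, so the 0 group becomes the first column and is the one labelled "cancer", which means arrays 4 to 6 are treated as cancer. A small base-R sketch, assuming (hypothetically) that the first three arrays are the cancer samples; the object names `codes`, `design_orig` and `group` are made up for illustration:

```r
# Original coding: 1,1,1,0,0,0 turned into a factor
codes <- as.factor(c(rep(1, 3), rep(0, 3)))
levels(codes)                      # levels are sorted: "0" then "1"

design_orig <- model.matrix(~ 0 + codes)
colnames(design_orig) <- c("cancer", "ferroportin")
design_orig[, "cancer"]            # 1 for arrays 4-6, not arrays 1-3

# Explicit labels make the mapping visible.
# ASSUMPTION: arrays 1-3 are cancer, arrays 4-6 are ferroportin.
group <- factor(c(rep("cancer", 3), rep("ferroportin", 3)),
                levels = c("cancer", "ferroportin"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
design
```

If the real sample order is the other way round, the original coding is already right; the point is only that explicit group labels make the sign of the ferroportin-vs-cancer contrast easier to verify.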

ADD REPLY

"But RMA via the oligo package is supposed to remove the background genes (and background noise)."

I don't know where you got this idea, but it is not correct. RMA doesn't know anything about which probesets are background, and doesn't remove any probesets. You are correct that RMA adjusts for background binding, but no probesets are removed.

## Filter the Data
library(genefilter)
celfiles.f.light <- nsFilter(celfiles.rma, require.entrez = FALSE, remove.dupEntrez = FALSE, var.filter = FALSE)

All the above code should do is remove the AFFX probesets, which are just a handful, and do not include all the various background probes on this array.

You could instead do

celfiles.rma <- annotateEset(celfiles.rma, pd.clariom.d.human)

celfiles.filtered <- getMainProbes(celfiles.rma)

Which will subset to just the 'main' probesets, removing all the extra cruft. Then when you run things through limma and get the topTable results, your data will already be annotated.

 

ADD REPLY

So I am definitely doing something wrong. It does add the feature data after the first line of code (where I annotate 'celfiles.rma'), but when I use 'getMainProbes' I get back an ExpressionSet with no features (as if it is saying all of my probes are background probes).

 

> celfiles.rma <- annotateEset(celfiles.rma, pd.clariom.d.human)
> celfiles.filtered <- getMainProbes(celfiles.rma)
> celfiles.filtered
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 6 samples 
  element names: exprs 
protocolData
  rowNames: C-tet_Dox-1_(Clariom_D_Human).CEL.html
    C-tet_Dox-2_(Clariom_D_Human).CEL.html ...
    C-tet-7PU+Dox-3_(Clariom_D_Human).CEL.html (6 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: C-tet_Dox-1_(Clariom_D_Human).CEL.html
    C-tet_Dox-2_(Clariom_D_Human).CEL.html ...
    C-tet-7PU+Dox-3_(Clariom_D_Human).CEL.html (6 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData
  featureNames:
  fvarLabels: PROBEID ID SYMBOL GENENAME
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: pd.clariom.d.human 

ADD REPLY

I don't think you are doing anything wrong, aside from using these stupid new Clariom arrays. There are obviously some infelicities with these new arrays that will require some modifications to get around. Unfortunately I don't have time right now to dig into the details, so you are going to have to muddle along as best you can.

ADD REPLY

Thank you for your help anyway! I have also spoken to my internship advisor. Unfortunately she is at a conference this week, so her time to help is limited. If I somehow stumble upon an answer, I will post it here in case someone else has a problem with these arrays as well.

ADD REPLY
