Question

matching probes to genes from pd.porgene.1.1.st array

serpalma.v ▴ 60

@serpalmav-8912

Germany

Dear members:

I am struggling to find the gene names for the probes on the above-mentioned microarray chip. I have searched the annotation packages available in Bioconductor, but the probe names are not present in them.

I have also tried with biomaRt, but "pd.porgene.1.1.st" is not a valid attribute.

Has anyone done this for this specific chip that could point out what I am missing?

Kind regards

pd.porgene.1.1.st affymetrix microarrays • 3.1k views

updated 8.4 years ago by James W. MacDonald 65k • written 8.4 years ago by serpalma.v ▴ 60

score 2 · Answer 1 · 2015-12-09

One thing people might not realize is that the pdInfo packages for the Gene ST arrays also contain a parsed version of Affy's transcript and probeset csv file, which can be used as well. This is sort of a pain because it's not that easy to parse out the relevant information, and since the order of the probesets in the csv is different from the ExpressionSet of summarized data, you have to remember to reorder (which to my dismay I forgot in a recent analysis... Tsk tsk).

Anyway, since I am apparently incapable of remembering such things, I added some functionality to my affycoretools package to automate this, and ensure that I don't bork it up like usual. This is in the devel version, so you need a devel installation of R/BioC. But it's a one-liner. Say you do something like this:

library(oligo) # provides read.celfiles(), list.celfiles() and rma()
dat <- read.celfiles(filenames = list.celfiles())
eset <- rma(dat)

So now you have an ExpressionSet called 'eset', which has 'pd.porgene.1.1.st' in its annotation slot. You can now annotate the ExpressionSet like this:

library(affycoretools) # provides annotateEset()
eset.annot <- annotateEset(eset, annotation(eset))

Which will then parse out the information from the Affy csv file that comes with the pd.porgene.1.1.st package. The nice thing about this is that the limma package will use this annotation data by default, so if you use limma to make comparisons, you don't have to do anything else to include the annotation data.
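To illustrate how the annotation travels through limma, here is a minimal sketch on toy data. It does not use the real array: the matrix values, the probeset IDs, the symbols, and the two-group design are all made up, and the toy object merely stands in for an annotated ExpressionSet such as 'eset.annot'.

```r
library(Biobase)
library(limma)

set.seed(1)
# Toy data (hypothetical): 4 probesets x 6 arrays of random values
mat <- matrix(rnorm(24), nrow = 4,
              dimnames = list(paste0("ps", 1:4), paste0("s", 1:6)))

# Toy annotation stored as featureData, standing in for what annotateEset() adds
annot <- data.frame(PROBEID = rownames(mat),
                    SYMBOL = paste0("gene", 1:4),
                    row.names = rownames(mat),
                    stringsAsFactors = FALSE)
eset.toy <- ExpressionSet(assayData = mat,
                          featureData = AnnotatedDataFrame(annot))

# Hypothetical two-group design: 3 controls and 3 treated samples
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# lmFit() copies the featureData into fit$genes, so topTable() reports it
fit <- eBayes(lmFit(eset.toy, design))
tab <- topTable(fit, coef = 2, number = Inf)
colnames(tab)
```

The annotation columns should appear in the table next to the usual statistics (logFC, P.Value and so on), without any extra merge step.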

If you follow Guido's suggestion and use the MBNI packages, you can install the corresponding ChipDb package from MBNI rather than using the OrgDb package directly, in which case you can then do (assuming now that your ExpressionSet is based on the MBNI package instead):

library(porgene11stssentrezg.db)
eset.annot <- annotateEset(eset, porgene11stssentrezg.db, columns = c("ENTREZID","SYMBOL","GENENAME","ALIAS"))

And you will get the same exact annotation that Guido got, with less work on your part, a double-check that your data and annotations are in correct 1:1 correspondence, and as above these annotations will propagate down through limma.
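The reordering pitfall mentioned at the top of this answer is easy to show with base R. This is only a sketch of the general idea, not what annotateEset() does internally; the probeset IDs are the ones quoted elsewhere in this thread, and the column names and gene symbols are placeholders.

```r
# Feature IDs in the order of the summarized data (toy stand-in for
# featureNames(eset))
eset_ids <- c("15180005", "15180001", "15180003")

# Annotation table in a different order, as it might come out of a csv file
# (hypothetical column names and symbols)
annot <- data.frame(
  PROBEID = c("15180001", "15180003", "15180005"),
  SYMBOL  = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# match() gives, for each ID in eset_ids, its row position in annot
idx <- match(eset_ids, annot$PROBEID)
annot_ordered <- annot[idx, ]

# Fail loudly if an ID is missing or the order still disagrees
stopifnot(!anyNA(idx), identical(annot_ordered$PROBEID, eset_ids))
annot_ordered
```

Keeping the stopifnot() check right next to the reordering step is what guards against silently attaching the wrong gene to a probeset.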

score 0 · Answer 2 · 2015-12-09

I have tried the following without success. I am starting to think the probe names are not recognized:

library(biomaRt)

mart <- useMart("ENSEMBL_MART_ENSEMBL") # select database

i <- grep("Sus scrofa", listDatasets(mart)$description) # search for data set for porcine
listDatasets(mart)[i,]

ensembl <- useDataset("sscrofa_gene_ensembl", mart) # select data set for porcine

i <- grep("affy",listAttributes(ensembl)$name) # search for attributes for affymetrix chip
length(i)
listAttributes(ensembl)[i,]

values <- c("15180001","15180003","15180005") # select some random probes

getBM(attributes = c("ensembl_gene_id", "affy_porcine"), # make the query
      filters = "affy_porcine",
      values = values, mart = ensembl)

This returns 0 results....

score 0 · Answer 3 · 2015-12-09

The same question has been asked before: I would like to refer you to the excellent answer of James (MacDonald), which shows you how to generate an annotation package for this array (thus using the original design defined by Affymetrix): A: Annotation of Affy Porcine Gene 1.0 ST array data

As an alternative, you can also consider using a so-called custom (remapped) chip definition file (CDF) for this array. Manhong Dai and Fan Meng at the MBNI provide for this array a Probe Design Info database (for use with oligo) [and a CDF] that are based on very recent annotation info present in e.g. the EntrezGene or ENSEMBL databases.

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v20 ; see column labelled "O", thus e.g. pd.porgene11st.ss.entrezg_20.0.0.tar.gz.

If you decide to go with the 'custom CDF' option, you may do this (provided all required libraries are installed):

Note: I am doing this using some of the PorGeneST11 arrays we have run, and I am using the ENTREZ-based remappings (on my Win7 machine).

> library(oligo)
> library(org.Ss.eg.db)
> library(stringr)
>

> install.packages("http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz")
inferring 'repos = NULL' from 'pkgs'
trying URL 'http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz'
Content type 'application/x-gzip' length 7345616 bytes (7.0 MB)
downloaded 7.0 MB

* installing *source* package 'pd.porgene11st.ss.entrezg' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (pd.porgene11st.ss.entrezg)
>

>
> affy.data <- read.celfiles(filenames = list.celfiles(), pkgname = "pd.porgene11st.ss.entrezg")
Platform design info loaded.
Reading in : L.CEL
Reading in : T.CEL

<<snip>>

> geneSummaries <- rma(affy.data)
Background correcting... OK
Normalizing... OK
Calculating Expression
>
> #head(exprs(geneSummaries))
>
> MyGenes <- rownames(head(exprs(geneSummaries)))

# Add annotation to dataset based on custom CDF
# Note 1: ProbesetIDs are basically EntrezGeneIDs with "_at" appended, so to obtain the EntrezID you have to strip the "_at"
# Note 2: use columns(org.Ss.eg.db) to find out which annotation data can be retrieved
# Note 3: use keytypes(org.Ss.eg.db) to find out which identifiers can be queried with

> keytypes(org.Ss.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "ENZYME"      "EVIDENCE"   
 [6] "EVIDENCEALL" "GENENAME"    "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PATH"        "PMID"        "REFSEQ"      "SYMBOL"     
[16] "UNIGENE"     "UNIPROT"    
>
> MyGenes <- str_replace(string=MyGenes, pattern="_at", replacement="")
>
>
> anno.result <- select(org.Ss.eg.db, keys=MyGenes, columns=c("ENTREZID","SYMBOL","GENENAME","ALIAS"),keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
> anno.result <- anno.result[!duplicated(anno.result[,1]),] # Get rid of duplicates; only keep 1st hit (= arbitrary decision!)
> head(anno.result)
    ENTREZID  SYMBOL
1  100034246  PIK3R6
2  100037269 SLC27A1
5  100037270   DESI2
11 100037271   ENPP1
14 100037272   EPAS1
17 100037273   MEF2A
                                                      GENENAME   ALIAS
1              phosphoinositide-3-kinase, regulatory subunit 6  PIK3R6
2  solute carrier family 27 (fatty acid transporter), member 1  ACSVL5
5                                 desumoylating isopeptidase 2 CGI-146
11          ectonucleotide pyrophosphatase/phosphodiesterase 1    NPP1
14                            endothelial PAS domain protein 1    EPAS
17                                  myocyte enhancer factor 2A   MEF2A
>
>
>
>
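Two steps in the session above can also be done in base R. This sketch assumes nothing beyond the shape of the data shown: it strips the "_at" suffix with an anchored pattern, and it collapses a 1:many result into one row per gene instead of keeping only the first hit. The toy table only mimics select() output, and the alias assignments in it are illustrative rather than taken from org.Ss.eg.db.

```r
# Probeset IDs in the custom-CDF style (Entrez ID followed by "_at")
probesets <- c("100034246_at", "100037269_at", "100037270_at")

# The "$" anchors the pattern so only a trailing "_at" is removed
entrez <- sub("_at$", "", probesets)

# Toy 1:many table shaped like select() output (illustrative aliases)
anno <- data.frame(
  ENTREZID = c("100034246", "100037269", "100037269", "100037270"),
  ALIAS    = c("PIK3R6", "ACSVL5", "SLC27A1", "CGI-146"),
  stringsAsFactors = FALSE
)

# One row per gene, with all of its aliases joined into a single string
collapsed <- aggregate(ALIAS ~ ENTREZID, data = anno,
                       FUN = paste, collapse = "; ")

# Restore the order of the probesets, since aggregate() sorts by key
collapsed <- collapsed[match(entrez, collapsed$ENTREZID), ]
collapsed
```

Collapsing keeps every alias at the cost of a longer text column, whereas the !duplicated() approach is shorter but discards all hits after the first.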