Question: matching probes to genes from pd.porgene.1.1.st array
serpalma.v (Germany) wrote, 3.8 years ago:

Dear members:

I am struggling to find the gene names for the probes on the above-mentioned microarray chip. I have searched the annotation packages available in Bioconductor, but the probe names are not present in them.

I have also tried with biomaRt, but "pd.porgene.1.1.st" is not a valid attribute.

Has anyone done this for this specific chip who could point out what I am missing?

Kind regards
Answer: matching probes to genes from pd.porgene.1.1.st array
James W. MacDonald (United States) wrote, 3.8 years ago:

One thing people might not realize is that the pdInfo packages for the Gene ST arrays also contain a parsed version of Affy's transcript and probeset csv file, which can be used as well. This is sort of a pain because it's not that easy to parse out the relevant information, and since the order of the probesets in the csv is different from the ExpressionSet of summarized data, you have to remember to reorder (which to my dismay I forgot in a recent analysis... Tsk tsk).
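The reordering pitfall can be illustrated with a minimal base R sketch. The data below are toy stand-ins: the IDs are the ones quoted elsewhere in this thread, while the gene symbols and the object names (`eset_ids`, `annot`) are made up for illustration and do not come from the pdInfo package.

```r
# Toy stand-ins: probeset IDs in the order of the summarized data, and an
# annotation table that lists the same IDs in a different order.
# The SYMBOL values are hypothetical placeholders.
eset_ids <- c("15180005", "15180001", "15180003")
annot <- data.frame(
  PROBEID = c("15180001", "15180003", "15180005"),
  SYMBOL  = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# match() gives, for each ID in the data, its row in the annotation table
idx <- match(eset_ids, annot$PROBEID)
annot_ordered <- annot[idx, ]

# Fail loudly if data and annotation are not in 1:1 correspondence
stopifnot(!anyNA(idx), identical(annot_ordered$PROBEID, eset_ids))
```

Binding `annot` to the data without the `match()` step would silently attach the wrong symbols to the rows.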

Anyway, since I am apparently incapable of remembering such things, I added some functionality to my affycoretools package to automate this, and ensure that I don't bork it up like usual. This is in the devel version, so you need a devel installation of R/BioC. But it's a one-liner. Say you do something like this:

library(oligo)
dat <- read.celfiles(filenames = list.celfiles())  # read all CEL files in the working directory
eset <- rma(dat)                                   # summarize to an ExpressionSet

So now you have an ExpressionSet called 'eset', which has 'pd.porgene.1.1.st' in its annotation slot. You can now annotate the ExpressionSet like this:

library(affycoretools)
eset.annot <- annotateEset(eset, annotation(eset))

This parses out the information from the Affy csv file that comes with the pd.porgene.1.1.st package. The nice thing about this is that the limma package will use this annotation data by default, so if you use limma to make comparisons, you don't have to do anything else to include the annotation data.

If you follow Guido's suggestion and use the MBNI packages, you can install the corresponding ChipDb package from MBNI rather than using the OrgDb package directly, in which case you can then do (assuming now that your ExpressionSet is based on the MBNI package instead):

library(porgene11stssentrezg.db)
eset.annot <- annotateEset(eset, porgene11stssentrezg.db, columns = c("ENTREZID","SYMBOL","GENENAME","ALIAS"))

And you will get the same exact annotation that Guido got, with less work on your part, a double-check that your data and annotations are in correct 1:1 correspondence, and as above these annotations will propagate down through limma.


Hi James, wow, those are some nice additions that indeed make annotating much easier! Good to know about this. Thanks!

(Reply written 3.8 years ago by Guido Hooiveld)
Answer: matching probes to genes from pd.porgene.1.1.st array
serpalma.v (Germany) wrote, 3.8 years ago:

I have tried the following without success. I am starting to think the probe names are not recognized:

library(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL") # select database

i <- grep("Sus scrofa", listDatasets(mart)$description) # search for data set for porcine
listDatasets(mart)[i,]

ensembl <- useDataset("sscrofa_gene_ensembl", mart) # select data set for porcine

i <- grep("affy",listAttributes(ensembl)$name) # search for attributes for affymetrix chip
length(i)
listAttributes(ensembl)[i,] 

values <- c("15180001","15180003","15180005") # select some random probes

getBM(attributes = c("ensembl_gene_id",  #make the query
                     "affy_porcine"),
      filters = "affy_porcine",  
      values = values, mart = ensembl)

This returns 0 results.

This is expected, because the filter "affy_porcine" refers to the "old" Affymetrix GeneChip Porcine Genome Array: http://www.affymetrix.com/catalog/131488/AFFY/Porcine-Genome-Array

AFAIK the probes of the newer Gene ST arrays have not been incorporated by ENSEMBL.

(Reply written 3.8 years ago by Guido Hooiveld)
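As a rough sketch of how to spot this kind of mismatch before querying: the IDs in the question are purely numeric, whereas probe set IDs on the older 3' expression arrays generally end in "_at". The `old_style` value and the helper `is_numeric_id` below are illustrative placeholders, not taken from this thread or from any package.

```r
# IDs from the question (Gene ST style: purely numeric)
values <- c("15180001", "15180003", "15180005")

# Hypothetical ID in the general older-array style, ending in "_at"
old_style <- "Ssc.1.1.S1_at"

# TRUE if an ID consists of digits only
is_numeric_id <- function(x) grepl("^[0-9]+$", x)

is_numeric_id(values)     # all numeric: not the ID style the "affy_porcine" filter expects
is_numeric_id(old_style)  # not numeric: older-array style
```

If all IDs are numeric, an empty result from a filter built for the older array is not surprising.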
Answer: matching probes to genes from pd.porgene.1.1.st array
Guido Hooiveld (Wageningen University, Wageningen, the Netherlands) wrote, 3.8 years ago:

The same question has been asked before. I would like to refer you to the excellent answer by James (MacDonald), which shows how to generate an annotation package for this array (thus using the original design defined by Affymetrix): "A: Annotation of Affy Porcine Gene 1.0 ST array data".

As an alternative, you can also consider using a so-called custom (remapped) chip definition file (CDF) for this array. Manhong Dai and Fan Meng at the MBNI provide for this array a Probe Design Info database (for use with oligo) [and a CDF] that are based on very recent annotation info present in e.g. the EntrezGene or ENSEMBL databases.

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp#v20 ; see column labelled "O", thus e.g. pd.porgene11st.ss.entrezg_20.0.0.tar.gz.

 

If you decide to go with the 'custom CDF' option, you may do this (provided all required libraries are installed):

Note: I am doing this using some of the PorGeneST11 arrays we have run, and I am using the ENTREZ-based remappings (on my Win7 machine).

> library(oligo)
> library(org.Ss.eg.db)
> library(stringr)
>

> install.packages("http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz")
inferring 'repos = NULL' from 'pkgs'
trying URL 'http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz'
Content type 'application/x-gzip' length 7345616 bytes (7.0 MB)
downloaded 7.0 MB

* installing *source* package 'pd.porgene11st.ss.entrezg' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (pd.porgene11st.ss.entrezg)
>

>
> affy.data <- read.celfiles(filenames = list.celfiles(), pkgname = "pd.porgene11st.ss.entrezg")
Platform design info loaded.
Reading in : L.CEL
Reading in : T.CEL

<<snip>>

> geneSummaries <- rma(affy.data)
Background correcting... OK
Normalizing... OK
Calculating Expression
>
> #head(exprs(geneSummaries))
>
> MyGenes <-  rownames(head(exprs(geneSummaries)))

# Add annotation to dataset based on custom CDF
# Note 1: ProbesetIDs are basically EntrezGeneIDs with "_at" attached to it, thus to obtain EntrezID you have to get rid of the "_at"
# Note 2: use columns(org.Ss.eg.db) to find out which annotation data can be retrieved
# Note 3: use keytypes(org.Ss.eg.db) to find out which identifiers can be queried with

> keytypes(org.Ss.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "ENZYME"      "EVIDENCE"   
 [6] "EVIDENCEALL" "GENENAME"    "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PATH"        "PMID"        "REFSEQ"      "SYMBOL"     
[16] "UNIGENE"     "UNIPROT"    
>
> MyGenes <- str_replace(string=MyGenes, pattern="_at", replacement="")
>
>
> anno.result <- select(org.Ss.eg.db, keys=MyGenes, columns=c("ENTREZID","SYMBOL","GENENAME","ALIAS"),keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
> anno.result <- anno.result[!duplicated(anno.result[,1]),] # Get rid of duplicates; only keep 1st hit (= arbitrary decision!)
> head(anno.result)
    ENTREZID  SYMBOL
1  100034246  PIK3R6
2  100037269 SLC27A1
5  100037270   DESI2
11 100037271   ENPP1
14 100037272   EPAS1
17 100037273   MEF2A
                                                      GENENAME   ALIAS
1              phosphoinositide-3-kinase, regulatory subunit 6  PIK3R6
2  solute carrier family 27 (fatty acid transporter), member 1  ACSVL5
5                                 desumoylating isopeptidase 2 CGI-146
11          ectonucleotide pyrophosphatase/phosphodiesterase 1    NPP1
14                            endothelial PAS domain protein 1    EPAS
17                                  myocyte enhancer factor 2A   MEF2A
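Two steps in the transcript above can be tightened, sketched here in base R on toy data. Anchoring the pattern with `$` removes only a trailing "_at", and collapsing the aliases keeps all of them instead of the arbitrary first hit. The data frame mimics the shape of the `select()` output; the second alias for 100037269 is a made-up placeholder, and the object names are assumptions.

```r
# Probeset IDs as produced with the Entrez-based custom CDF: EntrezGene ID + "_at"
probesets <- c("100034246_at", "100037269_at", "100037270_at")

# Anchoring the pattern with "$" removes only a trailing "_at"
entrez <- sub("_at$", "", probesets)

# Toy 1:many result shaped like the select() output; the second alias for
# 100037269 is a made-up placeholder
anno <- data.frame(
  ENTREZID = c("100034246", "100037269", "100037269", "100037270"),
  SYMBOL   = c("PIK3R6", "SLC27A1", "SLC27A1", "DESI2"),
  ALIAS    = c("PIK3R6", "ACSVL5", "ALIAS_PLACEHOLDER", "CGI-146"),
  stringsAsFactors = FALSE
)

# Collapse all aliases per gene into one string
alias_all <- tapply(anno$ALIAS, anno$ENTREZID, paste, collapse = "; ")

# One row per gene, in the order of the probesets
anno_unique <- anno[match(entrez, anno$ENTREZID), c("ENTREZID", "SYMBOL")]
anno_unique$ALIAS <- as.vector(alias_all[anno_unique$ENTREZID])
anno_unique
```

Using `match()` against the probeset order also keeps the annotation rows aligned with the expression data.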

Which R version are you working on?

The package:

install.packages("http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz")

is not compatible with R 3.3.1, nor 3.2, nor 3.2.2.

(Reply written 2.9 years ago by serpalma.v)

Mmm, (still) working for me.... I am working on Win7, and also have RTools installed (in addition to R).

 

>
> install.packages("http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz")
inferring 'repos = NULL' from 'pkgs'
trying URL 'http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz'
Content type 'application/x-gzip' length 7345616 bytes (7.0 MB)
downloaded 7.0 MB

* installing *source* package 'pd.porgene11st.ss.entrezg' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (pd.porgene11st.ss.entrezg)
> sessionInfo()
R version 3.3.1 Patched (2016-06-28 r70853)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
>
(Reply written 2.9 years ago by Guido Hooiveld)

And which R version are you using? I am working on Windows 10 and have also installed Rtools 3.4.

This is the output I get when trying to install the package, and my session info:

> install.packages("http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz")
Installing package into ‘C:/Users/sergio/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz’ is not available (for R version 3.3.1)
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.1

(Reply written 2.9 years ago by serpalma.v)

Well, as you could have seen above, I am running:

R version 3.3.1 Patched (2016-06-28 r70853)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Rtools version = 3.4.0.1962

 

... which basically is the same setup you are using, except for Win7 vs Win10.

Did you start/run R as 'administrator' when installing the package? I always do this, so the library is installed in the R installation directory, and not in a personal directory (as in your case). This may be important.

To be honest, since the exact same line above (copy/paste) works on my system, I have no other clue as to what causes this problem for you. Sorry. Hopefully someone else can provide more advice.

(Reply written 2.9 years ago by Guido Hooiveld)
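To check where packages would be installed and whether those locations are writable, something along these lines may help. It is only a diagnostic sketch; the R documentation notes that `file.access()` can be unreliable on Windows, so treat the result as a hint rather than proof.

```r
# Library locations R searches, in order; install.packages() installs into
# the first one unless 'lib' is given
libs <- .libPaths()
libs

# file.access(mode = 2) returns 0 where write permission is reported, -1 otherwise
writable <- file.access(libs, mode = 2) == 0
data.frame(library = libs, writable = writable)
```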

Try adding the argument repos=NULL to install.packages().

(Reply written 2.9 years ago by Martin Morgan)
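Spelled out, the suggestion could look like the sketch below. The URL is the one used throughout this thread; the wrapper `install_source_tarball` is a hypothetical convenience function, not part of any package, and `type = "source"` is added here only to be explicit that the file is a source tarball.

```r
# URL copied from the thread
pkg_url <- "http://mbni.org/customcdf/20.0.0/entrezg.download/pd.porgene11st.ss.entrezg_20.0.0.tar.gz"

# Hypothetical wrapper: install a source tarball directly from a URL or a
# local path, with repos = NULL so that no repository lookup is attempted
install_source_tarball <- function(path_or_url) {
  install.packages(path_or_url, repos = NULL, type = "source")
}

# Not run here, since it needs network access and build tools (Rtools on Windows):
# install_source_tarball(pkg_url)
```

With `repos = NULL` stated explicitly, the call does not rely on R inferring it from the file name, which is the step that appears to differ between the two sessions shown above.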