Question: Can't find Ly76 on my Affymetrix MoGene 2.0ST array annotated with pd.mogene.2.0.st package
0
22 months ago by
felixwertek0 wrote:

Hello,

I am working on Affzmetrix MoGene 2.0 ST array data and would like to send the expression of the gene with MGI Sybmol "Ly76" (common name: Ter-119) to collaborators. However, I can not seem to find it on the entire array - is this possible, or does something go wrong with the annotation? On it's MGI Symbol page, it says that Ly76 has Feature Type "unclassified other genome feature", perhaps it indeed is not yet annotated?

Here is how I search the array for the gene:

librarypd.mogene.2.0.st)
library(oligo)
library(annotate)
library(mogene20sttranscriptcluster.db)
# celFiles is a character vector with file paths:
eset <- rma(affyRaw)
ID <- featureNames(eset)
Symbol <- getSYMBOL(ID, "mogene20sttranscriptcluster.db")
table(Symbol=="Ly76")

FALSE
24664

#for comparison, Ly75 is on the array:
table(Symbol=="Ly75")

FALSE  TRUE
24663     1

modified 22 months ago by James W. MacDonald50k • written 22 months ago by felixwertek0
Answer: Can't find Ly76 on my Affymetrix MoGene 2.0ST array annotated with pd.mogene.2.0
0
22 months ago by
United States
James W. MacDonald50k wrote:

It's not on the array. It is in the org.Mm.eg.db package, so it's a known gene but apparently Affy didn't tile it down.

Hi James,

thanks for the prompt answer, I've seen you very active in this forum in my role as visitor. It's feels good now to be a poster and directly get an answer, thanks for that and for your good advice!

Before I bring my collaborators the sad news, I hope to get a bit more familiar with the notion that Affymetrix really did not spot probes for this gene onto their array. You sound like you've checked org.Mm.eg.db, but do we know for sure that nothing goes wrong within the pd.mogene.2.0.st and mogene20sttranscriptcluster.db packages for Ly76?

Btw, nice explanation in what is the difference between "pd.mogene.2.0.st" and "mogene20sttranscriptcluster.db"? on how these packages work together with oligo, so again a word of praise: thanks for all your help!

All we provide with these packages is a convenient format for querying; all the data come from the Affymetrix annotation csv. Being the one who generates these things (and knowing myself relatively well), I am more than willing to admit that there might have been a mistake in parsing the data. So let's check!

Do note that the  pd.mogene.2.0.st package comes with the annotation csv, and we can load that up.

> library(pd.mogene.2.0.st)

## let's look for Ly76 directly

> grep("Ly76", pData(netaffxTranscript)[,"geneassignment"], value = TRUE)
character(0)
> grep("Ly76", pData(netaffxTranscript)[,"mrnaassignment"], value = TRUE)
character(0)

So no. But maybe they use a different name? Let's try Ter-119

> grep("[Tt][Ee][Rr]-119|[Tt][Ee][Rr]119", pData(netaffxTranscript)[,"geneassignment"], value = TRUE)
character(0)
> grep("[Tt][Ee][Rr]-119|[Tt][Ee][Rr]119", pData(netaffxTranscript)[,"mrnaassignment"], value = TRUE)
character(0)

It's Gene ID 104321, so let's try that. Affy uses a format they think is useful for input to databases, so we have to do some manipulation. Also note that just doing a grep on 104231 will pull up anything with that number in its name, so we will get stuff. You can run the following code to check.

sapply(strsplit(grep("104231", pData(netaffxTranscript)[,"geneassignment"], value = TRUE), " /// "), function(x) sapply(strsplit(x, " // "), "[", 2))

sapply(strsplit(grep("104231", pData(netaffxTranscript)[,"mrnaassignment"], value = TRUE), " /// "), function(x) sapply(strsplit(x, " // "), "[", 3))



If you do, you will see that we pick up two genes and four transcripts, none of which are Ly76. But this just means that Affy didn't actively try to tile down probes that will measure Ly76. Or they did, and haven't included it in the annotation file they provide. If you really want to dig deep, you could get the probe tab file and align it against the genome using whatever alignment method you prefer, and see if there really are any probes that could possibly bind to Ly76.

Just to add to the remark of James on aligning the probes to the genome: Manhong Dai from the MBNI group actually takes such an approach to generate the so-called custom CDFs. He recently did this to generate v22 of these CDFs. I quickly checked, and it turns out that according to these recent, Entrez-genese based probe remappings Ly76 is detected on the MoGene 2.0 ST arrays. See below.

So, if you are willing to move from the original, Affymetrix-defined array design, this could be an option. Check this post from James for a quick start C: How to use brainarray custom cdf with oligo package? (you will have to modify the name of the PdInfo and annotation libraries to work with the MoGene 2.0 array).

> install.packages("http://mbni.org/customcdf/22.0.0/entrezg.download/pd.mogene20st.mm.entrezg_22.0.0.tar.gz", type="source", repos=NULL)
> library(mogene20stmmentrezg.db)
>
> select(mogene20stmmentrezg.db, keys=c("Ly76"), columns=c("ENTREZID","SYMBOL","GENENAME"),keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
SYMBOL ENTREZID              GENENAME
1   Ly76   104231 lymphocyte antigen 76
>

Well, not really. Note that the ChipDb packages only have a single important table, and all other queries are farmed out to the org.Mm.eg.db package. If you do a query that includes a mapping from a probeset ID to anything else, then there will be a SQL join between the probes table in the ChipDb package and whatever other tables are required. But if you don't do a query that involves a probeset ID, the underlying query just does the lookup in org.Mm.eg.db!

> select(mogene20sttranscriptcluster.db, "Ly76", c("ENTREZID","SYMBOL"), "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
SYMBOL ENTREZID
1   Ly76   104231
> select(mogene20sttranscriptcluster.db, "Ly76", c("ENTREZID","SYMBOL","PROBEID"), "SYMBOL")
[1] PROBEID  SYMBOL   ENTREZID
<0 rows> (or 0-length row.names)
> select(mogene20stmmentrezg.db, "Ly76", c("ENTREZID","SYMBOL"), "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
SYMBOL ENTREZID
1   Ly76   104231
> select(mogene20stmmentrezg.db, "Ly76", c("ENTREZID","SYMBOL","PROBEID"), "SYMBOL")
[1] PROBEID  SYMBOL   ENTREZID
<0 rows> (or 0-length row.names)

And a direct query might be even more convincing?

> dbGetQuery(mogene20stmmentrezg_dbconn(), "select * from probes where gene_id='104231';")
[1] probe_id    gene_id     is_multiple
<0 rows> (or 0-length row.names)

Whoot... didn't know this. I stand corrected; thanks James! :)

Also thank you Guido, for tickling more info out of James ;)

Hi James,

I am amazed and enlightened, thank you for your committment and patience! You really helped and made me learn something new, which is much appreciated!

Felix

Content
Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.