Question

hugene10sttranscriptcluster.db missing some genes?

colonppg ▴ 30

@colonppg-7771

United States

Dear all:

I have a project using the HuGene 1.0 ST v1 array.

I collected all the probeset IDs into idsi, then tried to get the Entrez IDs and gene symbols for all of them:

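library(hugene10sttranscriptcluster.db)
idsi <- as.character(probe.gs$PROBEID)  # probeset IDs taken from my data
annot <- select(hugene10sttranscriptcluster.db, keys = idsi, columns = c("ENTREZID", "SYMBOL"), keytype = "PROBEID")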

I got this error message:

Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows

Strangely, some of the genes are apparently missing from the resulting data frame.

I do not understand why. Has anyone had the same issue?

Thanks

hugene10sttranscriptcluster.db annotation • 1.1k views

ADD COMMENT • link updated 8.9 years ago by James W. MacDonald 65k • written 8.9 years ago by colonppg ▴ 30

score 0 · Answer 1 · 2015-06-02

James W. MacDonald 65k

@james-w-macdonald-5106

United States

That's not an error. It's a warning. And what it says is that there are multiple one-to-many mappings of probeset IDs to either Entrez Gene IDs or Hugo Symbols.

> z <- select(hugene10sttranscriptcluster.db, keys(hugene10sttranscriptcluster.db), c("SYMBOL","ENTREZID"))
> head(z[duplicated(z[,1]),])
     PROBEID       SYMBOL  ENTREZID
4205 7896740        OR4F4     26682
4206 7896740        OR4F5     79501
4208 7896742 LOC100134822 100134822
4209 7896742       PCMTD2     55251
4210 7896742  LINC00266-1    140849
4211 7896742 LOC101059936 101059936
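
One way to inspect and handle these one-to-many mappings, as a minimal sketch in base R. The data frame below just re-types the six rows printed above, so it is a toy stand-in for the full select() result, and the names n_per_probe, first_only and collapsed are only illustrative.

```r
# Toy data frame: the six rows printed above, not the full annotation.
z <- data.frame(
  PROBEID  = c("7896740", "7896740", "7896742", "7896742", "7896742", "7896742"),
  SYMBOL   = c("OR4F4", "OR4F5", "LOC100134822", "PCMTD2", "LINC00266-1", "LOC101059936"),
  ENTREZID = c("26682", "79501", "100134822", "55251", "140849", "101059936"),
  stringsAsFactors = FALSE
)

# How many rows does each probeset ID map to?
n_per_probe <- table(z$PROBEID)

# Option 1: keep only the first mapping for each probeset ID.
first_only <- z[!duplicated(z$PROBEID), ]

# Option 2: keep every mapping, collapsed into one row per probeset ID.
collapsed <- aggregate(z[c("SYMBOL", "ENTREZID")],
                       by = list(PROBEID = z$PROBEID),
                       FUN = paste, collapse = ";")
```

Which option is appropriate depends on the analysis; dropping all but the first match silently discards genes, while collapsing keeps them visible. AnnotationDbi also provides mapIds(), whose multiVals argument (for example multiVals = "first") controls how multiple matches are returned.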

ADD COMMENT • link 8.9 years ago James W. MacDonald 65k

Dear James:

Thanks for your response. I do not think the warning will be an issue, but this package is apparently buggy because it misses a lot of genes. I downloaded the annotation from Affy and processed it under Unix, and those genes are there.

I wonder if anyone has encountered this issue and has a workaround.

Thanks

ADD REPLY • link 8.9 years ago colonppg ▴ 30

The package isn't buggy - it reports exactly what we get from Affy. There are caveats however.

We base annotations on RefSeq/GenBank and Entrez Gene IDs. Any transcript that isn't in one of those databases is invisible to the process. This could/should/might change, but for now there it is.
The current annotation packages are based on the na34 annotation files that were current when we released. Affy has since admitted to a co-worker of mine that (at least for the HuEx 1.0 annotations) there are problems with these files, and have released the na35 (and just two days ago some na35.1) versions. I am in the process of re-building the annotation packages, and hypothetically the newer versions will be better in some substantive way.
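
To see which probeset IDs fall into the first caveat (no Entrez Gene ID available), one could list the IDs whose ENTREZID is always NA. This is only a sketch: unannotated_ids is a hypothetical helper, the toy data frame stands in for a real select() result, and "hypothetical_1" is a made-up ID used for illustration.

```r
# Return the probeset IDs that have no non-NA ENTREZID in 'annot'.
# 'annot' is assumed to have PROBEID and ENTREZID columns, as select() output does.
unannotated_ids <- function(annot) {
  annotated <- annot$PROBEID[!is.na(annot$ENTREZID)]
  setdiff(unique(annot$PROBEID), annotated)
}

# Toy input: one annotated probeset (rows from the thread) and one made-up ID.
toy <- data.frame(
  PROBEID  = c("7896740", "7896740", "hypothetical_1"),
  ENTREZID = c("26682", "79501", NA),
  stringsAsFactors = FALSE
)
missing_ids <- unannotated_ids(toy)

# With the real package (not run here), something like:
# annot <- select(hugene10sttranscriptcluster.db,
#                 keys = keys(hugene10sttranscriptcluster.db),
#                 columns = c("ENTREZID", "SYMBOL"))
# length(unannotated_ids(annot))
```

IDs that show up here are present on the array but carry no Entrez Gene mapping in the package, which is different from the IDs being absent altogether.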

ADD REPLY • link 8.9 years ago James W. MacDonald 65k

Hi James,

Thanks, that explains it:

mydata <- frma(affyobj, target = "core")

I think target = "core" caused a lot of genes to go missing. Thanks for your explanation.

Great help.

ADD REPLY • link 8.9 years ago colonppg ▴ 30

Using 'core' shouldn't make a lot of genes go missing. It simply summarizes the probesets at the transcript level. This is different from Affy's concept of core probesets for the Exon arrays, where the core probesets are those with the most evidence that they actually exist. The Gene arrays only have (what Affy calls) core probesets to begin with (and for those, individual probes are often dropped for various reasons), so for oligo and frma core == transcript.
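
A quick way to check whether summarization actually dropped anything is to compare the IDs in the summarized data against the IDs the annotation package knows about. The sketch below is base R only: compare_ids is a hypothetical helper, and the inputs are toy vectors ("hypothetical_1" is made up), not real output from frma.

```r
# Compare two sets of probeset IDs and report what each side lacks.
compare_ids <- function(expr_ids, annot_ids) {
  expr_ids  <- unique(as.character(expr_ids))
  annot_ids <- unique(as.character(annot_ids))
  list(
    n_expr        = length(expr_ids),
    n_annot       = length(annot_ids),
    only_in_expr  = setdiff(expr_ids, annot_ids),   # summarized but not annotated
    only_in_annot = setdiff(annot_ids, expr_ids)    # annotated but not summarized
  )
}

# Toy check with stand-in ID vectors.
res <- compare_ids(
  expr_ids  = c("7896740", "7896742", "hypothetical_1"),
  annot_ids = c("7896740", "7896742")
)

# With real objects (not run here), assuming mydata is an ExpressionSet-like object:
# compare_ids(featureNames(mydata),
#             keys(hugene10sttranscriptcluster.db, keytype = "PROBEID"))
```

If only_in_annot comes back empty, the summarization step did not lose any annotated transcript clusters, and any missing genes would have to be explained by the annotation itself.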

ADD REPLY • link 8.9 years ago James W. MacDonald 65k