chawla
Hi
I am trying to find GO terms (GO biological process IDs) for a set of
2195 unique "affy_hg_u133a_2" probe IDs.
> goterms <- getBM(attributes = c("affy_hg_u133a_2", "go_biological_process_id", "entrezgene"),
+                  filters = "affy_hg_u133a_2",
+                  values = data[,1], mart = ensembl)
> head(goterms)
affy_hg_u133a_2 go_biological_process_id entrezgene
1 209891_at GO:0051301 57405
2 209891_at GO:0007052 57405
3 209891_at GO:0007059 57405
4 209891_at GO:0007049 57405
5 209891_at GO:0007067 57405
6 206204_at GO:0007165 2888
> dim(goterms)
[1] 15088 3
> length(unique(goterms[,1]))
[1] 1875
> length(which(goterms[,2]==""))
[1] 1222
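To see how the 1222 empty rows relate to the 1875 returned probes, one can count, per probe, whether it has at least one non-empty GO ID. A probe that maps to a gene with no biological process annotation may still come back as a row with an empty GO field. This is a minimal sketch that assumes `goterms` is the data.frame returned by `getBM()` above. The toy rows are hypothetical stand-ins for the real result, and the `hypothetical_at` probe is invented purely for illustration.

```r
# Hypothetical toy stand-in for the getBM() result; replace with the real `goterms`.
goterms <- data.frame(
  affy_hg_u133a_2          = c("209891_at", "209891_at", "206204_at", "hypothetical_at"),
  go_biological_process_id = c("GO:0051301", "GO:0007052", "GO:0007165", ""),
  entrezgene               = c(57405, 57405, 2888, NA),
  stringsAsFactors = FALSE
)

probes <- goterms$affy_hg_u133a_2
go     <- goterms$go_biological_process_id

# TRUE where the row carries a real GO ID (treats both "" and NA as missing)
has_go <- !is.na(go) & nzchar(go)

n_returned   <- length(unique(probes))          # probes present in the result at all
n_with_go    <- length(unique(probes[has_go]))  # probes with >= 1 non-empty GO ID
n_only_empty <- n_returned - n_with_go          # probes whose rows are all empty

cat("returned:", n_returned, "with GO:", n_with_go, "only empty:", n_only_empty, "\n")
```

On the real data, `n_returned` corresponds to the 1875 from `length(unique(goterms[,1]))`, so `n_with_go` shows how many of those actually have a biological process ID.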
My question is: if, out of 2195 unique probe IDs, 1875 are present in
the result with GO biological process IDs, why do 1222 rows have "" as
the biological process ID?
I would expect such rows to be absent from the result. Is something wrong?
If not, I will have to filter them out every time I use BioMart to
extract GO terms.
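If the empty rows turn out to be expected, filtering them out after each `getBM()` call is short. This is a minimal sketch that assumes the result is a data.frame shaped like `goterms`. `drop_empty_go()` is a hypothetical helper, not a biomaRt function, and the toy rows are illustrative only. The helper treats both "" and NA as missing in case a given biomaRt version returns NA rather than "" for absent attribute values.

```r
# Hypothetical helper (not part of biomaRt): keep only rows with a real GO ID.
drop_empty_go <- function(df, go_col = "go_biological_process_id") {
  go   <- df[[go_col]]
  keep <- !is.na(go) & nzchar(go)   # drop both "" and NA
  df[keep, , drop = FALSE]
}

# Hypothetical toy stand-in for the getBM() result.
goterms <- data.frame(
  affy_hg_u133a_2          = c("209891_at", "209891_at", "206204_at", "hypothetical_at"),
  go_biological_process_id = c("GO:0051301", "GO:0007052", "GO:0007165", ""),
  entrezgene               = c(57405, 57405, 2888, NA),
  stringsAsFactors = FALSE
)

goterms_bp <- drop_empty_go(goterms)
print(goterms_bp)
```

The same helper works for the yeast and rat results as long as the GO column keeps the `go_biological_process_id` name, otherwise pass the column name via `go_col`.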
The same problem occurred with yeast and rat data.
Thanks in advance
Konika