What annotation package?
1
0
Entering edit mode
Ed Siefker ▴ 230
@ed-siefker-5136
Last seen 5 months ago
United States

How do I figure out what annotation package to use with a transcriptome?

I have reads counted with rna.fa.gz from:
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/BUILD.37.1/RNA/

I can't find an NCBI annotation package for Mm, or any Mm annotation package that references build 37. 
Names look like this:

> head(data$Name)
[1] "gi|126352347|ref|NM_028260.2|" "gi|142348699|ref|NM_010886.2|"
[3] "gi|23821034|ref|NM_017376.2|"  "gi|33239323|ref|NM_146675.1|"
[5] "gi|118130860|ref|NM_133839.2|" "gi|29789242|ref|NM_024451.1|"

I thought that the numbers following 'gi|' would be entrez gene identifiers. 
So I tried org.Mm.eg.db. But this doesn't seem to work:

> x <- org.Mm.egGENENAME
> mapped_genes <- mappedkeys(x)
> which(mapped_genes=="126352347")
integer(0)
> which(mapped_genes==126352347)
integer(0)

I'm confused and don't know what to try. 

annotation • 861 views
ADD COMMENT
0
Entering edit mode

I figure the IDs after "ref|" are REFSEQ? 

> head(as.list(org.Mm.egREFSEQ), n=2)
$`11287`
[1] "NM_007376" "NP_031402"

$`11298`
[1] "NM_009591"    "NP_033721"    "NR_033223"    "XM_017314223" "XM_017314224"
[6] "XP_017169712" "XP_017169713"


Looks like it.  So I need to chop up data$Name to get the REFSEQ out.
 

> IDs <- strsplit(data$Name, "\\|")
my.REFSEQ <- gsub("\\.*","",my.REFSEQ)
head(my.REFSEQ)> my.REFSEQ <- unlist(lapply(IDs, '[[', 4))
> my.REFSEQ <- gsub("\\.*","",my.REFSEQ)
> head(my.REFSEQ)
[1] "NM_0282602" "NM_0108862" "NM_0173762" "NM_1466751" "NM_1338392"
[6] "NM_0244511"

Now that I have REFSEQs, I can use select() to extract SYMBOLs right?

> select(org.Mm.eg.db, keys = my.REFSEQ, keytype = "REFSEQ", columns = "SYMBOL")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.

Apparently not.  What am I doing wrong?

ADD REPLY
0
Entering edit mode

Figured it out.  My gsub is borked. 

my.REFSEQ <- gsub("\\..*","",my.REFSEQ)

Anyway, is there a way to do this without the strsplit and gsub? Do any of the annotation packages parse IDs like "gi|126352347|ref|NM_028260.2|" ?

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 11 hours ago
United States

If you answer your own question, people will assume it's been answered and will then ignore it. So unless you want that to happen, you should be commenting, not answering.

As to your remaining question, I don't know of any function that exists to do that. I just roll my own, as it's simple and often specific.

ADD COMMENT
0
Entering edit mode

Thank you.  I had no idea there were different fields for comments and answers.  I'll keep that in mind. 

ADD REPLY

Login before adding your answer.

Traffic: 905 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6