Question

What annotation package?

Ed Siefker ▴ 230

How do I figure out what annotation package to use with a transcriptome?

I have reads counted with rna.fa.gz from:
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/BUILD.37.1/RNA/

I can't find an NCBI annotation package for Mm, or any Mm annotation package that references build 37.
Names look like this:

> head(data$Name)
[1] "gi|126352347|ref|NM_028260.2|" "gi|142348699|ref|NM_010886.2|"
[3] "gi|23821034|ref|NM_017376.2|" "gi|33239323|ref|NM_146675.1|"
[5] "gi|118130860|ref|NM_133839.2|" "gi|29789242|ref|NM_024451.1|"

I thought that the numbers following 'gi|' would be Entrez Gene identifiers,
so I tried org.Mm.eg.db. But this doesn't seem to work:

> x <- org.Mm.egGENENAME
> mapped_genes <- mappedkeys(x)
> which(mapped_genes=="126352347")
integer(0)
> which(mapped_genes==126352347)
integer(0)

I'm confused and don't know what to try.

annotation • 1.3k views

ADD COMMENT • link updated 8.2 years ago by James W. MacDonald 68k • written 8.2 years ago by Ed Siefker ▴ 230

I figure the IDs after "ref|" are REFSEQ?

> head(as.list(org.Mm.egREFSEQ), n=2)
$`11287`
[1] "NM_007376" "NP_031402"

$`11298`
[1] "NM_009591" "NP_033721" "NR_033223" "XM_017314223" "XM_017314224"
[6] "XP_017169712" "XP_017169713"

Looks like it. So I need to chop up data$Name to get the REFSEQ out.

> IDs <- strsplit(data$Name, "\\|")
> my.REFSEQ <- unlist(lapply(IDs, '[[', 4))
> my.REFSEQ <- gsub("\\.*","",my.REFSEQ)
> head(my.REFSEQ)
[1] "NM_0282602" "NM_0108862" "NM_0173762" "NM_1466751" "NM_1338392"
[6] "NM_0244511"

Now that I have REFSEQs, I can use select() to extract SYMBOLs, right?

> select(org.Mm.eg.db, keys = my.REFSEQ, keytype = "REFSEQ", columns = "SYMBOL")
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.

Apparently not. What am I doing wrong?

ADD REPLY • link 8.2 years ago Ed Siefker ▴ 230

Figured it out. My gsub is borked.

my.REFSEQ <- gsub("\\..*","",my.REFSEQ)
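
For anyone hitting the same thing, here is a small base-R illustration of why the first pattern failed. In "\\.*" the star applies to the escaped dot, so the pattern only matches runs of literal dots and the version digits survive. In "\\..*" the escaped dot is followed by ".*", so the dot and everything after it are removed. The variable names acc, broken and fixed are just for this example.

```r
# One accession with a version suffix, as found after "ref|" in the names
acc <- "NM_028260.2"

# "\\.*" = zero or more literal dots: only the dot itself is deleted
broken <- gsub("\\.*", "", acc)

# "\\..*" = a literal dot followed by anything: dot and version are deleted
fixed <- gsub("\\..*", "", acc)

print(broken)  # "NM_0282602", not a valid REFSEQ key
print(fixed)   # "NM_028260", matches the keys in org.Mm.egREFSEQ
```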

Anyway, is there a way to do this without the strsplit and gsub? Do any of the annotation packages parse IDs like "gi|126352347|ref|NM_028260.2|" ?

ADD REPLY • link 8.2 years ago Ed Siefker ▴ 230

score 0 · Answer 1 · 2017-10-24

James W. MacDonald 68k

If you answer your own question, people will assume it's been answered and will then ignore it. So unless you want that to happen, you should be commenting, not answering.

As to your remaining question, I don't know of any function that exists to do that. I just roll my own, as it's simple and often specific.
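
A minimal sketch of what rolling your own might look like, assuming every name follows the gi|<number>|ref|<accession>.<version>| layout shown in the question. The function name extract_refseq is hypothetical and not part of any annotation package; it uses only base R and returns NA for names that do not fit the layout.

```r
# Hypothetical helper: pull the unversioned RefSeq accession out of
# FASTA-style names such as "gi|126352347|ref|NM_028260.2|".
extract_refseq <- function(x) {
  # Capture the text after "|ref|" up to the version dot or the next "|"
  pattern <- "^.*\\|ref\\|([^.|]+)(\\.[0-9]+)?\\|.*$"
  out <- sub(pattern, "\\1", x)
  # sub() returns its input unchanged when nothing matches, so flag those
  out[!grepl(pattern, x)] <- NA_character_
  out
}

ids <- c("gi|126352347|ref|NM_028260.2|",
         "gi|142348699|ref|NM_010886.2|",
         "not a RefSeq name")
refseq <- extract_refseq(ids)
print(refseq)
```

The non-NA results can then be passed as keys to select() with keytype = "REFSEQ", as in the comment above.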

ADD COMMENT • link 8.2 years ago James W. MacDonald 68k

Thank you. I had no idea there were different fields for comments and answers. I'll keep that in mind.

ADD REPLY • link 8.2 years ago Ed Siefker ▴ 230