Question: What annotation package?
26 days ago, Ed Siefker (United States) wrote:

How do I figure out what annotation package to use with a transcriptome?

I have reads counted with rna.fa.gz from:
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/BUILD.37.1/RNA/

I can't find an NCBI annotation package for Mm, or any Mm annotation package that references build 37. 
Names look like this:

> head(data$Name)
[1] "gi|126352347|ref|NM_028260.2|" "gi|142348699|ref|NM_010886.2|"
[3] "gi|23821034|ref|NM_017376.2|"  "gi|33239323|ref|NM_146675.1|"
[5] "gi|118130860|ref|NM_133839.2|" "gi|29789242|ref|NM_024451.1|"

I thought that the numbers following 'gi|' would be Entrez Gene identifiers, so I tried org.Mm.eg.db. But this doesn't seem to work:

> x <- org.Mm.egGENENAME
> mapped_genes <- mappedkeys(x)
> which(mapped_genes=="126352347")
integer(0)
> which(mapped_genes==126352347)
integer(0)

I'm confused and don't know what to try. 

modified 26 days ago by James W. MacDonald • written 26 days ago by Ed Siefker

I figure the IDs after "ref|" are REFSEQ? 

> head(as.list(org.Mm.egREFSEQ), n=2)
$`11287`
[1] "NM_007376" "NP_031402"

$`11298`
[1] "NM_009591"    "NP_033721"    "NR_033223"    "XM_017314223" "XM_017314224"
[6] "XP_017169712" "XP_017169713"


Looks like it.  So I need to chop up data$Name to get the REFSEQ out.
 

> IDs <- strsplit(data$Name, "\\|")
> my.REFSEQ <- unlist(lapply(IDs, '[[', 4))
> my.REFSEQ <- gsub("\\.*","",my.REFSEQ)
> head(my.REFSEQ)
[1] "NM_0282602" "NM_0108862" "NM_0173762" "NM_1466751" "NM_1338392"
[6] "NM_0244511"

Now that I have REFSEQs, I can use select() to extract SYMBOLs, right?

> select(org.Mm.eg.db, keys = my.REFSEQ, keytype = "REFSEQ", columns = "SYMBOL")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.

Apparently not.  What am I doing wrong?

written 26 days ago by Ed Siefker

Figured it out.  My gsub is borked. 

my.REFSEQ <- gsub("\\..*","",my.REFSEQ)
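For anyone hitting the same thing, the difference between the two patterns can be checked in base R without any annotation package. This is only a minimal sketch; the vector `acc` is a hypothetical example copied from the accessions shown earlier in the thread.

```r
# Hypothetical example vector, copied from accessions shown earlier in the thread
acc <- c("NM_028260.2", "NM_010886.2")

# "\\.*" matches zero or more literal dots, so only the dot itself is removed
# and the version digit ends up fused onto the accession
broken <- gsub("\\.*", "", acc)

# "\\..*" matches a literal dot followed by everything after it,
# so the whole version suffix is dropped
fixed <- gsub("\\..*", "", acc)

print(broken)
print(fixed)
```

Note that the corrected pattern has to be applied to freshly extracted accessions: once the dots have been stripped by the first pattern, there is nothing left for "\\..*" to match.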

Anyway, is there a way to do this without the strsplit and gsub? Do any of the annotation packages parse IDs like "gi|126352347|ref|NM_028260.2|"?

written 26 days ago by Ed Siefker

26 days ago, James W. MacDonald (United States) wrote:

If you answer your own question, people will assume it's been answered and will then ignore it. So unless you want that to happen, you should be commenting, not answering.

As to your remaining question, I don't know of any function that exists to do that. I just roll my own, as it's simple and often specific.
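As a starting point, a roll-your-own parser along these lines might look like the sketch below. It is not taken from any annotation package: `parse_refseq` is a hypothetical helper name, and the pattern assumes headers of the form "gi|<number>|ref|<accession>.<version>|" as shown in the question. Anything that does not fit that shape comes back as NA rather than being passed through silently.

```r
# parse_refseq() is a hypothetical helper, not part of any annotation package.
# Assumes NCBI-style headers: "gi|<number>|ref|<accession>.<version>|"
parse_refseq <- function(x) {
  # Capture the accession up to (but not including) the optional ".version"
  pattern <- "^gi\\|[0-9]+\\|ref\\|([^|.]+)(\\.[0-9]+)?\\|.*$"
  hit <- grepl(pattern, x)
  out <- rep(NA_character_, length(x))
  out[hit] <- sub(pattern, "\\1", x[hit])
  out
}

# Example headers from the question, plus one malformed entry
ids <- c("gi|126352347|ref|NM_028260.2|",
         "gi|142348699|ref|NM_010886.2|",
         "not a header")
parsed <- parse_refseq(ids)
print(parsed)
```

The non-NA values can then be used as keys with keytype = "REFSEQ" in the select() call shown earlier in the thread.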

modified 26 days ago • written 26 days ago by James W. MacDonald

Thank you.  I had no idea there were different fields for comments and answers.  I'll keep that in mind. 

written 26 days ago by Ed Siefker