How to annotate BLAST subject.id with AnnotationDbi
@arnaud-mounier-5957
Last seen 8.7 years ago
Hi,

I am trying to annotate a blastp result (not much, just 270 queries) with org.At.tair.db, used through the AnnotationDbi package from Bioconductor.

First I read the BLAST output with RFLPtools:

    > df.blast.report <- read.blast(file = f.blast.report)
    > head(df.blast.report)
             query.id  subject.id identity alignment.length mismatches gap.opens q.start q.end s.start s.end evalue bit.score
    1 medtr8g018420.1 AT1G55020.1    59.77              860        314         9       9   856      20   859      0      1058
    2 medtr8g018420.1 AT3G22400.1    56.16              869        344        10       9   856      34   886      0      1004
    3 medtr8g018420.1 AT1G72520.1    45.19              821        433        10      45   856     114   926      0       729

The subject IDs carry a version number (.1 or .2), and so does the original ATH_GO_GOSLIM.txt file from the TAIR site. But this version number is not present in the TAIR keys of org.At.tair.db:

    > head(keys(org.At.tair.db, keytype="TAIR"))
    [1] "AT1G01010" "AT1G01020" "AT1G01030" "AT1G01040" "AT1G01050" "AT1G01060"

* Is this relevant, or can I annotate without taking the version number into account? Does org.At.tair.db keep the version number elsewhere? I ask because the source file for this package (ftp://ftp.arabidopsis.org/Ontologies/Gene_Ontology/ATH_GO_GOSLIM.txt) stores it initially.

* As the queries must be selected according to the annotations of their subjects and GO.db, I want to merge all the information (BLAST report, org.At.tair.db and GO.db) into one database with a Bioconductor package (AnnotationForge perhaps). So, is there a package or a GNU script to manage this association easily, or do I have to write my own R scripts?

Any links are welcome!
Thanks in advance,
Ar.

--
"Sunlight filters through the branches of the trees in flashes, like meaning through language." Nancy Huston

Arnaud Mounier
INRA - UMR Agroécologie 1347
CNRS - ERL IPM 6300 (Plant-Microorganism Interaction)
17, rue Sully - BP 86510 - F-21065 Dijon Cedex - France
Work phone : +33 380 693 167 - Fax : +33 380 693 753
https://www6.dijon.inra.fr/umragroecologie/Personnel/IPM/ITA/MOUNIER-Arnaud
Alignment GO annotate AnnotationDbi • 1.7k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States
Hi Arnaud,

It should be fine as long as you remember that the shortened name is that of the locus ID, and that you are therefore annotating at the gene level (as is the case for all the org packages).

You can see the TAIR website for more clarification about what the different IDs mean. Here is a page for the locus ID "AT1G01010":

http://www.arabidopsis.org/servlets/TairObject?type=locus&id=137158

And here is a page for the gene model "AT1G01010.1":

http://www.arabidopsis.org/servlets/TairObject?type=gene&id=138508

Marc

(Reply to Arnaud Mounier's message of 05/27/2013, 01:12 AM.)
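The locus-level annotation described in this answer can be sketched in R as follows. This is only a minimal illustration, not an official recipe: the small data frame stands in for the real BLAST report, the column name `locus.id` and the objects `go.annot` and `df.annotated` are hypothetical names, and the lookup assumes that org.At.tair.db exposes the `GO` and `ONTOLOGY` columns for the `TAIR` key type. The lookup is skipped if the packages are not installed.

```r
# Small stand-in for the BLAST report (only two columns kept for brevity).
df.blast.report <- data.frame(
  query.id   = rep("medtr8g018420.1", 3),
  subject.id = c("AT1G55020.1", "AT3G22400.1", "AT1G72520.1"),
  stringsAsFactors = FALSE
)

# Drop the trailing ".<number>" gene-model suffix to obtain the TAIR locus ID.
# 'locus.id' is a hypothetical column name chosen for this sketch.
df.blast.report$locus.id <- sub("\\.[0-9]+$", "", df.blast.report$subject.id)

# Look up GO terms at the locus level, only if the packages are available.
if (requireNamespace("org.At.tair.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  go.annot <- AnnotationDbi::select(
    org.At.tair.db::org.At.tair.db,
    keys    = unique(df.blast.report$locus.id),
    columns = c("GO", "ONTOLOGY"),
    keytype = "TAIR"
  )
  # One row per (hit, GO term); hits without any GO term are kept with NA.
  df.annotated <- merge(df.blast.report, go.annot,
                        by.x = "locus.id", by.y = "TAIR", all.x = TRUE)
}
```

Keeping the original `subject.id` column next to `locus.id` means the gene-model information from BLAST stays in the table even though the GO lookup itself is done per locus.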
Hi everybody, hi Marc,

(Reply to Marc Carlson's message of 28/05/2013, 19:37.)

Thanks for your response. So is it possible to query an org package such as org.At.tair.db at the gene model level (e.g. AT1G01010.1)?

Ar.
Unfortunately no. Those IDs are not present in the org.At.tair.db package, as this is a gene-level annotation package. These kinds of IDs have never been included in this package, although I guess that we could consider adding them at some point in the future.

Marc

(Reply to Arnaud Mounier's message of 05/29/2013, 05:49 AM.)
(Reply to Marc Carlson's message of 29/05/2013, 19:54.)

Indeed, that could be a good idea, because I think some cases cannot be handled properly without it. Here is an example from a blastp run:

    > df.blast.report[df.blast.report$"query.id" == "medtr7g099680.1",]
               query.id  subject.id identity alignment.length mismatches gap.opens q.start q.end s.start s.end evalue bit.score
    99  medtr7g099680.1 AT1G79930.2    35.62              438        266         4      10   434       4   438  2e-85       289
    100 medtr7g099680.1 AT1G79930.1    35.62              438        266         4      10   434       4   438  2e-85       290
    101 medtr7g099680.1 AT2G32120.2    33.98              512        310         9       9   508      30   525  7e-85       282
    102 medtr7g099680.1 AT2G32120.1    33.98              512        310         9       9   508      30   525  7e-85       282
    103 medtr7g099680.1 AT1G79920.1    35.62              438        266         4      10   434       4   438  1e-84       288
    104 medtr7g099680.1 AT1G79920.2    35.62              438        266         4      10   434       4   438  2e-84       287

Each pair of rows (99-100, 101-102, 103-104) has the same query.id, and within each of these three pairs the subject.id differs only at the gene model level. That information is lost once the IDs are reduced to the locus. Note that despite the different gene models, the other values are essentially the same (only the e-value and bit score differ slightly in some pairs).

But here is another example, with three different gene models for the same locus and three different hits:

    > df.blast.report[df.blast.report$"query.id" == "medtr8g081490.1",]
               query.id  subject.id identity alignment.length mismatches gap.opens q.start q.end s.start s.end evalue bit.score
    188 medtr8g081490.1 AT4G13940.3    80.35              453         43         2       1   452       1   408      0       734
    189 medtr8g081490.1 AT4G13940.2    89.43              331         34         1     123   452       2   332      0       622
    190 medtr8g081490.1 AT4G13940.4    89.29              308         32         1       1   307       1   308      0       559

Thanks for your reply,
Ar.
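One possible way to deal with several gene models hitting the same locus, sketched here under assumptions that are not part of the thread, is to keep a single representative hit per query and locus before annotating at the locus level. The example below keeps the hit with the highest bit score; that selection rule, the column name `locus.id` and the object names `df.hits` and `best.hits` are illustrative choices only, and the data are the three rows for medtr8g081490.1 reduced to three columns.

```r
# Stand-in for the three hits of medtr8g081490.1 (columns reduced for brevity).
df.hits <- data.frame(
  query.id   = "medtr8g081490.1",
  subject.id = c("AT4G13940.3", "AT4G13940.2", "AT4G13940.4"),
  bit.score  = c(734, 622, 559),
  stringsAsFactors = FALSE
)

# Locus ID = gene-model ID without its trailing ".<number>" suffix.
df.hits$locus.id <- sub("\\.[0-9]+$", "", df.hits$subject.id)

# Sort so that, within each query/locus pair, the highest bit score comes first.
df.hits <- df.hits[order(df.hits$query.id, df.hits$locus.id, -df.hits$bit.score), ]

# Keep only the first (best-scoring) row of each query/locus pair.
best.hits <- df.hits[!duplicated(df.hits[c("query.id", "locus.id")]), ]
```

Because `subject.id` is retained in `best.hits`, the gene model that produced the best hit remains visible even after the table has been reduced to one row per locus.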