pathview puzzle
Oleg Moskvin
Colleagues,

I'd like to use pathview with E. coli data. The Homo sapiens example from the manual works just fine:

pv.out <- pathview(gene.data = gse16873.d[, 1], pathway.id = demo.paths$sel.paths[i], species = "hsa", out.suffix = "gse1683", kegg.native = TRUE)

but an analogous run with E. coli data keeps failing:

eco.out <- pathview(gene.data = data02010, pathway.id = "02010", out.suffix = "ecotest", species = "eco", kegg.native = TRUE)
[1] "Downloading xml files for eco02010, 1/1 pathways.."
[1] "Downloading png files for eco02010, 1/1 pathways.."
Error in mol.data[as.character(items[hit]), ] : subscript out of bounds
In addition: Warning messages:
1: In node.map(gene.data, node.data, node.types = gene.node.type, node.sum = node.sum) : NAs introduced by coercion
2: In FUN(1:153[[1L]], ...) : NAs introduced by coercion

I've checked variations of the input data structure and tried subsetting the genes to only those used in the pathway to be colored, as shown here, and the "subscript out of bounds" error was still there.

In fact, if we compare the structure of the data in the vignette and the custom data, they are the same:

str(gse16873.d[, 1])
 Named num [1:11979] -0.3076 0.4159 0.1985 -0.2316 -0.0449 ...
 - attr(*, "names")= chr [1:11979] "10000" "10001" "10002" "10003" ...
str(data02010)
 Named num [1:47] 2.95 2.25 1.97 1.72 1.72 ...
 - attr(*, "names")= chr [1:47] "b0365" "b0366" "b0829" "b0830" ...

If we look at the respective XML files, we see consistency as well:

<entry id="2" name="hsa:51343" type="gene" link="http://www.kegg.jp/dbget-bin/www_bget?hsa:51343">
<graphics name="FZR1, CDC20C, CDH1, FZR, FZR2, HCDH, HCDH1" fgcolor="#000000" bgcolor="#BFFFBF" type="rectangle" x="919" y="536" width="46" height="17"/>
</entry>

<entry id="4" name="eco:b1513" type="gene" link="http://www.kegg.jp/dbget-bin/www_bget?eco:b1513">
<graphics name="lsrA" fgcolor="#000000" bgcolor="#BFFFBF" type="rectangle" x="339" y="1882" width="46" height="17"/>
</entry>

That is, XML gene entries have name="Organism_ID:GeneID", and the GeneIDs are expected to be the names attached to the expression data. This is true in both cases, yet the hsa example works and the eco example does not.

Counterintuitively, the "subscript out of bounds" error seems to stem not from having unrecognizable IDs in the expression file but from having RECOGNIZABLE (!) IDs there. If we change the IDs in the expression file to nonsense, the function accepts them and there is no "out of bounds" error anymore! (This observation came from an attempt to use gene names instead of b-numbers in the expression file; the phenomenon was checked several times in clean environments.)

Example (with the bla.data object in the attached rda file):

bla.out <- pathview(gene.data = bla.data, out.suffix = "bla", species = "eco", pathway.id = "02010", kegg.native = TRUE)
Working in directory ....
Writing image file eco02010.bla.png
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In FUN(1:153[[153L]], ...) : NAs introduced by coercion

With nonsense IDs, the graphics files are generated just fine, without coloring, of course. Using real IDs that match the XML file contents always results in the "out of bounds" error (the data02010 object is included in the attached file).

Any ideas?

Thanks,
Oleg
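A quick way to see how many data IDs line up with the pathway nodes is to strip the organism prefix from the KGML "name" attributes and intersect the result with the names of the expression vector. The sketch below uses base R only; the object names (node.names, toy.data) and the value assigned to b1513 are illustrative assumptions, not objects from the attached file, and this is not pathview's own matching code.

```r
# KGML "name" attributes as quoted in the post: "<organism>:<gene ID>".
node.names <- c("eco:b1513", "hsa:51343")

# Strip the "<organism>:" prefix to get the bare gene ID carried by each node.
node.ids <- sub("^[a-z]+:", "", node.names)

# Toy expression vector keyed by b-numbers (hypothetical; b1513 value is made up).
toy.data <- c(b0365 = 2.95, b0366 = 2.25, b1513 = 1.97)

# IDs present in both the data and the pathway nodes, and IDs left over.
matched   <- intersect(names(toy.data), node.ids)
unmatched <- setdiff(names(toy.data), node.ids)

print(matched)
print(unmatched)
```

If matched is non-empty, the IDs themselves agree with the XML, and any failure has to come from how the IDs are processed afterwards.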
Luo Weijun
Hi Oleg,

You are right, the problem is due to ID type inconsistency. You have to specify gene.idtype when calling the pathview function if your gene ID type is not Entrez Gene. I don't think b-numbers are recognized. For your gene name example, if by "gene name" you mean official gene symbols, you should specify gene.idtype="SYMBOL" (lower case is fine):

eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn, pathway.id = "02010", gene.idtype = "SYMBOL", out.suffix = "T2ACSH", species = "eco", kegg.native = TRUE)

You may want to check the help info on the pathview function for details:

?pathview

Pathview supports 10 different common ID types for model organisms (plus KEGG orthology IDs). For the supported common ID types, type:

gene.idtype.list

For external IDs not in the supported common ID type lists, we may use the mol.sum function to do the ID and data mapping explicitly. Check the example on page 14 of the vignette or the help info on the function:

?mol.sum

HTH.
Weijun

--------------------------------------------
On Wed, 8/21/13, Oleg Moskvin <moskvin at wisc.edu> wrote:
Subject: pathview: problem with coloring
Date: Wednesday, August 21, 2013, 6:12 PM

Hi Weijun,

Your pathview is a very attractive package. While I can reproduce the results with the human data provided in the example, I am getting coloring problems with E. coli data. This seems to be a gene ID mismatch that comes from inconsistency in the ID handling by the package. The KEGG pathways for E. coli contain "b-numbers" as gene IDs. If I supply an expression set based on b-numbers, it is not recognized; if I supply an expression set based on gene names, it is (!) recognized, but the resulting coloring is all white (#FFFFFF).

Details:

###### 1. Using b-numbers:

head(T2.CEBF095.crt115.ASCH.DROP3.rel)
           ACSH_vs_synH
EKO11_2926   -1.3362079
b0019         0.9265879
b0032        -4.2007218
b0033        -3.6678436
b0058         1.1996750
b0060         0.8624787

eco.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel, pathway.id = "02010", out.suffix = "T2ACSH", species = "eco", kegg.native = TRUE)
[1] "Downloading xml files for eco02010, 1/1 pathways.."
[1] "Downloading png files for eco02010, 1/1 pathways.."
Error in mol.data[as.character(items[hit]), ] : subscript out of bounds
In addition: Warning messages:
1: In node.map(gene.data, node.data, node.types = gene.node.type, node.sum = node.sum) : NAs introduced by coercion
2: In FUN(1:153[[1L]], ...) : NAs introduced by coercion

###### 2. Using gene names:

head(T2.CEBF095.crt115.ASCH.DROP3.rel.gn)
     ACSH_vs_synH
nhaA    0.9265879
carA   -4.2007218
carB   -3.6678436
caiF   -1.4380677
folA   -0.8914105
rluA    1.1996750

eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn, pathway.id = "02010", out.suffix = "T2ACSH", species = "eco", kegg.native = TRUE)
Loading required package: org.EcK12.eg.db
Working in directory /mnt/omdir/omoskvin/Projects/Ecoli/cMonkey
Writing image file eco02010.T2ACSH.png
There were 50 or more warnings (use warnings() to see the first 50)

> head(eco2.out[[1]])
   kegg.names labels type   x    y width height ACSH_vs_synH mol.col
4       b1513        gene 339 1882    46     17           NA #FFFFFF
5       b1515        gene 293 1890    46     17           NA #FFFFFF
6       b1514        gene 293 1873    46     17           NA #FFFFFF
7       b1516        gene 247 1882    46     17           NA #FFFFFF
18      b4087        gene 339 1823    46     17           NA #FFFFFF
19      b4086        gene 293 1823    46     17           NA #FFFFFF

So, b-numbers cause an early "out of bounds" error, while gene names let the run proceed further but produce no coloring in the result!

Please help.

Thank you,
Oleg
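The "NAs introduced by coercion" warnings in the quoted output are what base R emits when character strings that are not numbers are converted with as.numeric(). The sketch below shows that behaviour on the ID styles quoted in this thread. Whether pathview performs exactly this conversion internally is an assumption, and the helper looks.like.entrez is hypothetical, not part of pathview.

```r
# ID styles quoted in the thread.
ids.entrez <- c("10000", "10001", "10002")   # names of gse16873.d[, 1]
ids.bnum   <- c("b0365", "b0366", "b0829")   # names of data02010

# Entrez-style IDs convert cleanly to numbers.
num.entrez <- as.numeric(ids.entrez)

# b-numbers become NA (warning silenced here to keep the output tidy).
num.bnum <- suppressWarnings(as.numeric(ids.bnum))

# Hypothetical helper: TRUE only if every ID consists of digits alone.
looks.like.entrez <- function(ids) all(grepl("^[0-9]+$", ids))

print(num.entrez)
print(num.bnum)
print(looks.like.entrez(ids.bnum))
```

A check like looks.like.entrez() before calling pathview gives an early hint that gene.idtype needs to be set to something other than the default.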
Oleg Moskvin
Hi Weijun,

Thank you for the response.

The problem seems to be deeper than that and is connected to the special handling of a particular species, E. coli, by KEGG.

I looked into the pathview() code and here is what I see:

1) gene.data is remapped internally via mol.sum() to have ENTREZ IDs;
2) the remapped gene.data is used by node.map() to map onto KEGG nodes using node.data;
3) the node.data used in (2) was originally extracted from the KEGG XML by node.info().

The above route implies that the "name" entries of type="gene" in the KEGG XML have the "speciesID:ENTREZ" format. In the case of E. coli this doesn't hold true! See the examples of XML entries for H. sapiens and E. coli from my message of yesterday (below).

In fact, the KEGG XML for E. coli uses b-numbers as IDs in "gene" records!

So, for cases like this, when KEGG fails to be consistent in the supplied XML structure, one may suggest introducing an "id.bypass" option to pathview() that takes the gene.data as is (with user-supplied IDs that match the KEGG XML IDs, for example b-numbers) and passes it directly to step 2 (node matching).

Thanks!
Oleg

On 08/22/13, Luo Weijun wrote:
> Hi Oleg,
> You are right, the problem is due to ID type inconsistency.
> You have to specify gene.idtype when calling pathview function, if your gene id type is not Entrez Gene. I don't think b-numbers are recognized for sure. For your gene name example, if you mean official gene symbols by "gene name", you should specify gene.idtype="SYMBOL" (lower case is fine):
> eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn, pathway.id = "02010", gene.idtype="SYMBOL", out.suffix = "T2ACSH", species = "eco", kegg.native=TRUE)

On 08/22/13, Oleg Moskvin wrote:
> > <entry id="2" name="hsa:51343" type="gene" link="http://www.kegg.jp/dbget-bin/www_bget?hsa:51343">
> > <graphics name="FZR1, CDC20C, CDH1, FZR, FZR2, HCDH, HCDH1" fgcolor="#000000" bgcolor="#BFFFBF" type="rectangle" x="919" y="536" width="46" height="17"/>
> > </entry>
> >
> > <entry id="4" name="eco:b1513" type="gene" link="http://www.kegg.jp/dbget-bin/www_bget?eco:b1513">
> > <graphics name="lsrA" fgcolor="#000000" bgcolor="#BFFFBF" type="rectangle" x="339" y="1882" width="46" height="17"/>
> > </entry>
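The mismatch described in the three steps can be illustrated with a toy simulation in base R. This is only a sketch of the reasoning, not pathview's actual code: bnum2entrez is a hypothetical lookup table with placeholder IDs (not real Entrez Gene IDs), the expression values are made up, and the final indexing call merely shows one way a "subscript out of bounds" error arises when row names and requested IDs disagree.

```r
# User data keyed by b-numbers (toy values).
gene.data <- c(b1513 = 2.95, b1515 = 2.25)

# Hypothetical b-number -> Entrez lookup (placeholder IDs, not real ones).
bnum2entrez <- c(b1513 = "900001", b1515 = "900002")

# Step 1 as described: remap the data so it is keyed by Entrez-style IDs.
remapped <- gene.data
names(remapped) <- unname(bnum2entrez[names(gene.data)])

# Step 3 as described: node IDs come from the KGML "name" attributes.
node.ids <- sub("^eco:", "", c("eco:b1513", "eco:b1515"))

# Step 2: matching remapped data against the nodes finds nothing.
hits.remapped <- node.ids %in% names(remapped)

# Proposed bypass: matching the original names finds every node.
hits.bypass <- node.ids %in% names(gene.data)

# Indexing a matrix by row names it does not have raises the same kind of error.
mol.data <- matrix(remapped, ncol = 1, dimnames = list(names(remapped), "expr"))
err <- tryCatch(mol.data[node.ids, ], error = function(e) conditionMessage(e))

print(hits.remapped)
print(hits.bypass)
print(err)
```

The bypass variant works here only because the user-supplied names already match the node IDs, which is exactly the situation described for E. coli.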
Hi Oleg,

Thanks for the note. This is indeed a problem I hadn't realized previously! KEGG uses Entrez Gene IDs for all the other model organisms I've checked. I am working on a generic fix (not only for E. coli but for other species in a similar situation) and will incorporate it into the development version of pathview soon. Will keep you posted. Thanks for pointing this out.

Weijun

In reply to: Oleg Moskvin <moskvin at wisc.edu>, Re: [BioC] pathview puzzle, Friday, August 23, 2013, 12:19 PM
Hi Oleg,

I just updated the pathview package so that it can process and analyze data labeled with KEGG gene IDs other than Entrez Gene. It turns out that this issue affects many other species too. With this update, you can work with data from literally all ~2300 (and more forthcoming) KEGG species in pathview. I've also added new content with working examples on KEGG species and gene ID usage on pages 14-16 of the vignette. Note that you need to specify gene.idtype="KEGG" when calling pathview.

I've posted the new package to R-forge. You should be able to access it in the next few hours at http://r-forge.r-project.org/R/?group_id=1619. Just install it following the instructions there. The Bioc version will also be updated in the next 1-2 days: http://bioconductor.org/packages/devel/bioc/html/pathview.html.

Let me know how that works or if you have questions.

HTH.
Weijun

Follow-up to: Re: [BioC] pathview puzzle, to Bioconductor at r-project.org and "Oleg Moskvin" <moskvin at wisc.edu>, Friday, August 23, 2013, 9:53 PM
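With the update described in this reply, the original E. coli call would presumably need only gene.idtype = "KEGG" added. The sketch below assembles such a call; eco.data is a toy vector built from the first IDs and values shown in str(data02010), and the argument names are those already used in this thread. The call itself is guarded because it needs the pathview package, an interactive session, and network access to KEGG, so treat it as an untested outline rather than a verified recipe.

```r
# Toy expression vector keyed by b-numbers (first entries of str(data02010)).
eco.data <- c(b0365 = 2.95, b0366 = 2.25, b0829 = 1.97, b0830 = 1.72)

# Arguments as used earlier in the thread, plus gene.idtype = "KEGG".
pathview.args <- list(
  gene.data   = eco.data,
  pathway.id  = "02010",
  species     = "eco",
  gene.idtype = "KEGG",
  out.suffix  = "ecotest",
  kegg.native = TRUE
)

# Run only when pathview is installed and the session is interactive,
# since the call downloads the pathway XML and PNG files from KEGG.
if (interactive() && requireNamespace("pathview", quietly = TRUE)) {
  eco.out <- do.call(pathview::pathview, pathview.args)
}
```

Keeping the arguments in a list makes it easy to switch gene.idtype between runs and compare the outputs.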
The updated pathview (version 1.1.5) is now available through the BioC devel version: http://bioconductor.org/packages/2.13/bioc/html/pathview.html

The R-forge version failed to build because they haven't installed a dependency package with their new R 3.0.1. I've contacted their admin, but I'm not sure when this will be resolved.

Weijun

Follow-up to: Re: [BioC] pathview puzzle, to "Oleg Moskvin" <moskvin at wisc.edu>, Cc: Bioconductor at r-project.org, Wednesday, August 28, 2013, 2:44 PM