Colleagues,
I'd like to use pathview with E. coli data.
While the Homo sapiens example from the manual works just fine:
pv.out <- pathview(gene.data = gse16873.d[, 1], pathway.id =
demo.paths$sel.paths[i], species = "hsa", out.suffix = "gse1683",
kegg.native = TRUE)
an analogous run with E. coli data keeps failing:
eco.out <- pathview(gene.data = data02010, pathway.id = "02010",
out.suffix = "ecotest", species = "eco", kegg.native=TRUE)
[1] "Downloading xml files for eco02010, 1/1 pathways.."
[1] "Downloading png files for eco02010, 1/1 pathways.."
Error in mol.data[as.character(items[hit]), ] : subscript out of
bounds
In addition: Warning messages:
1: In node.map(gene.data, node.data, node.types = gene.node.type,
node.sum = node.sum) :
NAs introduced by coercion
2: In FUN(1:153[[1L]], ...) : NAs introduced by coercion
I've tried several variations of the input data structure, including
subsetting the genes to only those used in the pathway to be colored
(as shown here), and the "subscript out of bounds" error was still there.
In fact, if we compare the structure of the data in the vignette and
the custom data, they are the same:
str(gse16873.d[, 1])
 Named num [1:11979] -0.3076 0.4159 0.1985 -0.2316 -0.0449 ...
 - attr(*, "names")= chr [1:11979] "10000" "10001" "10002" "10003" ...
str(data02010)
 Named num [1:47] 2.95 2.25 1.97 1.72 1.72 ...
 - attr(*, "names")= chr [1:47] "b0365" "b0366" "b0829" "b0830" ...
If we look at the respective XML files, we see consistency as well:
<entry id="2" name="hsa:51343" type="gene" link="http://www.kegg.jp/dbget-bin/www_bget?hsa:51343">
<graphics name="FZR1, CDC20C, CDH1, FZR, FZR2, HCDH, HCDH1" fgcolor="#000000" bgcolor="#BFFFBF" type="rectangle" x="919" y="536" width="46" height="17"/>
</entry>
<entry id="4" name="eco:b1513" type="gene" link="http://www.kegg.jp/dbget-bin/www_bget?eco:b1513">
<graphics name="lsrA" fgcolor="#000000" bgcolor="#BFFFBF" type="rectangle" x="339" y="1882" width="46" height="17"/>
</entry>
I.e. XML gene entries have name="Organism_ID:GeneID", and the GeneIDs
are expected to be the names attached to the expression data.
This holds in both cases, yet the hsa example works and the eco
example does not.
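The ID comparison described above can be sketched in base R. This is only an illustration: kgml_gene_ids is a hypothetical helper (not part of pathview), the two entry lines are the ones quoted above with the link attribute left out, and the b1513 value in toy_data is made up so that there is one ID to match.

```r
# Two KGML <entry> lines as quoted above (link attribute omitted for brevity).
kgml_lines <- c(
  '<entry id="2" name="hsa:51343" type="gene">',
  '<entry id="4" name="eco:b1513" type="gene">'
)

# Hypothetical helper: return the gene IDs that follow "<org>:" in the
# name attribute. An entry may list several space-separated genes,
# hence the strsplit() step.
kgml_gene_ids <- function(lines, org) {
  name_attr <- sub('.*name="([^"]*)".*', "\\1", lines)
  ids <- unlist(strsplit(name_attr, " ", fixed = TRUE))
  ids <- ids[startsWith(ids, paste0(org, ":"))]
  sub(paste0("^", org, ":"), "", ids)
}

eco_ids <- kgml_gene_ids(kgml_lines, "eco")

# First four elements of data02010 as printed by str(), plus one
# made-up value for b1513 (an assumption, for illustration only).
toy_data <- c(b0365 = 2.95, b0366 = 2.25, b0829 = 1.97, b0830 = 1.72,
              b1513 = 1.00)

# IDs present both in the expression data and in the pathway XML.
matched <- intersect(names(toy_data), eco_ids)
print(matched)
```

Running the same check on the full eco02010 XML and the full data02010 vector would show how many of the 47 names the pathway file actually contains.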
Counterintuitively, the "subscript out of bounds" error seems to stem
not from having some unrecognizable IDs in the expression file but
rather from having RECOGNIZABLE (!!!!) IDs there. If we change the IDs
in the expression file to some nonsense, the function eats it up and
there is no "out of bounds" error anymore! (This observation came from
an attempt to use gene names instead of b-numbers in the expression
file; the phenomenon was checked several times in clean environments
etc.)
Example (with the bla.data object in the attached rda file)
bla.out <- pathview(gene.data = bla.data, out.suffix = "bla", species
= "eco", pathway.id = "02010", kegg.native=TRUE)
Working in directory ....
Writing image file eco02010.bla.png
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In FUN(1:153[[153L]], ...) : NAs introduced by coercion
As a result of using nonsense IDs, graphical files are generated just
fine, without coloring, of course.
And using real IDs that match the XML file contents always resulted in
the "out of bounds" error (the data02010 object is included in the
attached file)
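For what it's worth, the "NAs introduced by coercion" warnings can be reproduced in plain R by converting the two kinds of names to numbers. This is a minimal sketch using only the names printed by str() above; that pathview performs such a conversion internally is an assumption, not something verified in its source.

```r
# Names as printed by str() for the vignette data and for data02010.
hsa_names <- c("10000", "10001", "10002", "10003")
eco_names <- c("b0365", "b0366", "b0829", "b0830")

# Entrez-style names are digit strings and convert cleanly.
hsa_num <- as.numeric(hsa_names)

# b-numbers are not numeric; as.numeric() yields NA and raises the
# "NAs introduced by coercion" warning (suppressed here to keep the
# output quiet).
eco_num <- suppressWarnings(as.numeric(eco_names))

print(anyNA(hsa_num))
print(all(is.na(eco_num)))
```

So the two vectors share a structure but not a kind of ID. The pathview documentation describes a gene.idtype argument whose default is "entrez"; whether passing gene.idtype = "KEGG" makes the eco run work has not been tested here.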
Any ideas?
Thanks,
Oleg