Hi
This is kind of an R problem, but on bioconductor data. For example, I
have the hu6800PATH environment from the hu6800 annotation package. The
example in the help is this:
xx <- as.list(hu6800PATH)
xx <- xx[!is.na(xx)]
What I actually want is a matrix with two columns, the first being probe
id and the second being pathway id - I'm going to do some relational
joins with this data using merge().
I've got as far as:
as.matrix(unlist(xx))
But that doesn't give me exactly what I want. The rownames of the
resulting matrix are set to the probe_ids, but where there are duplicate
probe ids (where probes are in >1 pathway) R appends a number to the
end.
Can anyone help me convert the list format from an annotation package to
a matrix as I describe above?
Thanks
Mick
You may try:
> xx <- as.list(hu6800PATH)
> xx <- unlist(xx, use.names = TRUE)
> xx <- cbind(names(xx), xx)
The first column of xx will be probe ids, with an integer appended to
the end if a probe has multiple mappings. Use a pattern match to remove
the trailing integers from the first column and then you are done.
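To illustrate how unlist() names its result and what a trailing-digit pattern match does, here is a minimal sketch on a toy list. The list stands in for as.list(hu6800PATH); "gene_7" is a made-up id that ends in a digit, and the pathway values are illustrative only.

```r
# Toy stand-in for as.list(hu6800PATH) with NA entries already removed.
# "gene_7" is a hypothetical id that legitimately ends in a digit.
xx <- list(Z22536_at = c("04010", "04060", "04350"),
           X60221_at = c("00190", "00193"),
           gene_7    = "04110")

flat <- unlist(xx, use.names = TRUE)
names(flat)
# "Z22536_at1" "Z22536_at2" "Z22536_at3" "X60221_at1" "X60221_at2" "gene_7"

# Pattern match that removes any run of digits at the end of each name
stripped <- sub("[0-9]+$", "", names(flat))
stripped
# "Z22536_at" "Z22536_at" "Z22536_at" "X60221_at" "X60221_at" "gene_"

res <- cbind(stripped, flat)
```

Note that the suffix is only added for probes with more than one mapping, so the pattern match is safe only when no real id ends in a digit; here the made-up "gene_7" is wrongly shortened to "gene_".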
>_______________________________________________
>Bioconductor mailing list
>Bioconductor@stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
Hi Michael,
try this:
res = do.call("rbind", args = lapply(seq(along = xx), function(i)
    cbind(names(xx)[i], xx[[i]])))
> res[1:5,]
     [,1]        [,2]
[1,] "Z22536_at" "04010"
[2,] "Z22536_at" "04060"
[3,] "Z22536_at" "04350"
[4,] "X60221_at" "00190"
[5,] "X60221_at" "00193"
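Since the stated goal is a relational join with merge(), the following sketch shows one way the two-column result might be used. It rebuilds the matrix on a small toy list rather than the real hu6800PATH data, and the `pathways` lookup table with its "pathway A" / "pathway B" names is purely hypothetical.

```r
# Toy stand-in for as.list(hu6800PATH) with NA entries already removed
xx <- list(Z22536_at = c("04010", "04060", "04350"),
           X60221_at = c("00190", "00193"))

# One row per (probe id, pathway id) pair
res <- do.call("rbind", lapply(seq_along(xx), function(i)
    cbind(names(xx)[i], xx[[i]])))

# merge() works on data frames, so give the columns names
map <- data.frame(probe_id = res[, 1], pathway_id = res[, 2],
                  stringsAsFactors = FALSE)

# Hypothetical lookup table; the pathway names are placeholders
pathways <- data.frame(pathway_id   = c("04010", "00190"),
                       pathway_name = c("pathway A", "pathway B"),
                       stringsAsFactors = FALSE)

# Inner join on the shared pathway_id column
joined <- merge(map, pathways, by = "pathway_id")
joined
```

By default merge() keeps only rows whose pathway_id appears in both tables; all.x = TRUE would keep every probe/pathway pair from `map`.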
--
Best regards
Wolfgang
-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
Hi
Thanks for that :-) It was actually an easy and quick way to do the
latter step (removing the trailing integers) that I was looking for. I
can't just indiscriminately strip every integer that appears at the end
of an id, in case some ids legitimately end in an integer. So I am left
faced with writing some kind of loop, which is what I wanted to avoid in
the first place.
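One possible way around both the loop and the pattern match, sketched here on a toy list rather than the real data, is to repeat each list name once per mapping and unlist the values without names. The list below stands in for as.list(hu6800PATH); "gene_7" is a made-up id ending in a digit, included to show that ids are left untouched. lengths() is assumed to be available (it is in current versions of R; sapply(xx, length) is an equivalent).

```r
# Toy stand-in for as.list(hu6800PATH) with NA entries already removed.
# "gene_7" is a hypothetical id that legitimately ends in a digit.
xx <- list(Z22536_at = c("04010", "04060", "04350"),
           X60221_at = c("00190", "00193"),
           gene_7    = "04110")

# Repeat each probe id once for every pathway it maps to
probe_id <- rep(names(xx), times = lengths(xx))

# Flatten the pathway ids without generating suffixed names
pathway_id <- unlist(xx, use.names = FALSE)

# Two-column character matrix: probe id, pathway id
res <- cbind(probe_id, pathway_id)
res
```

Because the probe ids come straight from names(xx), nothing is appended to them and nothing has to be stripped afterwards.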
I don't want to annoy anyone, but am I the only person who finds the
lists from bioconductor annotation packages a little unhelpful and hard
to work with? In every example in the help, the first thing they do is
unlist() the list; so why is it a list in the first place???
Thanks
Mick
-----Original Message-----
From: John Zhang [mailto:jzhang@jimmy.harvard.edu]
Sent: 10 February 2005 13:44
To: michael watson (IAH-C)
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] Converting annotate lists to a matrix
You may try:
> xx <- as.list(hu6800PATH)
> xx <- unlist(xx, use.names = TRUE)
> xx <- cbind(names(xx), xx)
The first column of xx will be probe ids, with an integer appended to
the end if a probe has multiple mappings. Use a pattern match to remove
the trailing integers from the first column and then you are done.