On 26 Dec 2005, kfbargad at ehu.es wrote:
> Dear list,
>
> I have two sets of genes from the same experiment,
>
>> PinC
> Expression Set (exprSet) with
> 1310 genes
> 8 samples
> phenoData object with 2 variables and 8 cases
> varLabels
> FileName: read from file
> Target: read from file
>> PinS
> Expression Set (exprSet) with
> 2891 genes
> 8 samples
> phenoData object with 2 variables and 8 cases
> varLabels
> FileName: read from file
> Target: read from file
>
>
> How can I merge these two sets? I tried union() on two vectors
> created from the probe IDs but failed. Any hints?
One approach would be to create a new exprSet object manually using
the data from PinC and PinS. Basically, create a new phenoData object
with the data for all 16 cases, and a new expression matrix with 16
columns (assuming the two original exprSets represent disjoint sets of
samples).
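
As a rough illustration of the manual approach described above, the sketch below uses plain base R matrices and data frames as stand-ins for exprs() and the phenoData table. The object names (exprs_a, pheno_a, and so on) and the toy values are assumptions made for the example; it uses 4 samples rather than 16 and does not build an actual exprSet.

```r
# Toy stand-ins for the two expression matrices: same probes, disjoint samples.
probes <- c("p1", "p2", "p3")
exprs_a <- matrix(1:6, nrow = 3, dimnames = list(probes, c("s1", "s2")))
exprs_b <- matrix(7:12, nrow = 3, dimnames = list(probes, c("s3", "s4")))

# Toy stand-ins for the phenotype tables, one row per sample.
pheno_a <- data.frame(Target = c("ctl", "trt"), row.names = c("s1", "s2"))
pheno_b <- data.frame(Target = c("ctl", "trt"), row.names = c("s3", "s4"))

# Columns are samples, so the expression matrices are joined side by side,
# after aligning the rows of the second matrix to the probe order of the first.
exprs_all <- cbind(exprs_a, exprs_b[rownames(exprs_a), , drop = FALSE])

# Rows are samples in the phenotype table, so those are stacked.
pheno_all <- rbind(pheno_a, pheno_b)

# Sample ids must line up between the two pieces.
stopifnot(identical(colnames(exprs_all), rownames(pheno_all)))
```

The combined matrix and phenotype table would then be the inputs for constructing the new object.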
Thinking out loud, is this a common enough operation to warrant a
method for exprSets? I could imagine c() being defined on exprSets
such that if the phenoData columns are the same and the "sample ids"
as given by the rownames of phenoData/colnames of exprs are disjoint,
then do the obvious thing, else error.
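
The checks proposed above could look roughly like the following. This is only a sketch on plain lists, not a c() method for exprSets: combine_samples is a hypothetical helper, the list layout (an exprs matrix plus a pheno data frame) is an assumption, and the probe-id check is an extra guard that the proposal itself does not mention.

```r
# Hypothetical helper; not part of Biobase. Each argument is a plain list with
# an `exprs` matrix (probes x samples) and a `pheno` data frame (one row per sample).
combine_samples <- function(a, b) {
  if (!identical(colnames(a$pheno), colnames(b$pheno))) {
    stop("phenotype columns differ")
  }
  if (length(intersect(colnames(a$exprs), colnames(b$exprs))) > 0) {
    stop("sample ids are not disjoint")
  }
  if (!setequal(rownames(a$exprs), rownames(b$exprs))) {
    stop("probe ids differ")  # extra guard, assumed for this sketch
  }
  list(
    exprs = cbind(a$exprs, b$exprs[rownames(a$exprs), , drop = FALSE]),
    pheno = rbind(a$pheno, b$pheno)
  )
}

# Small constructor for toy inputs, used only to exercise the helper.
make_toy <- function(samples) {
  list(
    exprs = matrix(0, nrow = 2, ncol = length(samples),
                   dimnames = list(c("p1", "p2"), samples)),
    pheno = data.frame(Target = rep("ctl", length(samples)), row.names = samples)
  )
}

x <- make_toy(c("s1", "s2"))
y <- make_toy(c("s3", "s4"))
both <- combine_samples(x, y)   # disjoint samples: combined
# combine_samples(x, x) would stop with "sample ids are not disjoint"
```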
+ seth
Hi,
I think that the problem is that the arrays are not the same, and
then life is much harder. There are some papers on it (G. Parmigiani
et al. have produced MergeMaid, as one option). I have done some work
on this problem with Wolfgang Huber and Markus Ruschhaupt (you can
find the technical report under the Bioconductor publications link, I
hope).
It is not so simple to match across different arrays where different
probes were used. You can take the expedient of mapping to some common
set of IDs and matching on those (there is some code in packages
GeneMeta and GeneMetaEx, if I recall correctly), but just because two
probes map to the same Entrez gene id (for example) does not mean that
the same thing was measured; whence MergeMaid and similar tools.
And if this is correct, then combining them is contra-indicated, and
some of the tools for synthesizing experiments, such as meta-analysis
or the more general random effects models, will be needed. Just
because you can jam either the raw data or the processed data together
does not mean that it is sensible to do so.
And finally, even if the arrays are identical, unless they were all
essentially done at the same time under very similar conditions, I
would still take the approach in the paragraph above and use a random
effects model.
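
To make the random effects idea a little more concrete, here is a minimal sketch of one common estimator, the DerSimonian-Laird method, for a single gene measured in several studies. The estimator is chosen only for illustration; the message above does not say which model is meant, and the helper random_effects, its arguments, and the numbers are all assumptions made up for this example.

```r
# Hypothetical helper: DerSimonian-Laird random-effects estimate for one gene.
# `effect` holds one effect size per study, `variance` its sampling variance.
random_effects <- function(effect, variance) {
  w <- 1 / variance                        # inverse-variance (fixed-effect) weights
  mu_fixed <- sum(w * effect) / sum(w)     # fixed-effect pooled estimate
  q <- sum(w * (effect - mu_fixed)^2)      # Cochran's Q heterogeneity statistic
  k <- length(effect)
  scale <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - (k - 1)) / scale)    # between-study variance, truncated at 0
  w_re <- 1 / (variance + tau2)            # random-effects weights
  list(
    estimate = sum(w_re * effect) / sum(w_re),
    se = sqrt(1 / sum(w_re)),
    tau2 = tau2
  )
}

# Made-up effect sizes and variances for two studies, purely for illustration.
fit <- random_effects(effect = c(0.8, 0.2), variance = c(0.04, 0.09))
```

When the studies disagree more than their sampling variances can explain, tau2 becomes positive and the studies are weighted more evenly than under a fixed-effect analysis.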
best wishes
Robert
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Dear Seth and Robert,
I apologise, but I didn't make myself clear.
PinS and PinC come from the same experiment, i.e. the same eset. It is
just that I followed two different approaches to the analysis and now
I want to continue working with the union of these two lists. So I am
not intending to match across different arrays.
Hope this explains my question
David
Hi,
Thanks for the clarification. Then it depends on whether you want to
use the union or the intersection of the probes you selected in the
two different ways.
union() and intersect(), applied to geneNames(PinS) and
geneNames(PinC), should get you somewhere close. You might also want
to consider match() and %in%, depending on just how you want to select.
After that, you will need to create a matrix with the combined
expressions and use that as input in a call to new(). The vignettes
for Biobase should demonstrate how to make an exprSet from a matrix,
but please ask if anything is not clear.
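
The steps above can be sketched in base R as follows. The matrices are toy stand-ins for exprs(PinC) and exprs(PinS) with row names playing the role of geneNames(); all object names and values are assumptions for the example, and the final call to new() is left out.

```r
# Toy stand-ins: same samples, overlapping probe selections from one experiment.
samples <- c("s1", "s2")
full <- matrix(1:10, nrow = 5, dimnames = list(paste0("p", 1:5), samples))
exprs_c <- full[c("p1", "p2", "p3"), ]
exprs_s <- full[c("p3", "p4"), ]

ids_c <- rownames(exprs_c)
ids_s <- rownames(exprs_s)

both   <- intersect(ids_c, ids_s)   # probes picked by both approaches
either <- union(ids_c, ids_s)       # probes picked by at least one approach

# Stack the rows of the first set with the rows found only in the second,
# so that shared probes are not duplicated.
only_s <- !(ids_s %in% ids_c)
combined <- rbind(exprs_c, exprs_s[only_s, , drop = FALSE])

stopifnot(identical(rownames(combined), either))
```

If the matrix of the original, unfiltered experiment is still at hand (full in this sketch), indexing it with the union of ids, full[either, ], gives the same rows in one step.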
best wishes
Robert
Dear Robert, thanks for your comments, they were very useful. I am
just practicing how to work with lists, comparing and merging, finding
genes in lists, etc.
If I have two lists, say A and B, and want to create a third one
containing the genes present in A but not in B, should I use the
vennDiagram() function to obtain this? Or maybe something similar to
[-grep(geneNames(B))]
would be useful?
I had a look at your suggestion of using %in% but couldn't find much
information about how to use it.
Any hints would be greatly appreciated,
Best
David
Hi, sorry, I think I found the answer to my previous email.
setdiff() will do the trick, right?
I also found the help for %in%, it was under ?'%in%'
best,
David
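
For reference, a small sketch of the two ways of getting "in A but not in B" mentioned in this thread. The vectors a and b are made-up probe ids standing in for geneNames(A) and geneNames(B).

```r
# Made-up probe ids standing in for the gene names of the two lists.
a <- c("p1", "p2", "p3", "p4")
b <- c("p2", "p4", "p5")

only_a <- setdiff(a, b)      # ids in A but not in B
same   <- a[!(a %in% b)]     # the same selection written with %in%

stopifnot(identical(only_a, same))
stopifnot(identical(only_a, c("p1", "p3")))
```

The two forms agree here because a has no repeated ids; setdiff() drops duplicates from its result, while subsetting with %in% keeps them.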
On Dec 29, 2005, at 12:56 PM, kfbargad at ehu.es wrote:
> Hi, sorry, I think I found the answer to my previous email.
>
> setdiff() will do the trick, right?
Yes, although it will depend on how your lists are structured, since
setdiff works on vectors.
/Kasper
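
To illustrate the point about structure: if the "lists" are matrices (or objects that hold one), the ids have to be extracted as a vector first. A minimal sketch, assuming toy matrices m_a and m_b whose row names are the probe ids:

```r
# Toy matrices standing in for the expression data behind two gene lists.
m_a <- matrix(1:6, nrow = 3, dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
m_b <- matrix(1:4, nrow = 2, dimnames = list(c("p2", "p4"), c("s1", "s2")))

# Compare the id vectors, not the matrices themselves.
keep <- setdiff(rownames(m_a), rownames(m_b))

# Use the resulting ids to pull the matching rows back out.
a_not_b <- m_a[keep, , drop = FALSE]
```

Calling setdiff() on the matrices themselves would compare the expression values rather than the ids, which is not what is wanted here.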