merging two sets of genes

Seth Falcon ★ 7.4k

@seth-falcon-992

Last seen 11.3 years ago

On 26 Dec 2005, kfbargad at ehu.es wrote:
> Dear list,
>
> I have two sets of genes from the same experiment,
>
> > PinC
> Expression Set (exprSet) with
> 1310 genes
> 8 samples
> phenoData object with 2 variables and 8 cases
> varLabels
> FileName: read from file
> Target: read from file
> > PinS
> Expression Set (exprSet) with
> 2891 genes
> 8 samples
> phenoData object with 2 variables and 8 cases
> varLabels
> FileName: read from file
> Target: read from file
>
> How can I merge these two sets? I tried union() on two vectors
> created from the probe IDs but failed. Any hints?

One approach would be to create a new exprSet object manually using the data from PinC and PinS. Basically, create a new phenoData object with the data for all 16 cases, and a new expression matrix with 16 columns (assuming the two original exprSets represent disjoint sets of samples).

Thinking out loud, is this a common enough operation to warrant a method for exprSets? I could imagine c() being defined on exprSets such that if the phenoData columns are the same and the "sample ids" as given by the rownames of phenoData/colnames of exprs are disjoint, then do the obvious thing, else error.

+ seth
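The manual combination described above can be sketched in base R. This is only an illustration under stated assumptions: plain matrices and data frames stand in for the expression matrix and phenoData of each exprSet, the helper name combine_samples and all sample/probe names are hypothetical, and both inputs are assumed to measure the same probes in the same order.

```r
# Hypothetical stand-ins for the expression matrices and phenoData tables
# of two exprSets that cover disjoint sets of samples.
set.seed(1)
probes <- paste0("probe", 1:5)
exprsA <- matrix(rnorm(15), nrow = 5,
                 dimnames = list(probes, c("s1", "s2", "s3")))
exprsB <- matrix(rnorm(10), nrow = 5,
                 dimnames = list(probes, c("s4", "s5")))
phenoA <- data.frame(Target = c("ctl", "ctl", "trt"),
                     row.names = colnames(exprsA))
phenoB <- data.frame(Target = c("trt", "ctl"),
                     row.names = colnames(exprsB))

# combine_samples() is a hypothetical helper, not a Biobase function.
combine_samples <- function(exprsA, phenoA, exprsB, phenoB) {
  # Same phenoData variables in both inputs
  stopifnot(identical(names(phenoA), names(phenoB)))
  # Sample ids must be disjoint
  stopifnot(length(intersect(colnames(exprsA), colnames(exprsB))) == 0)
  # Same probes, in the same order
  stopifnot(identical(rownames(exprsA), rownames(exprsB)))
  list(exprs = cbind(exprsA, exprsB),   # samples side by side
       pheno = rbind(phenoA, phenoB))   # one phenoData row per sample
}

combined <- combine_samples(exprsA, phenoA, exprsB, phenoB)
dim(combined$exprs)
```

The checks mirror the "else error" behaviour suggested for a c() method: the combination is refused unless the phenoData columns agree and the sample ids do not overlap.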

probe probe • 1.5k views

ADD COMMENT • link updated 20.0 years ago by rgentleman ★ 5.5k • written 20.0 years ago by Seth Falcon ★ 7.4k

rgentleman ★ 5.5k

@rgentleman-7725

Last seen 10.7 years ago

United States

Hi,

I think that the problem is that the arrays are not the same - and then life is much harder. There are some papers on it (G. Parmigiani et al have produced MergeMaid, as one option). I have done some work on this problem, with Wolfgang Huber and Markus Ruschhaupt (you can find the technical report under the Bioconductor publications link - I hope).

It is not so simple to match across different arrays, where different probes were used. You can take the expedient of mapping to some common set of IDs and matching on those (some code in packages GeneMeta and GeneMetaEx, if I recall correctly), but just because two probes map to the same Entrez gene id (for example) does not mean that the same thing was measured - whence MergeMaid and similar tools.

And if this is correct, then combining them is contra-indicated and some of the tools for synthesizing experiments, such as meta-analysis or the more general random effects models, will be needed. Just because you can jam either the raw data or the processed data together does not mean that it is sensible to do so.

And finally, even if the arrays are identical, unless they were all essentially done at the same time under very similar conditions I would still take the approach in the paragraph above and use a random effects model.

best wishes
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
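The "expedient" of mapping probes from two different arrays to a common set of IDs can be sketched as follows. This is a minimal base-R illustration, not the GeneMeta or MergeMaid code: the probe names, the gene IDs and the two mapping vectors are all hypothetical, and the sketch does nothing to address the caveat that probes sharing a gene ID may not measure the same thing.

```r
# Hypothetical probe -> gene ID maps for two different array platforms.
mapA <- c(a1 = "1017", a2 = "672", a3 = "7157", a4 = "672")
mapB <- c(b1 = "7157", b2 = "1017", b3 = "5290")

# Gene IDs represented on both platforms (sorted for a stable order)
common <- sort(intersect(unname(mapA), unname(mapB)))

# One row per shared gene ID. match() returns only the first probe for
# each ID, so many-to-one mappings (a2 and a4 above) need extra care.
matched <- data.frame(
  gene   = common,
  probeA = names(mapA)[match(common, mapA)],
  probeB = names(mapB)[match(common, mapB)]
)
matched
```

Rows of the two expression matrices could then be aligned via matched$probeA and matched$probeB, but as the post stresses, being able to line the data up this way does not make pooling it sensible.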

ADD COMMENT • link 20.0 years ago rgentleman ★ 5.5k

Dear Seth and Robert,

I apologise, but I didn't make myself clear.

PinS and PinC come from the same experiment, i.e. the same eset. It is just that I followed two different approaches to the analysis and now I want to continue working with the union of these two lists. So I am not intending to match across different arrays.

Hope this explains my question

David

ADD REPLY • link 20.0 years ago kfbargad@ehu.es ▴ 270

rgentleman ★ 5.5k

@rgentleman-7725

Last seen 10.7 years ago

United States

Hi,

thanks for the clarification. Then it depends on whether you want to use the union or the intersection of the probes you selected in the two different ways.

union and intersect, applied to geneNames(PinS) and geneNames(PinC), should get you somewhere close; you might also want to consider match and %in%, depending on just how you want to select.

After that, you will need to create a matrix with the combined expressions and use that as input in a call to new. The vignettes for Biobase should demonstrate how to make an exprSet from a matrix, but please ask if anything is not clear.

best wishes
Robert
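The selection step can be sketched in base R as below. The sketch assumes that both probe selections come from the same expression matrix; the matrix full, the probe names and the vectors idsC and idsS are hypothetical stand-ins for the original eset's expression values, geneNames(PinC) and geneNames(PinS). Wrapping the result in an exprSet with new() is left to the Biobase vignettes.

```r
set.seed(42)
# Hypothetical full expression matrix (rows = probes, columns = samples)
full <- matrix(rnorm(6 * 4), nrow = 6,
               dimnames = list(paste0("p", 1:6), paste0("s", 1:4)))

# Hypothetical probe selections from the two analyses
idsC <- c("p1", "p2", "p4")
idsS <- c("p2", "p4", "p5", "p6")

both   <- intersect(idsC, idsS)   # probes picked by both analyses
either <- union(idsC, idsS)       # probes picked by at least one

# If only the two subsets are at hand, append to the first subset the
# rows of the second that it does not already contain.
exC <- full[idsC, , drop = FALSE]
exS <- full[idsS, , drop = FALSE]
extra  <- setdiff(rownames(exS), rownames(exC))
merged <- rbind(exC, exS[extra, , drop = FALSE])

# With the original matrix available, the same rows can be pulled directly.
direct <- full[either, , drop = FALSE]
```

Because the samples are identical in both subsets, the merge only adds rows; no new columns or phenoData entries are needed.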

ADD COMMENT • link 20.0 years ago rgentleman ★ 5.5k

Dear Robert,

thanks for your comments, they were very useful. I am just practicing how to work with lists, comparing and merging, finding genes in lists, etc.

If I have two lists, say A and B, and want to create a third one containing the genes present in A but not in B, should I use the vennDiagram() function to obtain this? Maybe something similar to [-grep(geneNames(B)] would be useful?

I had a look at your suggestion of using %in% but couldn't find much information about how to use it.

Any hints would be greatly appreciated,

Best
David

ADD REPLY • link 20.0 years ago kfbargad@ehu.es ▴ 270

Hi, sorry, I think I found the answer to my previous email.

setdiff() will do the trick, right?

I also found the help for %in%, it was under ?'%in%'

best,
David
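For genes present in A but not in B, setdiff() and %in% can both be used. A minimal base-R sketch, with hypothetical gene names standing in for the two lists:

```r
# Hypothetical gene name vectors
A <- c("g1", "g2", "g3", "g4")
B <- c("g2", "g4", "g5")

# Genes in A but not in B, as a set (duplicates removed)
onlyA <- setdiff(A, B)

# The same selection with %in%: a logical index into A
onlyA2 <- A[!(A %in% B)]

# The two differ when A holds repeated entries: setdiff() returns each
# value once, while %in% indexing keeps every occurrence.
Adup    <- c("g1", "g1", "g3")
asSet   <- setdiff(Adup, B)
asIndex <- Adup[!(Adup %in% B)]
```

The %in% form is the more convenient one when the logical vector is also needed to subset rows of an expression matrix.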

ADD REPLY • link 20.0 years ago kfbargad@ehu.es ▴ 270

On Dec 29, 2005, at 12:56 PM, kfbargad at ehu.es wrote:
> Hi, sorry, I think I found the answer to my previous email.
>
> setdiff() will do the trick, right?

Yes, although it will depend on how your lists are structured, since setdiff works on vectors.

/Kasper
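What "depends on how your lists are structured" might look like in practice: if the gene names are held in R list objects rather than plain character vectors, they can be flattened first, or compared element by element. The list layout below (named "up" and "down" components) is purely hypothetical.

```r
# Hypothetical gene lists stored as R lists of character vectors
listA <- list(up = c("g1", "g2"), down = c("g3", "g4"))
listB <- list(up = c("g2"),       down = c("g4", "g5"))

# Flatten each list to one character vector before calling setdiff()
flatA <- unique(unlist(listA, use.names = FALSE))
flatB <- unique(unlist(listB, use.names = FALSE))
onlyA <- setdiff(flatA, flatB)

# Or compare matching components pairwise, keeping the list structure
perSet <- Map(setdiff, listA, listB)
```

Which of the two is appropriate depends on whether the components are meant to be compared as one pooled gene set or kept separate.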

ADD REPLY • link 20.0 years ago Kasper Daniel Hansen ★ 6.5k
