subsetting the genes for cluster


Abhilash Venu ▴ 340

@abhilash-venu-2680

Last seen 9.6 years ago

Hi all,

I am working on single-color expression data using limma. I would like to perform a cluster analysis after selecting the differentially expressed genes based on the P value (say 0.001). As far as I know, I have to subset the normalized data (MA) to these selected genes to retrieve their distribution across the samples. But I am wondering whether I can do this with an R script?

I would appreciate any help.

--
Regards,
Abhilash

limma limma • 1.2k views

updated 15.6 years ago by Mark Cowley ▴ 910 • written 15.7 years ago by Abhilash Venu ▴ 340


Abhilash Venu ▴ 340

@abhilash-venu-2680

Last seen 9.6 years ago

On Thu, Sep 4, 2008 at 5:21 AM, Mark Cowley <m.cowley@garvan.org.au> wrote:

> Hi Abhilash,
>
> On 02/09/2008, at 11:09 PM, Abhilash Venu wrote:
>
>> I am working on single-color expression data using limma. I would like to perform a cluster analysis after selecting the differentially expressed genes based on the P value (say 0.001). As far as I know, I have to subset the normalized data (MA) to these selected genes to retrieve their distribution across the samples.
>
> That's correct

Thank you Mark, but I am quite confused here. Our collaborator has already run the single-color experiment on the Agilent platform. When I performed the clustering using the method I mentioned, the color key showed only positive values (all the values are positive if I choose values from MA). Our collaborator feels that this is quite unusual, because green usually represents down-regulation. Could you suggest how I should go about it?

>> But I am wondering whether I can perform using the R script?
>
> Can you elaborate on "using the R script"

I was not sure about the R script for subsetting, so I did the subsetting in Python.

>> I would appreciate any help.
>
> You need 2 things: the names of the DE genes, and the normalised data.
> Get the DE genes from your toptable, and the normalised data from within your MA object (hint: names(MA) ).
> Then sub-set the normalised data to just those rows from the DE genes, then perform cluster analysis. There are a large number of ways of doing this. To get you started, have a look at heatmap.2 from the package gplots.
> Others include the built-in hclust( dist( yourDEdata ) )
>
> cheers,
> Mark
>
> Mark Cowley, BSc (Bioinformatics)(Hons)
> Peter Wills Bioinformatics Centre
> Garvan Institute of Medical Research, Sydney, Australia

--
Regards,
Abhilash

15.6 years ago Abhilash Venu ▴ 340
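The two steps Mark describes (take the DE gene names from the top table, then subset the normalised data to those rows and cluster) can be sketched roughly as below. This is only a minimal illustration on simulated data: `expr` and `tt` are hypothetical stand-ins for the normalised log2 expression matrix and for a data frame shaped like limma's `topTable()` output (row names are probe IDs, with a `P.Value` column). How you extract the real matrix depends on how your own data object was built.

```r
# Hypothetical stand-ins: a normalised log2 expression matrix (20 probes x
# 6 samples) and a topTable()-like data frame with a P.Value column.
set.seed(1)
expr <- matrix(rnorm(20 * 6, mean = 8), nrow = 20,
               dimnames = list(paste0("probe", 1:20), paste0("sample", 1:6)))
tt <- data.frame(P.Value = c(rep(1e-4, 5), rep(0.5, 15)),
                 row.names = rownames(expr))

# 1. Names of the DE genes at the chosen cutoff.
de_ids <- rownames(tt)[tt$P.Value < 0.001]

# 2. Subset the normalised data to just those rows.
de_data <- expr[de_ids, , drop = FALSE]

# 3. Hierarchical clustering of the genes (rows) and of the samples (columns).
gene_tree   <- hclust(dist(de_data))
sample_tree <- hclust(dist(t(de_data)))
```

`plot(gene_tree)` or `plot(sample_tree)` would then draw the dendrograms. Subsetting by row name rather than by row position keeps the result correct even if the top table and the expression matrix are sorted differently.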


On Thu, Sep 4, 2008 at 10:59 AM, Abhilash Venu <abhivenu at gmail.com> wrote:

> Thank you Mark, but I am quite confused here. Our collaborator has already run the single-color experiment on the Agilent platform. When I performed the clustering using the method I mentioned, the color key showed only positive values (all the values are positive if I choose values from MA). Our collaborator feels that this is quite unusual, because green usually represents down-regulation. Could you suggest how I should go about it?

Did you use heatmap.2 to do the heatmap? If so, there is an argument "scale" that might be useful. For ALL functions that are new to you, I would advise reading the whole help page, as there is often very useful information there.

> I was not sure about the R script for subsetting, so I did the subsetting in Python.

You can try help.search('subset'), as a start. RSiteSearch is also useful for searching for answers.

You will likely benefit from reading:

http://cran.r-project.org/doc/manuals/R-intro.html

And potentially from:

http://biostat-09.berkeley.edu/~bullard/courses/T-berkeley-08/resources/R_intro_easy.pdf

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

15.6 years ago Sean Davis 21k
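To illustrate what Sean's `scale` hint does, here is a small sketch on simulated data. `de_data` is a hypothetical stand-in for the DE subset, and the row z-score below is, as far as I understand it, what `scale = "row"` applies before the colours are assigned. The `heatmap.2` call is guarded so that it only runs in an interactive session with gplots installed.

```r
# Hypothetical stand-in for the DE subset: 5 genes x 6 samples of
# all-positive log2 intensities, as single-colour data would give.
set.seed(2)
de_data <- matrix(rnorm(30, mean = 10, sd = 1), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# Row z-scores: centre each gene on its own mean and divide by its own SD.
# scale() works on columns, hence the two transposes.
row_z <- t(scale(t(de_data)))

# The scaled values now fall on both sides of zero.
range(row_z)

# heatmap.2 can apply the same row scaling itself.
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(de_data, scale = "row", trace = "none")
}
```

After row scaling the colour key is centred on zero, so one end of the palette marks samples below a gene's own average and the other end marks samples above it.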


On 05/09/2008, at 5:06 AM, Sean Davis wrote:

> On Thu, Sep 4, 2008 at 10:59 AM, Abhilash Venu <abhivenu at gmail.com> wrote:
>
>> Thank you Mark, but I am quite confused here. Our collaborator has already run the single-color experiment on the Agilent platform. When I performed the clustering using the method I mentioned, the color key showed only positive values (all the values are positive if I choose values from MA). Our collaborator feels that this is quite unusual, because green usually represents down-regulation. Could you suggest how I should go about it?

Part of this confusion stems from your non-standard use of 'MA' (I've checked your past posts to work this out), since 'MA' implies two-colour data, where the M-values, which are the ratios, are the quantity of interest. You are dealing with single-colour data, so I assume that in your use of MA you must be referring to the A-values, but I'm not sure how limma deals with this in the way that you have used it. My clear preference when dealing with single-colour data is not to use 2-colour data objects. However, I assume that you have been able to identify and subset this data in order to have sent your previous reply to the list, so let's move on.

Back to your confusion: your collaborator is right. The vast majority of clustering is used to show RELATIVE expression, not absolute expression. If you 'mean correct' your absolute expression data, you will convert it to ratios, and then heatmap.2 might give you a sensible picture.

I agree with Sean (which I seem to be doing a lot recently) in that you need to improve your basic R usage, and the links that Sean provided are a great place to start, as is R for Beginners by Paradis.

cheers,
Mark

15.6 years ago Mark Cowley ▴ 910
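The 'mean correct' step Mark mentions can be sketched as follows. This assumes the expression values are already on the log2 scale, in which case subtracting each gene's mean gives a log2 ratio relative to that gene's average. The matrix `expr` and its gene and sample names are made up for illustration.

```r
# Hypothetical log2 expression values: 3 genes x 3 samples, all positive.
expr <- matrix(c(8,  9, 10,
                 12, 12, 12,
                 5,  7,  9),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("s1", "s2", "s3")))

# Subtract each gene's mean from its own row.
# On the log2 scale the result is a log2 ratio versus the gene average.
centred <- sweep(expr, 1, rowMeans(expr), "-")

centred
```

Unlike full row scaling, mean-centring keeps the magnitude of each change, so a gene that moves by 2 log2 units still stands out from one that moves by 1. Either form gives a colour key centred on zero.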


Thanks Mark and Sean. It is working fine.

--
Regards,
Abhilash

15.6 years ago Abhilash Venu ▴ 340


Mark Cowley ▴ 910

@mark-cowley-2951

Last seen 9.6 years ago

Hi Hui-Yi,

In the case of the 3 wt vs 3 mutants, you can do this a couple of ways. You can calculate the average level for each probeset in the wt, then subtract this value from the 3 mutant values, thereby getting a ratio of expression in each of the mt vs the average wt. Alternatively, you can correct each mutant value by a corresponding wt value, so first mt - first wt, then 2nd mt - 2nd wt, then 3rd mt - 3rd wt. It really depends on what you are after.

Some code:

    # columns 1-3 are the wt samples, columns 4-6 the mutants
    wtAvg <- apply(yourdata[,1:3], 1, mean)
    # each mutant vs the average wt
    mtRatiosVsAvgWt <- yourdata[,4:6] - wtAvg
    # - or -
    # each mutant vs its corresponding wt
    mtRatiosVsWt <- yourdata[,4:6] - yourdata[,1:3]

I'm not familiar with genefilter/altcdfenvs, so can't help you there. If you know the names of the S. cerevisiae probesets, then store them in a vector, and subset the rows of 'yourdata' to just the rows that match the cerevisiae probeset IDs. The alternative CDF approach will ensure that the pombe probesets are not used in the normalisation of your cerevisiae probesets, which is probably a good thing.

cheers,
Mark

On 09/09/2008, at 4:52 AM, Hui-Yi Chu wrote:

> Hi Mark and list,
>
> I really appreciate this series of discussions, since I also have questions relevant to clustering.
>
> Regarding the "relative" expression of each DE gene, does that mean it is relative to the average value among samples? Assuming I have 3 wt and 3 mut samples with single-color expression data, after the hclust function I get a red-green picture with 6 (=3+3) columns. But what if I want a 3-column result that contains ratios based on mut/wt expression values?
>
> Additionally, I am analyzing Affymetrix yeast2 arrays, so I want to skip the S. pombe expression values. Thus, my question is: can I use the "filter" function within the genefilter package instead of creating an alternative CDF by following the altcdfenvs package?
>
> Any suggestions are really appreciated!
> Hui-Yi

15.6 years ago Mark Cowley ▴ 910
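Mark's suggestion to keep the S. cerevisiae probeset names in a vector and subset the rows of 'yourdata' could look something like the sketch below. The probeset IDs here ("cer_1", "pombe_1", and so on) are invented placeholders, not real yeast2 identifiers, and the matrix is toy data with 3 wt columns followed by 3 mutant columns. The values are assumed to be on the log scale, so a difference is a log ratio.

```r
# Toy data: 5 probesets x 6 samples (3 wt, then 3 mutants).
# The row names are hypothetical placeholders, not real probeset IDs.
yourdata <- matrix(seq_len(30), nrow = 5,
                   dimnames = list(c("cer_1", "pombe_1", "cer_2",
                                     "pombe_2", "cer_3"),
                                   c("wt1", "wt2", "wt3",
                                     "mt1", "mt2", "mt3")))

# Vector of the probeset IDs to keep (hypothetical).
cerevisiae_ids <- c("cer_1", "cer_2", "cer_3")

# Keep only the rows whose ID is in that vector.
cer_data <- yourdata[rownames(yourdata) %in% cerevisiae_ids, , drop = FALSE]

# Three-column result: each mutant minus the average wt, per probeset.
mt_vs_avg_wt <- cer_data[, 4:6] - rowMeans(cer_data[, 1:3])
```

Note that this only filters rows after normalisation. As Mark points out, it does not stop the pombe probesets from influencing the normalisation itself, which is what the alternative CDF approach is for.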
