PCA plot for gene expression
Guest User ★ 12k
@guest-user-4897
Hi all,

I am writing this mail for the second time. I want to perform a PCA analysis for each cancer type, using the expression of my genes of interest. I would like to plot only a single point that represents each cancer and its gene expression. Can you please explain how to do this? (Also, on a per-cancer, per-gene basis, should I take the median or the mean to represent the expression?) Thanks in advance.

-- output of sessionInfo():

pca()

--
Sent via the guest posting facility at bioconductor.org.
@mikelove
United States
You might get more feedback if you describe what kind of experiment you have performed (microarray or RNA-Seq?).

The other reason you might not be getting a response is that the principal component functions are not implemented in Bioconductor, but in base R. So it's not necessarily a Bioconductor question, but a statistics/R question.

The very basic code for making a PCA plot from an expression set 'e' would be

    pc = prcomp( t( exprs( e ) ) )
    plot( pc$x[ , 1:2 ] )

@mikelove

hi Deepak,

We like to always keep the discussion on the list, to avoid having to answer duplicate questions.

Collapsing all the patients into a single point defeats the purpose of PCA: to see the distances between individual samples and groups of samples. Showing just the mean for each group might mislead someone looking at the plot into thinking the clusters are distinct, when the samples might have high variance around that average point. I would recommend instead just coloring the types of carcinoma.
Mike

On Wed, Jul 16, 2014 at 5:43 AM, deepak karthik <deepaksrna at gmail.com> wrote:
> Thanks for your reply.
> I have data from hundreds of patients for each carcinoma, consisting of RNA-seq expression for a certain gene of interest. If I perform PCA for numerous carcinomas, my PCA plot would be cluttered, and it would be difficult to find out which types of carcinoma cluster together. So I would like to mark a single point for each type of carcinoma, based on the RNA-seq expression of my gene of interest. Thanks in advance.
>
> With regards,
> S.Deepak
>
> --
> Deepak karthik
> PhD student
> +972-054-5683140
> Dr. Mali Salmon-Divon, Genomic Bioinformatics laboratory
> The Department of Molecular Biology
> Ariel University, Israel
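The coloring suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: `expr_mat` is a simulated genes-by-samples matrix standing in for `exprs(e)`, and the labels in `cancer_type` (`typeA`, `typeB`, `typeC`) are made up. Semi-transparent points via `adjustcolor()` are one possible way to make a crowded plot easier to read; they are not something proposed in the thread.

```r
# Simulated stand-in for exprs(e): rows are genes, columns are samples.
# expr_mat and cancer_type are hypothetical names used for illustration.
set.seed(1)
cancer_type <- factor(rep(c("typeA", "typeB", "typeC"), each = 30))
n_genes <- 50
expr_mat <- matrix(rnorm(n_genes * length(cancer_type)), nrow = n_genes)

# Shift a few genes in two of the groups so the groups differ somewhat.
is_b <- cancer_type == "typeB"
is_c <- cancer_type == "typeC"
expr_mat[1:5, is_b] <- expr_mat[1:5, is_b] + 2
expr_mat[6:10, is_c] <- expr_mat[6:10, is_c] - 2

# prcomp() expects samples in rows, hence the transpose.
pc <- prcomp(t(expr_mat))

# One color per cancer type, made semi-transparent so overlapping
# samples remain visible in a crowded plot.
palette_cols <- c(typeA = "steelblue", typeB = "tomato", typeC = "darkgreen")
point_cols <- adjustcolor(palette_cols[as.character(cancer_type)], alpha.f = 0.5)

plot(pc$x[, 1:2], col = point_cols, pch = 16, cex = 0.8)
legend("topright", legend = names(palette_cols), col = palette_cols, pch = 16)
```

With real data, `cancer_type` would come from the sample annotation, and the palette would need one entry per carcinoma type.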
Hi Michael,

Even after coloring by carcinoma type, the plot is so crowded that I am not able to distinguish between the cancers. That is the reason I wanted to find a way to plot a single point for each cancer. My ultimate goal is to find which cancers are related with respect to my gene of interest. Please suggest a better approach.

Thank you,
Deepak

@mikelove

hi Deepak,

This is a general R question as it doesn't involve software from Bioconductor, so you should post future questions like this to the R help mailing list https://stat.ethz.ch/mailman/listinfo/r-help or as an R tagged question on stackoverflow http://stackoverflow.com/questions/tagged/r.

You can obtain the mean for each group many ways. One way is to use the ddply function in the plyr package on CRAN: http://cran.r-project.org/web/packages/plyr/plyr.pdf

    d = data.frame(PC1 = pc$x[,1], PC2 = pc$x[,2], f = factor(condition))
    library(plyr)
    groupmeans = ddply(d, "f", summarise, mPC1=mean(PC1), mPC2=mean(PC2))

This gives the mean of PC1 and the mean of PC2 for each group.

Mike
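For readers without plyr installed, the same per-group summary as the `ddply()` call above can be sketched in base R with `aggregate()`. The data here are simulated, and `expr_mat` and `condition` are hypothetical stand-ins for the real expression matrix and cancer labels. The sketch also computes group medians, which are less sensitive to outlying samples than means, and draws the centroids on top of the individual samples rather than instead of them, so the spread within each group stays visible.

```r
# Simulated data; expr_mat and condition are hypothetical stand-ins.
set.seed(1)
condition <- rep(c("typeA", "typeB", "typeC"), each = 30)
expr_mat <- matrix(rnorm(50 * length(condition)), nrow = 50)
is_b <- condition == "typeB"
expr_mat[1:5, is_b] <- expr_mat[1:5, is_b] + 2
pc <- prcomp(t(expr_mat))

d <- data.frame(PC1 = pc$x[, 1], PC2 = pc$x[, 2], f = factor(condition))

# One row per group, with the mean (or median) of PC1 and PC2.
groupmeans <- aggregate(cbind(PC1, PC2) ~ f, data = d, FUN = mean)
groupmedians <- aggregate(cbind(PC1, PC2) ~ f, data = d, FUN = median)

# Faint per-sample points first, then the group means drawn on top.
plot(d$PC1, d$PC2, col = adjustcolor("grey40", alpha.f = 0.4), pch = 16,
     xlab = "PC1", ylab = "PC2")
points(groupmeans$PC1, groupmeans$PC2, pch = 4, cex = 2, lwd = 3)
text(groupmeans$PC1, groupmeans$PC2,
     labels = as.character(groupmeans$f), pos = 3)
```

Note that `aggregate()` keeps the column names `PC1` and `PC2`, whereas the `ddply()` call names them `mPC1` and `mPC2`.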