PCA plot for gene expression
Hi all, I am writing to the list for the second time. I want to perform a PCA for each cancer type using the expression of my genes of interest. I would like to plot just a single point that represents each cancer type and its gene expression. Could you please explain how to do this? (Also, on a per-cancer, per-gene basis, should I take the median or the mean to represent expression?) Thanks in advance. -- output of sessionInfo(): pca() -- Sent via the guest posting facility at bioconductor.org.
You might get more feedback if you describe what kind of experiment you have performed (microarray or RNA-Seq?). The other reason you might not be getting a response is that the principal component functions are not implemented in Bioconductor, but in base R. So it's not necessarily a Bioconductor question, but a statistics/R question. The very basic code for making a PCA plot from an expression set 'e' would be:

    library(Biobase)            # provides exprs()
    pc = prcomp(t(exprs(e)))    # transpose so that samples are rows
    plot(pc$x[, 1:2])           # PC1 vs PC2, one point per sample
Hi Michael, Even if I colour the points by carcinoma, the plot is so crowded that I am not able to distinguish between the cancers. That is why I wanted to find a way to show a single point for each cancer. My ultimate goal is to find which cancers are related with respect to my genes of interest. Please suggest a better approach. Thank you, Deepak

hi Deepak, We like to always keep the discussion on the list, to avoid having to answer duplicate questions. Collapsing all the patients into a single point defeats the purpose of PCA: to see the distances between individual samples and groups of samples. Showing just the mean for each group might mislead someone looking at the plot into thinking the clusters are distinct, when the samples might have high variance around that average point. I would recommend instead just coloring the types of carcinoma.
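
A minimal sketch of the colouring approach, run on simulated data because the actual data set is not shown in this thread. The matrix `expr`, the labels `cancer`, the three type names and the colours are all made up for illustration; with real data you would substitute your own expression matrix and sample annotation.

```r
set.seed(1)

# Hypothetical data: 50 genes x 90 samples, 30 samples per cancer type
cancer <- factor(rep(c("typeA", "typeB", "typeC"), each = 30))
expr <- matrix(rnorm(50 * 90), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:90)))
# Shift the first 10 genes in typeB so the groups are not identical
expr[1:10, cancer == "typeB"] <- expr[1:10, cancer == "typeB"] + 2

# prcomp expects samples in rows, so transpose the genes x samples matrix
pc <- prcomp(t(expr))

# Proportion of variance explained by each component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)

# One colour per cancer type; the integer codes of the factor index the palette
cols <- c("steelblue", "firebrick", "darkgreen")
plot(pc$x[, 1:2], col = cols[as.integer(cancer)], pch = 16,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(cancer), col = cols, pch = 16)
```

Putting the variance explained on the axis labels helps a reader judge how much of the structure the first two components actually capture.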

Thanks for your reply. I have data from hundreds of patients for each carcinoma, consisting of RNA-Seq expression for certain genes of interest. If I perform PCA across numerous carcinomas, my PCA plot will be cluttered and it will be difficult to see which types of carcinoma cluster together. So I would like to mark a single point for each type of carcinoma, based on the RNA-Seq expression of my genes of interest.

You can obtain the mean for each group in many ways. One way is to use the ddply function in the plyr package on CRAN: http://cran.r-project.org/web/packages/plyr/plyr.pdf

    library(plyr)
    # 'condition' is assumed to hold the carcinoma type of each sample,
    # in the same order as the rows of pc$x
    d = data.frame(PC1 = pc$x[,1], PC2 = pc$x[,2], f = factor(condition))
    groupmeans = ddply(d, "f", summarise, mPC1 = mean(PC1), mPC2 = mean(PC2))

This gives the mean of PC1 and the mean of PC2 for each group.
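
If you would rather not depend on plyr, the same per-group summary can be sketched in base R with aggregate(). The example below uses simulated data again (`expr`, `cancer` and the type names are hypothetical) and draws the group centres on top of the individual samples in grey, so the spread around each centre stays visible. On the mean-or-median question: replacing `mean` with `median` in the aggregate() call gives a centre that is less sensitive to outlying samples; which one suits your data is a judgment call.

```r
set.seed(1)

# Hypothetical data: 50 genes x 90 samples, 30 samples per cancer type
cancer <- factor(rep(c("typeA", "typeB", "typeC"), each = 30))
expr <- matrix(rnorm(50 * 90), nrow = 50)
expr[1:10, cancer == "typeB"] <- expr[1:10, cancer == "typeB"] + 2

# PCA with samples in rows
pc <- prcomp(t(expr))
d <- data.frame(PC1 = pc$x[, 1], PC2 = pc$x[, 2], f = cancer)

# One row per group holding the centre on PC1 and PC2;
# use FUN = median for a more outlier-resistant summary
centers <- aggregate(cbind(PC1, PC2) ~ f, data = d, FUN = mean)

# Individual samples in grey, then the group centres and their labels on top
plot(d$PC1, d$PC2, col = "grey70", pch = 16, xlab = "PC1", ylab = "PC2")
points(centers$PC1, centers$PC2, pch = 17, cex = 2,
       col = seq_len(nrow(centers)))
text(centers$PC1, centers$PC2, labels = as.character(centers$f), pos = 3)
```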