Error in PCA analysis and with ggbiplot function
svlachavas ▴ 830
@svlachavas-7225
Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor community,

After normalizing and filtering a microarray expression dataset, 11658 rows (probesets) and 26 columns (samples) remained, and I tried to run a PCA. First I transposed the data: data <- t(exprs(eset))

But when I called prcomp(data, scale=TRUE), I got the following error:

"Error in prcomp.default(data, scale = TRUE) : cannot rescale a constant/zero column to unit variance"

Afterwards, when I left out the scale argument, prcomp() worked, but when I then used the ggbiplot function I got another error:

"Error: invalid rot value"

Is this error also related to the scale argument? How can I fix it? I also searched similar threads about removing columns with zero variance, but the suggestions did not work. Any ideas or suggestions?

bioconductor pca ggbiplot expresiondataset • 8.2k views
@james-w-macdonald-5106
United States

Please note that neither prcomp() nor ggbiplot() is part of Bioconductor, so this is really not the forum for your question.

That said, is there something about the error message that isn't clear? It says that you cannot rescale a column with zero variance to unit variance. To convert a column to unit variance, you divide its values by the column's standard deviation; if the variance is zero, the standard deviation is zero, and we all know what happens if you divide by zero. Since you are transposing, the columns of the matrix you pass to prcomp() are genes, so this means you have genes (rows of the original expression matrix) that don't change expression.
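A minimal sketch of that failure, using a small simulated matrix rather than the poster's data (the object names `x`, `col_sd`, and `msg` are made up for illustration). Note that the formal argument of prcomp() is `scale.`; writing `scale = TRUE` reaches it through partial argument matching.

```r
# Simulated data (not the poster's): 5 samples (rows) x 4 genes (columns)
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)
x[, 2] <- 7                      # gene 2 is identical in every sample

col_sd <- apply(x, 2, sd)        # per-column standard deviations
col_sd[2]                        # exactly 0, so dividing by it is undefined

# prcomp() refuses to scale such a column; capture the message instead of stopping
msg <- tryCatch(
  { prcomp(x, scale. = TRUE); "no error" },
  error = function(e) conditionMessage(e)
)
msg
```

Here `msg` should hold the same "cannot rescale a constant/zero column to unit variance" text reported in the question.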

It's simple to fix this:

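library(genefilter)  # provides rowVars()

# flag probesets whose variance across samples is (numerically) zero
ind <- rowVars(exprs(eset)) < .Machine$double.eps

# drop the flagged probesets from the ExpressionSet
eset <- eset[!ind,]

# samples in rows, probesets in columns; scaling now works
z <- prcomp(t(exprs(eset)), scale = TRUE)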
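If genefilter is not available, roughly the same filter can be sketched in base R on a plain expression matrix. This is an illustration under stated assumptions, not the answer's method: the matrix `mat` is simulated (100 genes by 26 samples, with three rows forced to be constant), and all object names are hypothetical.

```r
# Simulated genes x samples matrix standing in for exprs(eset)
set.seed(42)
mat <- matrix(rnorm(100 * 26, mean = 8), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("s", 1:26)))
mat[c(5, 50, 95), ] <- 8         # three genes with no variability at all

# Per-gene variance across samples; keep only genes that actually vary
gene_var <- apply(mat, 1, var)
keep <- gene_var > .Machine$double.eps
mat_f <- mat[keep, , drop = FALSE]

# Transpose so samples are rows and genes are columns, then scale
pca <- prcomp(t(mat_f), scale. = TRUE)
dim(pca$x)                       # one row of scores per sample
```

With real data, `mat` would be replaced by `exprs(eset)`; apply() is slower than rowVars() on large matrices but needs no extra package.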

First of all, thank you for answering my question. I did not intend to post an inappropriate question on the support forum, but I did not know of any other forum suited to this kind of question. Initially I wrongly assumed that, because of the transpose, the first error was related to the samples now being the rows. Your observation about my dataset makes sense, because I did not filter on variance but on present/absent calls, and I think it is generally assumed that a substantial proportion of genes do not change expression in a typical microarray experiment.


While it is a general assumption that a large proportion of genes don't change expression, this doesn't mean that the values you get from a microarray should be identical for all samples. In other words, there will always be some error in our measurements, so the expectation is that the expression values for a gene that is probably not changing expression will be very similar across samples, but not identical.

The fact that you have identical values is probably because you have a small number of replicates (and if these are Affy arrays, an odd number of samples, like 3 or 5 or 7, which due to the medianpolish algorithm can give rise to identical values for all samples).

Anyway, a gene with no variability is de facto uninteresting, and removing those genes is a good idea.


Yes, I am using the Affymetrix hgu133a platform and I have biological replicates: 13 patients, each with 2 samples (one control and one cancer sample). Regarding the other point you raised: because I ran limma afterwards, I hesitated to apply a variance-based genefilter before limma, motivated by the paper by Bourgon et al. (2010).
