R session aborted: prcomp
Gero • 0
@gero

I am trying to run a principal component analysis on Affymetrix microarray data:

SDRF <- read.delim('E-GEOD-40442.sdrf.txt')
rownames(SDRF) <- SDRF$Source.Name
SDRF <- AnnotatedDataFrame(SDRF)
raw_data <- read.celfiles(files= cel.files, phenoData = SDRF)
exp_raw <- log2(Biobase::exprs(raw_data))
class(exp_raw)
--------------------------------------------------------------
"matrix" "array"

head(exp_raw)
---------------------------------------------------------------
GSM994317 1  GSM994316 1    GSM994315 1     GSM994314 1
1   12.632995   12.736825        12.331757           12.539401
2    6.781360    7.228819          6.658211             6.491853
3   12.498600   12.593158        12.262976           12.603626
4    6.965784    6.857981          6.066089              6.569856
5    5.000000    5.321928          5.426265              5.169925
6    6.781360    7.118941          6.741467              6.475733


The error occurs when I then try to run the PCA:

PCA_raw <- prcomp(t(exp_raw))
--------------------------------------------------------------
R Session Aborted
R encountered a fatal error.
The session was terminated

# (PS: This happens in RStudio; the same occurs in the terminal)






Hi, you are log2-transforming the raw expression levels and trying to perform a singular value decomposition on them via prcomp()? What error is returned outside of RStudio? What are the dimensions of the data? It may seem odd, but please restart your computer to clear the cache, and start a new R session for every new analysis.
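To make the link between prcomp() and the singular value decomposition concrete, here is a minimal sketch on a small simulated matrix (the toy data and variable names are made up for illustration). It checks that the standard deviations and scores from prcomp() match what one gets from svd() on the column-centred matrix, up to the sign of each component.

```r
# Toy data: 6 observations (rows) by 4 variables (columns); purely simulated.
set.seed(42)
x <- matrix(rnorm(6 * 4), nrow = 6)

# PCA via prcomp(), which centres the columns by default.
pca <- prcomp(x)

# The same decomposition done by hand: centre the columns, then take the SVD.
s <- svd(scale(x, center = TRUE, scale = FALSE))

# Component standard deviations are the singular values divided by sqrt(n - 1).
sdev_match <- all.equal(pca$sdev, s$d / sqrt(nrow(x) - 1))

# Scores are U %*% D; compare absolute values because signs are arbitrary.
scores_match <- all.equal(abs(pca$x), abs(s$u %*% diag(s$d)),
                          check.attributes = FALSE)

sdev_match
scores_match
```

So the cost of prcomp() is essentially the cost of an SVD on the (centred) input matrix, which is why the dimensions of the data matter here.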


1) I am not sure I understand this question. My objective is to perform a principal component analysis so that I can then plot my samples according to their values for the first and second principal components.

2) The errors from R (Linux terminal) and RStudio are exactly the same. The data were obtained from ArrayExpress (E-GEOD-40442) and are available here: https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-40442/

3) The dimensions are 67 samples for multiple genes:

dim(raw_data)

6553600      67


4) Restarting the session and clearing the cache have not worked. I am not sure what mistake I am overlooking, but there must be one somewhere. I am still learning.

@james-w-macdonald-5106

You would need some pretty heavy iron to do SVD on a matrix of that size (I mean 6.5 million rows? Wowza). And those data are from an Affy Human Exon array, so those are most likely not genes, but instead PSR (probe set regions), which are parts of genes, not the genes themselves. If you use the oligo package you can summarize at the gene level if you are trying to do gene level analyses.
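For a rough sense of scale, here is a back-of-the-envelope estimate of the memory involved. It assumes the matrix is stored as 8-byte doubles, which is R's default for numeric data; the variable names are just for illustration.

```r
# Dimensions reported by dim(raw_data) in the question.
n_rows <- 6553600
n_cols <- 67

# Assumption: each value is a double-precision number (8 bytes).
bytes_per_double <- 8

# Size of ONE copy of the matrix, in GiB.
one_copy_gib <- n_rows * n_cols * bytes_per_double / 1024^3
round(one_copy_gib, 2)
```

That works out to a little over 3 GiB for a single copy. As far as I understand prcomp(), it also needs the transposed matrix, a centred copy, and a rotation matrix with one row per feature, so peak usage is likely to be several times that figure, which would be consistent with the session being killed rather than returning an ordinary R error.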

An alternative would be to filter the probes to get rid of consistently unexpressed data, which would whittle your matrix down, possibly quite a bit. But personally I have never had much use for the probe level summarization of an Exon array, so if I were you I would summarize at the transcript (+/- gene) level and go from there. And I would filter that before doing PCA. You can get most of the signal from the top 1000 or 5000 genes, rather than trying to do SVD on some massive matrix.
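The filter-then-PCA idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `pca_top_variable` and `top_n` are hypothetical names, ranking rows by variance is one common filtering choice (not necessarily the one meant above), and the runnable part uses simulated data as a stand-in for a summarized expression matrix. The commented lines at the end show how it might be applied to the real data with oligo::rma(), assuming `raw_data` is the object from the question.

```r
# Keep the top_n most variable rows of a features-by-samples matrix,
# then run PCA with samples as observations.
# `pca_top_variable` and `top_n` are hypothetical names for this sketch.
pca_top_variable <- function(expr, top_n = 1000) {
  row_var <- apply(expr, 1, var)                  # per-feature variance
  n_keep  <- min(top_n, nrow(expr))
  keep    <- order(row_var, decreasing = TRUE)[seq_len(n_keep)]
  prcomp(t(expr[keep, , drop = FALSE]), center = TRUE, scale. = FALSE)
}

# Simulated stand-in: 20000 features by 10 samples (not real array data).
set.seed(1)
expr_sim <- matrix(rnorm(20000 * 10), nrow = 20000)
colnames(expr_sim) <- paste0("sample", 1:10)

pca_sim <- pca_top_variable(expr_sim, top_n = 1000)

# Samples plotted on the first two principal components.
plot(pca_sim$x[, 1], pca_sim$x[, 2], xlab = "PC1", ylab = "PC2")

# With the real data (not run here; needs oligo, the array annotation
# package, and the CEL files). rma() returns log2-scale values, so no
# extra log2() is needed:
# eset    <- oligo::rma(raw_data, target = "core")
# PCA_rma <- pca_top_variable(Biobase::exprs(eset), top_n = 5000)
```

After filtering, the matrix passed to prcomp() has only a few thousand columns instead of millions, so it should fit comfortably in memory on an ordinary machine.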


Wowza indeed!