Question: How to convert large gene counts data into matrix?
4 months ago by
sambunga0940 wrote:

Hi, I have a gene counts table that looks something like this:

When I try to convert this into a matrix, I get the following error: Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

This is the code I'm using, which I borrowed from an online tutorial:

library(Matrix)
library(data.table)
f<-as.data.frame(fread(file="Z:/sc_wizard/data.counts",h=T,sep='\t'))
sm = sparseMatrix(as.numeric(factor(f$gene)), as.numeric(factor(f$cell)), x=f$count)
rownames(sm)<-levels(factor(f$gene))
colnames(sm)<-levels(factor(f$cell))
write.table(as.matrix(sm),file = "out.tsv",sep='\t',col.names=T,row.names=T,quote=F)

Help is really appreciated. Thank you!!

Answer: How to convert large gene counts data into matrix?
4 months ago by
Aaron Lun (Cambridge, United Kingdom) wrote:

This isn't a question about any Bioconductor packages. In fact, it's not even a question involving any single-cell analysis packages, based on the code snippets you have above.

That said, I will give you a possible answer. I suspect your count matrix has more than .Machine$integer.max non-zero entries, which means that the dgCMatrix cannot represent it. The compressed sparse column format keeps a cumulative sum of the non-zero entries in each column as it iterates across the matrix. This sum is stored in a signed integer vector, and the largest signed integer is given by .Machine$integer.max. Adding past that will result in integer overflow.
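To make the overflow argument concrete, here is a minimal sketch on a toy table. The values are made up for illustration, and toy and fits_in_dgCMatrix are hypothetical names, not anything from the original code. It shows that the column pointer slot of a dgCMatrix is an integer vector holding the running total of non-zero entries per column.

```r
library(Matrix)

# Toy stand-in for the gene/cell/count table (hypothetical values).
toy <- data.frame(
  gene  = c("g1", "g2", "g1", "g3"),
  cell  = c("c1", "c1", "c2", "c3"),
  count = c(5, 2, 7, 1)
)

# Same construction as in the question, on the toy data.
sm <- sparseMatrix(
  i = as.integer(factor(toy$gene)),
  j = as.integer(factor(toy$cell)),
  x = toy$count
)

class(sm)     # dgCMatrix, the compressed sparse column format
sm@p          # column pointers: cumulative non-zero counts, here 0 2 3 4
typeof(sm@p)  # "integer", so the last pointer cannot exceed the limit below
.Machine$integer.max

# Assumption: the input table has one row per non-zero (gene, cell) pair,
# so its row count equals the number of non-zero entries.
fits_in_dgCMatrix <- nrow(toy) <= .Machine$integer.max
```

On the real data, the same comparison with nrow(f) would be a quick way to check whether the suspicion about the number of non-zero entries holds.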

If you want to represent it as a sparse matrix, one possible solution is to turn off giveCsparse, which will use the less-efficient dgTMatrix format. This should avoid the overflow problem but will reduce efficiency for downstream analyses, in terms of both memory and speed.
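As a rough sketch of what turning off giveCsparse looks like, again on a made-up toy table (toy and tm are hypothetical names): the call below asks sparseMatrix() for the triplet format instead of the compressed column format. Recent versions of the Matrix package deprecate giveCsparse in favour of repr="T" and may emit a warning, so check the documentation for the version you have installed.

```r
library(Matrix)

# Toy stand-in for the gene/cell/count table (hypothetical values).
toy <- data.frame(
  gene  = c("g1", "g2", "g1", "g3"),
  cell  = c("c1", "c1", "c2", "c3"),
  count = c(5, 2, 7, 1)
)

# giveCsparse=FALSE requests the triplet (dgTMatrix) representation,
# which stores one (row, column, value) triple per non-zero entry
# and has no cumulative column pointer.
tm <- sparseMatrix(
  i = as.integer(factor(toy$gene)),
  j = as.integer(factor(toy$cell)),
  x = toy$count,
  giveCsparse = FALSE
)

rownames(tm) <- levels(factor(toy$gene))
colnames(tm) <- levels(factor(toy$cell))

class(tm)  # dgTMatrix
```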

BTW, if the matrix is as large as you say it is, then calling as.matrix() is crazy. You should think about borrowed code VERY CAREFULLY before executing it, especially if it's from a stranger on the internet.

unlink(list.files("~") recursive=TRUE, force=TRUE) # enjoy!

# Yes, the missing comma is deliberate, just in case someone
# still tries to copy and paste it, despite my warnings.


Thank you so much for the reply. I will try turning off giveCsparse.

I just tried turning off giveCsparse, and I got an error while trying to write it to a file: Cannot allocate vector of size 69 GB. And how can I use the code which you have provided in the end in your answer?

Thank you

First things first.

And how can I use the code which you have provided in the end in your answer?

Stop stop stop. That's exactly the kind of attitude I was trying to warn you about. The (deliberately incorrect) code snippet in my answer is a dangerous command that - if corrected and executed - will delete all of your personal directories and their contents on a Unix system. That would be Bad.

The moral of the story is that, if you're getting a piece of code from a random source and you don't know what it does, you shouldn't run it until you or someone you trust understands it. Take it from me, this is a painful and expensive lesson that only needs to be learnt once.

I got an error while trying to write it to a file: Cannot allocate vector of size 69 GB

You did see my comment about how calling as.matrix() was crazy, right?
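For a sense of scale, here is a back-of-the-envelope estimate of what as.matrix() has to allocate. The dimensions are hypothetical and chosen only for illustration; they are not taken from the poster's data. A dense numeric matrix in R stores every entry, zero or not, as an 8-byte double.

```r
# Hypothetical dimensions, for illustration only.
n_genes <- 30000
n_cells <- 300000

# Work in doubles: the same product in integer arithmetic
# (30000L * 300000L) overflows and returns NA with a warning.
n_entries <- n_genes * n_cells

# Every entry of the dense matrix costs 8 bytes, including the zeros.
dense_bytes <- n_entries * 8
dense_gib <- dense_bytes / 1024^3

dense_gib                         # about 67 GiB for these dimensions
n_entries > .Machine$integer.max  # TRUE
```

Running the same arithmetic with nrow(sm) and ncol(sm) before calling as.matrix() would tell you whether the dense result has any chance of fitting in memory.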