Question: How to convert large gene counts data into matrix?
24 days ago, sambunga094 wrote:

Hi, I have a gene counts table that looks something like this:

[image: gene counts table]

When I try to convert this into a matrix, I get the following error:

Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

This is the code I'm using, which I borrowed from an online tutorial:

library(Matrix)
library(data.table)
f<-as.data.frame(fread(file="Z:/sc_wizard/data.counts",h=T,sep='\t'))
sm = sparseMatrix(as.numeric(factor(f$gene)), as.numeric(factor(f$cell)), x=f$count)
rownames(sm)<-levels(factor(f$gene))
colnames(sm)<-levels(factor(f$cell))
write.table(as.matrix(sm),file = "out.tsv",sep='\t',col.names=T,row.names=T,quote=F)

Help is really appreciated. Thank you!!

Answer: How to convert large gene counts data into matrix?
24 days ago, Aaron Lun (Cambridge, United Kingdom) wrote:

This isn't a question about any Bioconductor packages. In fact, it's not even a question involving any single-cell analysis packages, based on the code snippets you have above.

That said, I will give you a possible answer. I suspect your count matrix has more than .Machine$integer.max non-zero entries, which means that the dgCMatrix cannot represent it. This is because the compressed sparse column format keeps a cumulative count of the non-zero entries in each column as it moves across the matrix. That count is stored in a signed integer vector, and the largest signed integer is .Machine$integer.max. Counting past that results in integer overflow.
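To make the limit concrete, here is a minimal sketch on a toy matrix (the values are made up for illustration). It shows the integer ceiling and the cumulative per-column counts, which the Matrix package keeps in the `p` slot of a dgCMatrix.

```r
library(Matrix)

# Largest value an R integer can hold (2^31 - 1).
.Machine$integer.max

# Toy 3 x 3 matrix with four non-zero entries (illustrative values only),
# stored in the default column-compressed format.
m <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 3), x = c(5, 1, 2, 7))

class(m)        # dgCMatrix
typeof(m@p)     # "integer": this is the vector that can overflow
m@p             # cumulative non-zero counts per column: 0 2 3 4
length(m@x)     # number of stored non-zero values
```

The last element of `m@p` equals the total number of stored entries, so it is the value that would have to exceed `.Machine$integer.max` for the overflow described above to occur.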

If you want to represent it as a sparse matrix, one possible solution is to turn off giveCsparse, which will use the less-efficient dgTMatrix format. This should avoid the overflow problem but will reduce efficiency for downstream analyses, in terms of both memory and speed.
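A small sketch of what the triplet format looks like, again on made-up toy data. One assumption to flag: recent versions of Matrix (1.3-0 and later, as far as I know) spell this option `repr = "T"` and treat `giveCsparse = FALSE` as the older, deprecated form, so the sketch uses `repr`.

```r
library(Matrix)

# Same toy entries as a triplet (i, j, x) matrix instead of a
# column-compressed one. On older Matrix versions the equivalent
# call would use giveCsparse = FALSE instead of repr = "T".
tm <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 3),
                   x = c(5, 1, 2, 7), repr = "T")

class(tm)       # dgTMatrix
slotNames(tm)   # has i, j and x, but no cumulative p slot
tm@i            # row indices, stored 0-based
tm@j            # column indices, stored 0-based
```

Because each entry carries its own row and column index, there is no running total to overflow, but many downstream operations will convert back to a compressed format or run more slowly.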

BTW, if the matrix is as large as you say it is, then calling as.matrix() is crazy. You should think about borrowed code VERY CAREFULLY before executing it, especially if it's from a stranger on the internet.

unlink(list.files("~") recursive=TRUE, force=TRUE) # enjoy!

# Yes, the missing comma is deliberate, just in case someone 
# still tries to copy and paste it, despite my warnings.

Thank you so much for the reply. I will try turning off giveCsparse.

Reply written 24 days ago by sambunga094

I just tried turning off giveCsparse, and I got an error while trying to write it to a file: Cannot allocate vector of size 69 GB. And how can I use the code that you provided at the end of your answer?

Thank you

Reply written 24 days ago by sambunga094

First things first.

"And how can I use the code that you provided at the end of your answer?"

Stop stop stop. That's exactly the kind of attitude I was trying to warn you about. The (deliberately incorrect) code snippet in my answer is a dangerous command that - if corrected and executed - will delete all of your personal directories and their contents on a Unix system. That would be Bad.

The moral of the story is that, if you're getting a piece of code from a random source and you don't know what it does, you shouldn't run it until you or someone you trust understands it. Take it from me, this is a painful and expensive lesson that only needs to be learnt once.

"I got an error while trying to write it to a file: Cannot allocate vector of size 69 GB"

You did see my comment about how calling as.matrix() was crazy, right?
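For what it's worth, here is a rough sketch of why the dense conversion fails and one possible way to get a TSV without it. Everything here is an assumption for illustration: the toy table stands in for the real data, and `dense_gb()` and `write_tsv_chunked()` are hypothetical helpers, not functions from any package. The idea is to densify only a block of rows at a time rather than the whole matrix.

```r
library(Matrix)

# Toy stand-in for the long-format table (columns gene, cell, count).
f <- data.frame(
  gene  = c("g1", "g2", "g2", "g3", "g5"),
  cell  = c("c1", "c1", "c2", "c3", "c2"),
  count = c(4, 1, 7, 2, 9)
)
genes <- factor(f$gene)
cells <- factor(f$cell)
sm <- sparseMatrix(i = as.integer(genes), j = as.integer(cells), x = f$count,
                   dimnames = list(levels(genes), levels(cells)))

# Hypothetical helper: memory (in GiB) that a dense double-precision copy
# would need, at 8 bytes per entry, zeros included.
dense_gb <- function(m) prod(as.numeric(dim(m))) * 8 / 1024^3

# Hypothetical helper: write a sparse matrix as TSV while converting only
# `chunk` rows to dense form at any one time.
write_tsv_chunked <- function(m, path, chunk = 1000L) {
  con <- file(path, open = "wt")
  on.exit(close(con))
  # Header line: column names only, matching write.table's default layout.
  writeLines(paste(colnames(m), collapse = "\t"), con)
  for (start in seq(1L, nrow(m), by = chunk)) {
    rows <- start:min(start + chunk - 1L, nrow(m))
    write.table(as.matrix(m[rows, , drop = FALSE]), file = con, sep = "\t",
                quote = FALSE, col.names = FALSE, row.names = TRUE)
  }
  invisible(path)
}

dense_gb(sm)                       # tiny here; check this before as.matrix()
out <- tempfile(fileext = ".tsv")
write_tsv_chunked(sm, out, chunk = 2L)
```

Under the 8-bytes-per-entry assumption, a 69 GB allocation corresponds to roughly nine billion matrix cells, almost all of them zeros. If the downstream tool can read Matrix Market files, `Matrix::writeMM(sm, file)` may be a simpler option, since it writes only the non-zero entries.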

Reply written 23 days ago by Aaron Lun (modified 23 days ago)