Question: edgeR long vectors support when normalizing abundance matrix
11 months ago by
jtremblay5140 wrote:

Dear BioConductor community,

I've recently been trying to normalize large metagenome abundance matrices with the cpm(y) function of edgeR. I get the following error:

  Error in .isAllZero(counts) :
    long vectors not supported yet: memory.c:3438
  Calls: mergeTables -> DGEList -> .isAllZero -> .Call

This is obviously a memory issue, and I was just wondering whether long vectors will be supported in the near future. I usually have no problems with this step; this is simply the largest dataset I've processed so far.

Thanks!

-J

modified 11 months ago • written 11 months ago by jtremblay5140

My edgeR version is edgeR_3.18.1, with R 3.4.0.

Okay it works fine with the devel package. Many thanks!

Well it seems I've got another error downstream - here is part of my code:

  y <- DGEList(df, remove.zeros=TRUE)
  y <- calcNormFactors(y, method="TMM") # Although not sure if necessary...
  cpms = cpm(y)
  cpms = round(cpms, digits=3)
  write.table(cpms, outfileCpm, quote=FALSE, sep="\t", row.names=TRUE, col.names=NA)

and in output:

  Removing 1241 rows with all zero counts
  Error in write.table(cpms, outfileCpm, quote = FALSE, sep = "\t", row.names = TRUE,  :
    corrupt matrix -- dims not not match length
  Calls: mergeTables -> write.table
  Execution halted

I have 9,582,472 genes in the df object, if that's relevant.

Cheers,

I can't reproduce your error. I tried the same code in edgeR 3.19.7 with a 9582472 x 10 count matrix, and all ran fine.

I have a 9582472 x 381 matrix. I can share if needed. Thx!

This seems more like a problem with write.table than with any edgeR function. And little wonder - a 9582472 x 381 double-precision matrix occupies 29 GB in memory! Are you sure you want to write this to file?
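The 29 GB figure, and the reason "long vectors" come up at all, can be checked with a little arithmetic. This is only a sketch: the variable names are illustrative, and the dimensions are the ones quoted in this thread. A matrix with more than 2^31 - 1 elements is a long vector in R, and code that still indexes with 32-bit integers cannot handle it.

```r
# Dimensions quoted in the thread (rows = genes, columns = samples).
n_genes   <- 9582472
n_samples <- 381

# Use double arithmetic: an integer product this large would overflow to NA.
n_elements <- n_genes * n_samples

# Each double-precision value takes 8 bytes.
size_gb <- n_elements * 8 / 1e9

# Anything above .Machine$integer.max (2^31 - 1) elements is a long vector.
is_long_vector <- n_elements > .Machine$integer.max

cat(sprintf("%.0f elements, about %.1f GB, long vector: %s\n",
            n_elements, size_gb, is_long_vector))
```

By the same arithmetic, the 9582472 x 10 test matrix mentioned above has about 96 million elements, well under the limit, which may be why the error could not be reproduced with it.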

It is huge, I concur, and I have to admit that I might have to revisit my SOPs as projects (and datasets) are becoming bigger and bigger. For downstream analyses, only subsets of this final normalized matrix will be pulled at a time (i.e. certain genes with selected functions), so it shouldn't be a problem at that point. I just need to normalize everything together before going forward. I'll try to write the table with fwrite (data.table) - any other suggestions are welcome.

It worked with fwrite (data.table).
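For anyone landing here later, a minimal sketch of what the fwrite route might look like, shown on a toy matrix rather than the real data. The toy values and the temporary output path are assumptions for illustration, and this has not been checked at the 9582472 x 381 scale. Note that as.data.table() makes a copy, so the full-size case needs memory for that as well.

```r
library(data.table)

# Toy stand-in for the CPM matrix (hypothetical values and names).
set.seed(1)
cpms <- matrix(round(runif(20, 0, 100), digits = 3), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# A data.table has no row names, so move the gene IDs into a regular
# column first; keep.rownames = TRUE stores them in a column named "rn".
cpms_dt <- as.data.table(cpms, keep.rownames = TRUE)

# Hypothetical output path; replace with the real destination file.
outfileCpm <- tempfile(fileext = ".tsv")
fwrite(cpms_dt, outfileCpm, sep = "\t", quote = FALSE)

# Read the file back to confirm that it round-trips.
check <- fread(outfileCpm)
```

Unlike the write.table call with col.names=NA, the gene IDs end up under an explicit "rn" header here rather than a blank one, which may matter for downstream parsers.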

11 months ago by
Aaron Lun (Cambridge, United Kingdom) wrote:

Most of these issues have been fixed in the BioC-devel version of edgeR. You can either switch to using BioC-devel now (via useDevel() in BiocInstaller) or wait until the next release.