Question: edgeR long vector support when normalizing an abundance matrix
Posted 6 weeks ago by jtremblay514 (Canada):

Dear Bioconductor community,

I've recently been trying to normalize large metagenome abundance matrices with edgeR's cpm(y) function, and I get the following error:

Error in .isAllZero(counts) :
  long vectors not supported yet: memory.c:3438
Calls: mergeTables -> DGEList -> .isAllZero -> .Call

This looks like a size-related issue, and I was just wondering whether long vectors will be supported in the near future. I usually have no problems with this step; this is simply the largest dataset I've processed so far.

Thanks!

-J
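For context: R treats a vector as "long" when it has more than 2^31 - 1 elements (`.Machine$integer.max`), and a matrix is stored as a single vector of nrow * ncol cells, so a count matrix can cross that limit even when each dimension is modest. Compiled code that has not been updated for long vectors stops with the "long vectors not supported yet" error. The sketch below checks a pair of dimensions against the limit, using the dimensions reported later in the thread; `is_long_matrix()` is a hypothetical helper for illustration only.

```r
# Hypothetical helper (illustration only): would an nrow x ncol matrix exceed
# R's standard vector length limit, .Machine$integer.max (2^31 - 1) cells?
is_long_matrix <- function(nrow, ncol) {
  # Multiply as doubles so the product itself cannot overflow the integer range
  as.numeric(nrow) * as.numeric(ncol) > .Machine$integer.max
}

# Dimensions reported in this thread
cells <- as.numeric(9582472) * 381
format(cells, big.mark = ",", scientific = FALSE)  # "3,650,921,832"
is_long_matrix(9582472, 381)  # TRUE
is_long_matrix(9582472, 10)   # FALSE
```

For comparison, the 9582472 x 10 matrix used in the reproduction attempt later in the thread stays below the limit.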


My edgeR version is edgeR_3.18.1, with R 3.4.0.

Reply written 6 weeks ago by jtremblay514

Okay, it works fine with the devel package. Many thanks!

Reply written 6 weeks ago by jtremblay514 (edited by Gordon Smyth)

Well, it seems I've hit another error downstream. Here is part of my code:

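  library(edgeR)
  # df (gene-by-sample counts) and outfileCpm are defined earlier in the script
  y <- DGEList(df, remove.zeros=TRUE)    # drop genes with all-zero counts
  y <- calcNormFactors(y, method="TMM")  # Although not sure if necessary...
  cpms <- cpm(y)                         # counts per million, using TMM-scaled library sizes
  cpms <- round(cpms, digits=3)
  write.table(cpms, outfileCpm, quote=FALSE, sep="\t", row.names=TRUE, col.names=NA)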

And the output:

Removing 1241 rows with all zero counts
Error in write.table(cpms, outfileCpm, quote = FALSE, sep = "\t", row.names = TRUE,  :   
  corrupt matrix -- dims not not match length
Calls: mergeTables -> write.table
Execution halted

I have 9,582,472 genes in the df object, if that's relevant.

Cheers,


Reply written 6 weeks ago by jtremblay514
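The pipeline above does three things to the counts: it drops genes whose counts are all zero (the "Removing 1241 rows" message), scales each count by its sample's effective library size, and rounds. Below is a rough base-R sketch of that arithmetic, assuming the usual edgeR definition of effective library size as lib.size * norm.factors and CPM = count / effective library size * 1e6. With the norm factors left at 1 this is plain CPM; calcNormFactors(y, method="TMM") would supply factors other than 1. The helper `cpm_sketch()` and the toy matrix are hypothetical, and this is not edgeR's own implementation.

```r
# Hypothetical sketch of the arithmetic behind the posted pipeline,
# NOT edgeR's own code: drop all-zero rows, then counts per million
# using effective library sizes (lib.size * norm.factors).
cpm_sketch <- function(counts, norm.factors = rep(1, ncol(counts)), digits = 3) {
  keep <- rowSums(counts) > 0                    # like DGEList(..., remove.zeros=TRUE)
  counts <- counts[keep, , drop = FALSE]
  eff.lib <- colSums(counts) * norm.factors      # effective library size per sample
  cpms <- sweep(counts, 2, eff.lib, "/") * 1e6   # divide each column by its size
  round(cpms, digits = digits)
}

# Toy 4-gene x 2-sample count matrix (made-up numbers; gene2 is all zero)
counts <- matrix(c(10, 0, 30, 60,
                   5,  0, 15, 80),
                 ncol = 2,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
cpm_sketch(counts)
```

Each column of the result sums to one million when the norm factors are 1; TMM factors shift each sample's scaling away from that.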

I can't reproduce your error. I tried the same code in edgeR 3.19.7 with a 9582472 x 10 count matrix, and everything ran fine.

Reply written 6 weeks ago by Gordon Smyth

I have a 9582472 x 381 matrix. I can share it if needed. Thanks!

Reply written 6 weeks ago by jtremblay514

This seems more like a problem with write.table than with any edgeR function. And little wonder: a 9582472 x 381 double-precision matrix occupies 29 GB in memory! Are you sure you want to write this to file?

Reply written 6 weeks ago by Aaron Lun
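The 29 GB figure follows directly from the dimensions, since each double-precision value takes 8 bytes. A minimal sketch of the calculation is below; `dense_matrix_bytes()` is a hypothetical helper. It ignores the small object header and any dimnames, so `object.size()` on the real matrix would report somewhat more (9.5 million gene IDs as row names add their own storage).

```r
# Back-of-the-envelope size of a dense double matrix (8 bytes per cell),
# ignoring the small object header and any dimnames.
dense_matrix_bytes <- function(nrow, ncol, bytes_per_cell = 8) {
  as.numeric(nrow) * as.numeric(ncol) * bytes_per_cell
}

bytes <- dense_matrix_bytes(9582472, 381)
bytes / 1e9    # about 29.2 (decimal gigabytes)
bytes / 2^30   # about 27.2 (binary gibibytes)

# Sanity check on a small matrix: object.size() is just above the estimate
m <- matrix(0, nrow = 1000, ncol = 100)
as.numeric(object.size(m)) - dense_matrix_bytes(1000, 100)  # small header overhead
```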

It is huge, I agree, and I have to admit that I may need to revisit my SOPs as projects (and datasets) keep getting bigger. For downstream analyses, only subsets of this final normalized matrix (i.e. certain genes with selected functions) will be pulled at a time, so size shouldn't be a problem at that stage. I just need to normalize everything together before going forward. I'll try writing the table with fwrite (data.table); any other suggestions are welcome.

Reply written 6 weeks ago by jtremblay514 (modified 6 weeks ago)

It worked with fwrite (data.table).

Reply written 6 weeks ago by jtremblay514
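For reference, here is a minimal sketch of the fwrite route, assuming the data.table package is installed. Because data.tables do not keep row names, the gene IDs are carried over as an explicit first column; the column name "gene", the small stand-in matrix, and the temporary output path are all hypothetical.

```r
library(data.table)

# Stand-in for the real `cpms` matrix: genes as rows, samples as columns
cpms <- matrix(c(1.5, 20, 300.125, 0, 42, 7.25), nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Convert to a data.table, carrying the row names over as a "gene" column
# (hypothetical name), since data.tables do not keep row names.
dt <- as.data.table(cpms)
dt[, gene := rownames(cpms)]
setcolorder(dt, c("gene", colnames(cpms)))

# Tab-separated, unquoted output, similar to the write.table() call earlier
out <- tempfile(fileext = ".tsv")   # hypothetical output path
fwrite(dt, out, sep = "\t", quote = FALSE)

readLines(out, n = 2)
```

Unlike write.table(..., col.names=NA), which leaves the first header field empty, this writes "gene" as the first header field.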
Answer posted 6 weeks ago by Aaron Lun (Cambridge, United Kingdom):

Most of these issues have been fixed in the BioC-devel version of edgeR. You can either switch to using BioC-devel now (via useDevel() in BiocInstaller) or wait until the next release.
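As a quick check before switching, the sketch below compares version strings with base R's `utils::compareVersion()`, which returns -1, 0 or 1. The helper name `older_than()` is hypothetical, and 3.19.7 is simply the devel version Gordon Smyth reports testing with, not necessarily the first version containing the fix. The commented lines show the 2017-era BiocInstaller route (`useDevel()`, then `biocLite()`); current Bioconductor uses the BiocManager package instead.

```r
# Hypothetical helper: is an installed version string older than a reference?
# utils::compareVersion() returns -1, 0 or 1.
older_than <- function(installed, reference) {
  utils::compareVersion(installed, reference) < 0
}

older_than("3.18.1", "3.19.7")  # TRUE: release version from the question vs. devel version tested

# In a live session, check the installed edgeR instead:
# older_than(as.character(packageVersion("edgeR")), "3.19.7")

# Switching to BioC-devel (not run here).
# 2017-era, via BiocInstaller:
#   source("https://bioconductor.org/biocLite.R")
#   BiocInstaller::useDevel()
#   BiocInstaller::biocLite("edgeR")
# Current Bioconductor, via BiocManager:
#   BiocManager::install(version = "devel")
#   BiocManager::install("edgeR")
```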
