Question: edgeR long vectors support when normalizing abundance matrix
14 months ago by
Canada
jtremblay5140 wrote:

Dear BioConductor community,

I've recently been trying to normalize large metagenome abundance matrices with the function cpm(y) of edgeR. I get the following error:

Error in .isAllZero(counts) :
  long vectors not supported yet: memory.c:3438
Calls: mergeTables -> DGEList -> .isAllZero -> .Call

This is obviously a memory issue, and I was just wondering if long vectors will be supported in the near future. I usually have no problems with this step; this is the largest dataset I've processed so far.

Thanks!

-J

 

ADD COMMENTlink modified 14 months ago • written 14 months ago by jtremblay5140

my edgeR version is: edgeR_3.18.1 with R 3.4.0

ADD REPLYlink written 14 months ago by jtremblay5140

Okay it works fine with the devel package. Many thanks!

ADD REPLYlink modified 14 months ago by Gordon Smyth35k • written 14 months ago by jtremblay5140

Well it seems I've got another error downstream - here is part of my code:

  y <- DGEList(df, remove.zeros=TRUE)
  y <- calcNormFactors(y, method="TMM") # Although not sure if necessary...
  cpms <- cpm(y)
  cpms <- round(cpms, digits=3)
  write.table(cpms, outfileCpm, quote=FALSE, sep="\t", row.names=TRUE, col.names=NA)

and in output:

Removing 1241 rows with all zero counts
Error in write.table(cpms, outfileCpm, quote = FALSE, sep = "\t", row.names = TRUE,  :   
  corrupt matrix -- dims not not match length
Calls: mergeTables -> write.table
Execution halted

I have 9,582,472 genes in the df object, if that's relevant.

Cheers,

 

ADD REPLYlink written 14 months ago by jtremblay5140

I can't reproduce your error. I tried the same code in edgeR 3.19.7 with a 9582472 x 10 count matrix, and all ran fine.

ADD REPLYlink written 14 months ago by Gordon Smyth35k

I have a 9582472 x 381 matrix. I can share if needed. Thx!
 

ADD REPLYlink written 14 months ago by jtremblay5140

This seems more like a problem with write.table than with any edgeR function. And little wonder - a 9582472 x 381 double-precision matrix occupies 29 GB in memory! Are you sure you want to write this to file?
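
For reference, the size arithmetic can be checked in a few lines of base R. This is only a sketch using the dimensions quoted in this thread; the variable names are made up for illustration. It also shows that the matrix has more elements than R's 32-bit integer limit, which is what makes it a "long vector", whereas a 9582472 x 10 matrix stays below that limit (possibly why the smaller test ran fine).

```r
# Dimensions reported in this thread (stored as doubles, since the
# product would overflow R's 32-bit integer type).
n_genes   <- 9582472
n_samples <- 381

# Total number of elements in the matrix.
n_elements <- n_genes * n_samples

# A vector or matrix with more than 2^31 - 1 elements is a "long vector".
is_long_vector <- n_elements > .Machine$integer.max

# The 10-column test matrix mentioned earlier stays under the limit.
is_long_vector_10 <- (n_genes * 10) > .Machine$integer.max

# Each double takes 8 bytes; convert to decimal gigabytes.
size_gb <- n_elements * 8 / 1e9

cat("elements:", format(n_elements, big.mark = ","), "\n")
cat("long vector (381 columns):", is_long_vector, "\n")
cat("long vector (10 columns):", is_long_vector_10, "\n")
cat("approx. size in GB:", round(size_gb, 1), "\n")
```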

ADD REPLYlink written 14 months ago by Aaron Lun21k

It is huge, I concur, and I have to admit that I might have to revisit my SOPs as projects (and datasets) keep getting bigger. For downstream analyses, only subsets of this final normalized matrix will be pulled at a time (i.e. certain genes with selected functions), so size shouldn't be a problem then. I just need to normalize everything together before going forward. I'll try to write the table with fwrite (data.table); any other suggestions are welcome.

ADD REPLYlink modified 14 months ago • written 14 months ago by jtremblay5140

It worked with fwrite (data.table).
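
The exact call isn't shown in this thread, so the following is only a minimal sketch of one way to write a CPM matrix with fwrite. It assumes the data.table package is installed; the toy matrix, its gene and sample names, and the temporary output file are stand-ins for illustration. Converting the matrix to a data.table makes a copy, which is worth keeping in mind for a matrix of this size.

```r
library(data.table)

# Small stand-in for the real CPM matrix (names are hypothetical).
cpms <- matrix(c(1.5, 0, 2.25, 10, 3.125, 7), nrow = 3,
               dimnames = list(c("gene1", "gene2", "gene3"),
                               c("sampleA", "sampleB")))
outfileCpm <- tempfile(fileext = ".tsv")

# Convert to a data.table; keep.rownames = TRUE stores the row names
# in a first column called "rn". Note that this copies the data.
cpm_dt <- as.data.table(round(cpms, digits = 3), keep.rownames = TRUE)

# Write a tab-separated file without quoting.
fwrite(cpm_dt, outfileCpm, sep = "\t", quote = FALSE)

# Read the file back to confirm it round-trips.
check <- fread(outfileCpm)
print(check)
```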

ADD REPLYlink written 14 months ago by jtremblay5140
14 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

Most of these issues have been fixed in the BioC-devel version of edgeR. You can either switch to using BioC-devel now (via useDevel() in BiocInstaller) or wait until the next release.

ADD COMMENTlink written 14 months ago by Aaron Lun21k