edgeR long vectors support when normalizing abundance matrix
@jtremblay514-8928
Last seen 6 months ago
Canada

Dear BioConductor community,

I've recently been trying to normalize large metagenome abundance matrices with the cpm(y) function of edgeR. I get the following error:

Error in .isAllZero(counts) :
  long vectors not supported yet: memory.c:3438
Calls: mergeTables -> DGEList -> .isAllZero -> .Call

This is obviously a memory issue, and I was just wondering whether long vectors will be supported in the near future. I usually have no problems with this step; this is the largest dataset I've processed so far.

Thanks!

-J

 


My edgeR version is edgeR_3.18.1, with R 3.4.0.


Okay it works fine with the devel package. Many thanks!


Well it seems I've got another error downstream - here is part of my code:

  y <- DGEList(df, remove.zeros=TRUE)
  y <- calcNormFactors(y, method="TMM") # Although not sure if necessary...
  cpms = cpm(y)
  cpms = round(cpms, digits=3)
  write.table(cpms, outfileCpm, quote=FALSE, sep="\t", row.names=TRUE, col.names=NA)

and in output:

Removing 1241 rows with all zero counts
Error in write.table(cpms, outfileCpm, quote = FALSE, sep = "\t", row.names = TRUE,  :   
  corrupt matrix -- dims not not match length
Calls: mergeTables -> write.table
Execution halted

I have 9,582,472 genes in the df object, if that's relevant.

Cheers,
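
For reference, the normalization in the snippet above comes down to scaling each column by its effective library size (library size times normalization factor). Here is a minimal base-R sketch of that arithmetic on a toy matrix, assuming all normalization factors are 1 (that is, skipping the TMM step). The names `toy_counts` and `manual_cpm` are made up for illustration; in practice edgeR's cpm() is the function to use.

```r
# Toy count matrix: 4 genes x 3 samples (values are made up for illustration).
toy_counts <- matrix(
  c(10, 0, 5, 85,
    20, 0, 30, 50,
    0, 0, 1, 999),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Drop rows with all-zero counts, as remove.zeros=TRUE does in DGEList().
toy_counts <- toy_counts[rowSums(toy_counts) > 0, , drop = FALSE]

# Library size per sample; normalization factors are assumed to be 1 here.
lib_size <- colSums(toy_counts)
norm_factors <- rep(1, ncol(toy_counts))

# Counts per million: divide each column by its effective library size.
manual_cpm <- t(t(toy_counts) / (lib_size * norm_factors)) * 1e6
manual_cpm <- round(manual_cpm, digits = 3)
```

With normalization factors of 1, every column of `manual_cpm` sums to one million; TMM factors would shift those totals slightly.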

 


I can't reproduce your error. I tried the same code in edgeR 3.19.7 with a 9582472 x 10 count matrix, and all ran fine.


I have a 9582472 x 381 matrix. I can share if needed. Thx!
 


This seems more like a problem with write.table than with any edgeR function. And little wonder: a 9582472 x 381 double-precision matrix occupies 29 GB in memory! Are you sure you want to write this to file?
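
The size estimate can be checked with a few lines of arithmetic. The sketch below uses the dimensions quoted in the thread and also compares the element count with R's limit for ordinary (non-long) vectors. That the matrix is a long vector is a plausible reason write.table fails on it, but this is not confirmed here. Variable names are illustrative only.

```r
n_genes   <- 9582472   # rows, from the thread
n_samples <- 381       # columns, from the thread

# Numeric literals are doubles in R, so this product does not overflow.
n_elements <- n_genes * n_samples

# A double takes 8 bytes; report the size in decimal gigabytes.
size_gb <- n_elements * 8 / 1e9

# Anything longer than .Machine$integer.max (2^31 - 1) is a "long vector".
is_long_vector <- n_elements > .Machine$integer.max

cat(sprintf("%.0f elements, %.1f GB, long vector: %s\n",
            n_elements, size_gb, is_long_vector))
```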


It is huge, I concur, and I have to admit that I might have to revisit my SOPs as projects (and datasets) keep getting bigger. For downstream analyses, only subsets of this final normalized matrix will be pulled at a time (i.e. certain genes with selected functions), so size shouldn't be a problem at that stage. I just need to normalize everything together before going forward. I'll try to write the table with fwrite (data.table); any other suggestions are welcome.
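
One base-R alternative, offered only as a sketch and not tested at the scale discussed here, is to write the matrix in blocks of rows so that no single write.table call sees the whole object. The helper `write_matrix_in_chunks` and its `chunk_rows` argument are hypothetical names, not part of edgeR or base R. The sketch assumes the matrix has row and column names and at least one row.

```r
# Hypothetical helper: write a matrix to a tab-delimited file in row blocks.
write_matrix_in_chunks <- function(mat, file, chunk_rows = 100000L) {
  # Header line: an empty first field for the row-name column, then the
  # column names (the layout write.table produces with col.names=NA).
  writeLines(paste(c("", colnames(mat)), collapse = "\t"), file)

  starts <- seq(1, nrow(mat), by = chunk_rows)
  for (start in starts) {
    end <- min(start + chunk_rows - 1, nrow(mat))
    block <- mat[start:end, , drop = FALSE]
    # Append this block without repeating the header.
    write.table(block, file, append = TRUE, quote = FALSE, sep = "\t",
                row.names = TRUE, col.names = FALSE)
  }
  invisible(file)
}

# Toy usage on a small matrix.
toy <- matrix(round(runif(50), 3), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("s", 1:5)))
chunk_file <- tempfile(fileext = ".tsv")
write_matrix_in_chunks(toy, chunk_file, chunk_rows = 3)
```

The resulting file can be read back with read.delim(chunk_file, row.names = 1), and individual blocks stay well below the long-vector limit as long as chunk_rows times the number of columns does.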


It worked with fwrite (data.table).
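
The exact call was not posted, so the following is only a guess at what such a step could look like, shown on a toy matrix. It assumes the data.table package is installed; `toy_cpms` and `fwrite_file` are placeholder names. Whether the conversion step copes with the full 9582472 x 381 matrix on a given machine is not verified here.

```r
library(data.table)

# Toy stand-in for the CPM matrix from the thread.
toy_cpms <- matrix(round(runif(20) * 1e4, 3), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Move the row names into a regular first column (named "rn" by default)
# so that they end up in the output file.
cpm_dt <- as.data.table(toy_cpms, keep.rownames = TRUE)

fwrite_file <- tempfile(fileext = ".tsv")
fwrite(cpm_dt, fwrite_file, sep = "\t", quote = FALSE)
```

Unlike the write.table call earlier in the thread, this writes the gene identifiers under an explicit "rn" header instead of a blank first field.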

Aaron Lun ★ 28k
@alun
Last seen 15 minutes ago
The city by the bay

Most of these issues have been fixed in the BioC-devel version of edgeR. You can either switch to using BioC-devel now (via useDevel() in BiocInstaller) or wait until the next release.
