Question

edgeR on DEG analysis: Size cannot be NA nor exceed 65536

0

Entering edit mode

cvergarapulgar • 0

@cvergarapulgar-12322

Last seen 7.2 years ago

Hello,

I'm using edgeR (in the Trinity-RNA Seq Pipeline) to do a differential expression analysis and when I try to make a heatmap I got the following error:

> gene_cor = NULL
> gene_dist = dist(data, method='euclidean')
> if (nrow(data) <= 1) { message('Too few genes to generate heatmap'); quit(status=0); }
> hc_genes = hclust(gene_dist, method='complete')
Error in hclust(gene_dist, method = "complete") :
size cannot be NA nor exceed 65536
Execution halted

In this case I found over 80k DEGs and I can't get make it work. However if I re-run all but with more strict parameter I get less DEGs and the error don't show up.

Anyone knows how to fix this error?

Thanks in advance!

edger differential gene expression • 3.1k views

ADD COMMENT • link updated 7.2 years ago by James W. MacDonald 65k • written 7.2 years ago by cvergarapulgar • 0

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 6 hours ago

The city by the bay

For starters, this isn't an edgeR question. It's hclust (from the stats package) that's throwing the error.

As for the error, it pretty much explains itself. You have too many genes and hclust doesn't want to deal with the resulting distance matrix. Maybe Rclusterpp might be able to handle it without using up too much memory.

I should also point out that 80K is a lot of DEGs. One can only pity the poor soul who has to make sense of it. There's a reason why most people just do DE analyses on gene counts if a reference genome is available.

ADD COMMENT • link 7.2 years ago Aaron Lun ★ 28k

1

Entering edit mode

What do people do if there isn't one?

ADD REPLY • link 7.2 years ago James W. MacDonald 65k

0

Entering edit mode

My point was not to do de novo transcriptome assemblies for the sake of it. Don't know the OP's situation, though.

ADD REPLY • link 7.2 years ago Aaron Lun ★ 28k

score 2 · Accepted Answer · 2017-02-09

2

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 21 hours ago

United States

This is only tangentially related to Bioconductor, and certainly has nothing to do with Bioc packages, so you should probably be asking about this on R-help (r-help@r-project.org).

That said, if you are aligning against a de novo transcriptome, you should note that Trinity turns everything into a transcript, so there are likely to be lots of 'transcripts' that are probably not representative of anything real, that you need to filter out prior to doing any comparisons. There are lots of ways to do this; one way that has been recommended on this site before (by Ryan Thompson, IIRC) that I sort of like is to plot the distribution of the rowSums of the logCPM values. This is usually a bimodal distribution, and it usually seems pretty safe to select some cutoff that excludes the hump on the left and keeps the hump on the right.

That should get you down to a tractable number of genes to compare.

ADD COMMENT • link 7.2 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks for the quick response! I will try that!

ADD REPLY • link 7.2 years ago cvergarapulgar • 0

1

Entering edit mode

James has suggested that you filter out low expressed transcripts. That is very sensible but I doubt it will be sufficient in itself and you should have already done this sort of filtering anyway before the DE analysis. I would suggest you use edgeR's glmTreat() function to choose a smaller number of transcripts with larger fold changes changes. You can't see tens of thousands of genes on a heatmap anyway. It's far more sensible to plot a carefully chosen subset of transcripts. Otherwise what are you doing the heatmap for?

ADD REPLY • link 7.2 years ago Gordon Smyth 50k