edgeR on DEG analysis: Size cannot be NA nor exceed 65536
2
0
Entering edit mode
@cvergarapulgar-12322
Last seen 7.8 years ago

Hello,

I'm using edgeR (in the Trinity-RNA Seq Pipeline) to do a differential expression analysis and when I try to make a heatmap I got the following error:

> gene_cor = NULL
> gene_dist = dist(data, method='euclidean')
> if (nrow(data) <= 1) { message('Too few genes to generate heatmap'); quit(status=0); }
> hc_genes = hclust(gene_dist, method='complete')
Error in hclust(gene_dist, method = "complete") :
  size cannot be NA nor exceed 65536
Execution halted

In this case I found over 80k DEGs and I can't get make it work. However if I re-run all but with more strict parameter I get less DEGs and the error don't show up.

Anyone knows how to fix this error?

Thanks in advance!

edger differential gene expression • 3.4k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 3 hours ago
United States

This is only tangentially related to Bioconductor, and certainly has nothing to do with Bioc packages, so you should probably be asking about this on R-help (r-help@r-project.org).

That said, if you are aligning against a de novo transcriptome, you should note that Trinity turns everything into a transcript, so there are likely to be lots of 'transcripts' that are probably not representative of anything real, that you need to filter out prior to doing any comparisons. There are lots of ways to do this; one way that has been recommended on this site before (by Ryan Thompson, IIRC) that I sort of like is to plot the distribution of the rowSums of the logCPM values. This is usually a bimodal distribution, and it usually seems pretty safe to select some cutoff that excludes the hump on the left and keeps the hump on the right.

That should get you down to a tractable number of genes to compare.

ADD COMMENT
0
Entering edit mode

Thanks for the quick response! I will try that!

ADD REPLY
1
Entering edit mode

James has suggested that you filter out low expressed transcripts. That is very sensible but I doubt it will be sufficient in itself and you should have already done this sort of filtering anyway before the DE analysis. I would suggest you use edgeR's glmTreat() function to choose a smaller number of transcripts with larger fold changes changes. You can't see tens of thousands of genes on a heatmap anyway. It's far more sensible to plot a carefully chosen subset of transcripts. Otherwise what are you doing the heatmap for?

ADD REPLY
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 17 hours ago
The city by the bay

For starters, this isn't an edgeR question. It's hclust (from the stats package) that's throwing the error.

As for the error, it pretty much explains itself. You have too many genes and hclust doesn't want to deal with the resulting distance matrix. Maybe Rclusterpp might be able to handle it without using up too much memory.

I should also point out that 80K is a lot of DEGs. One can only pity the poor soul who has to make sense of it. There's a reason why most people just do DE analyses on gene counts if a reference genome is available.

ADD COMMENT
1
Entering edit mode

What do people do if there isn't one?

ADD REPLY
0
Entering edit mode

My point was not to do de novo transcriptome assemblies for the sake of it. Don't know the OP's situation, though.

ADD REPLY

Login before adding your answer.

Traffic: 650 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6