Question: edgeR on DEG analysis: Size cannot be NA nor exceed 65536
7 months ago, cvergarapulgar wrote:

Hello,

I'm using edgeR (in the Trinity RNA-Seq pipeline) to do a differential expression analysis, and when I try to make a heatmap I get the following error:

> gene_cor = NULL
> gene_dist = dist(data, method='euclidean')
> if (nrow(data) <= 1) { message('Too few genes to generate heatmap'); quit(status=0); }
> hc_genes = hclust(gene_dist, method='complete')
Error in hclust(gene_dist, method = "complete") :
  size cannot be NA nor exceed 65536
Execution halted

In this case I found over 80k DEGs and I can't get it to work. However, if I re-run everything with stricter parameters, I get fewer DEGs and the error doesn't show up.

Does anyone know how to fix this error?

Thanks in advance!

7 months ago, James W. MacDonald (United States) wrote:

This is only tangentially related to Bioconductor, and certainly has nothing to do with Bioc packages, so you should probably be asking about this on R-help (r-help@r-project.org).

That said, if you are aligning against a de novo transcriptome, you should note that Trinity turns everything into a transcript, so there are likely to be lots of 'transcripts' that are probably not representative of anything real, that you need to filter out prior to doing any comparisons. There are lots of ways to do this; one way that has been recommended on this site before (by Ryan Thompson, IIRC) that I sort of like is to plot the distribution of the rowSums of the logCPM values. This is usually a bimodal distribution, and it usually seems pretty safe to select some cutoff that excludes the hump on the left and keeps the hump on the right.

That should get you down to a tractable number of genes to compare.
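A minimal sketch of that filtering step, using base R only. The counts are simulated, the logCPM is computed by hand with a small prior count (edgeR's `cpm(y, log = TRUE)` gives a similar quantity), and the `cutoff` value is an assumption picked by eye for this toy data. On real data the cutoff has to be read off your own density plot.

```r
set.seed(42)
n_samples <- 6

# Simulated counts (hypothetical data): 3000 near-silent transcripts
# and 1000 clearly expressed ones.
low  <- matrix(rnbinom(3000 * n_samples, mu = 1,   size = 2), ncol = n_samples)
high <- matrix(rnbinom(1000 * n_samples, mu = 200, size = 2), ncol = n_samples)
counts <- rbind(low, high)

# log2 counts-per-million, with a prior count of 0.5 to avoid log(0).
lib_size <- colSums(counts)
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

# Per-transcript sum of logCPM; its distribution should show two humps.
row_total <- rowSums(log_cpm)
if (interactive()) plot(density(row_total), main = "rowSums of logCPM")

# Cutoff between the two humps (assumed value, valid for this toy data only).
cutoff <- 35
keep <- row_total > cutoff
filtered <- counts[keep, , drop = FALSE]
```

Here `filtered` keeps roughly the 1000 expressed transcripts and drops the left-hand hump.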

Thanks for the quick response! I will try that!

Reply written 7 months ago by cvergarapulgar

James has suggested that you filter out lowly expressed transcripts. That is very sensible, but I doubt it will be sufficient in itself, and you should have already done this sort of filtering anyway before the DE analysis. I would suggest you use edgeR's glmTreat() function to choose a smaller number of transcripts with larger fold changes. You can't see tens of thousands of genes on a heatmap anyway. It's far more sensible to plot a carefully chosen subset of transcripts. Otherwise, what are you doing the heatmap for?
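A rough sketch of how that could look, assuming the edgeR package is installed. The data are simulated, and the two-group design, the log2 fold-change threshold of 1, and the choice of 50 transcripts are all illustrative assumptions rather than recommendations.

```r
library(edgeR)

set.seed(7)
n_genes <- 2000
group <- factor(rep(c("A", "B"), each = 3))

# Simulated counts (hypothetical data): the first 100 transcripts
# are 8-fold higher in group B, the rest are unchanged.
mu <- matrix(50, nrow = n_genes, ncol = 6)
mu[1:100, 4:6] <- 400
counts <- matrix(rnbinom(n_genes * 6, mu = mu, size = 10), nrow = n_genes,
                 dimnames = list(paste0("tx", seq_len(n_genes)),
                                 paste0("s", 1:6)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test for fold changes significantly above the threshold,
# instead of merely different from zero.
tr <- glmTreat(fit, coef = 2, lfc = 1)
top <- topTags(tr, n = 50)

# Cluster only the selected transcripts for the heatmap.
log_cpm <- cpm(y, log = TRUE)
sel <- log_cpm[rownames(top$table), , drop = FALSE]
hc_genes <- hclust(dist(sel, method = "euclidean"), method = "complete")
```

With only 50 rows, `hclust()` stays far below its size limit and the resulting heatmap remains readable.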

Reply written 7 months ago (modified 7 months ago) by Gordon Smyth

7 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

For starters, this isn't an edgeR question. It's hclust (from the stats package) that's throwing the error.

As for the error, it pretty much explains itself. You have too many genes and hclust doesn't want to deal with the resulting distance matrix. Maybe Rclusterpp might be able to handle it without using up too much memory.
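To put numbers on it, here is a small base-R sketch. It first works out how large the distance object for 80,000 rows would be, then caps a simulated matrix at a fixed number of rows before clustering. Keeping the most variable rows is just one possible way to subset, and `max_rows` is an arbitrary assumption; neither comes from the Trinity script.

```r
# dist() stores n * (n - 1) / 2 pairwise distances as doubles.
n <- 80000
n_pairs <- n * (n - 1) / 2
gb <- n_pairs * 8 / 1e9   # 8 bytes per double, about 25.6 GB

# Simulated stand-in for the expression matrix (hypothetical data).
set.seed(1)
data <- matrix(rnorm(2000 * 6), nrow = 2000,
               dimnames = list(paste0("gene", 1:2000), paste0("s", 1:6)))

# Arbitrary cap, well below the 65536 limit enforced by hclust().
max_rows <- 1000
if (nrow(data) > max_rows) {
  v <- apply(data, 1, var)                           # per-row variance
  keep <- order(v, decreasing = TRUE)[seq_len(max_rows)]
  data <- data[keep, , drop = FALSE]                 # most variable rows only
}

gene_dist <- dist(data, method = "euclidean")
hc_genes <- hclust(gene_dist, method = "complete")
```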

I should also point out that 80K is a lot of DEGs. One can only pity the poor soul who has to make sense of it. There's a reason why most people just do DE analyses on gene counts if a reference genome is available.

What do people do if there isn't one?

Reply written 7 months ago by James W. MacDonald

My point was that one shouldn't do de novo transcriptome assemblies just for the sake of it. I don't know the OP's situation, though.

Reply written 7 months ago by Aaron Lun