DESeq2 unable to handle large count values
robennet • Last seen 5 months ago • United States

My counts matrix has 100+ transcripts with at least one value over the maximum integer R can represent (.Machine$integer.max, i.e. 2,147,483,647). Some of these transcripts don't have any counts small enough for R to store as integers. We've thought about shifting the values down, but because so many of these transcripts also have much smaller counts for certain participants, that isn't really feasible. We've considered truncating all of the massive values to the maximum integer value, but that eliminates any potential nuance in those transcripts. We even talked about filtering out transcripts with counts that high, but, again, we're concerned about losing valuable information.

I've searched for a workaround that would let me pass integer64 values instead, but haven't found anything. Is there something I'm missing, or can DESeq2 just not handle counts this large?
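(As an illustrative aside, not from the original post: this is a minimal sketch of the underlying limitation. Values above .Machine$integer.max become NA when coerced to integer, so they cannot populate the integer count matrix DESeq2 expects. The object and sample names below are made up.)

    # Counts above .Machine$integer.max (2147483647) cannot be stored as
    # R integers and become NA on coercion to integer.
    cnt <- matrix(c(10, 2500000000, 42, 7), nrow = 2,
                  dimnames = list(c("tx1", "tx2"), c("sample1", "sample2")))
    storage.mode(cnt)                  # "double" -- too large for integer storage
    as.integer(cnt["tx2", "sample1"])  # NA, with a coercion warning
    suppressWarnings(storage.mode(cnt) <- "integer")
    cnt                                # the 2.5e9 count is now NA
    # DESeqDataSetFromMatrix() would then refuse the matrix because of the NA values.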

Michael Love (@mikelove) • Last seen 15 hours ago • United States

There is an issue on GitHub to deal with this, but for the moment we can't accommodate it. Out of curiosity, how did you generate a count of 1 billion for one gene in one sample?


Michael Love, I didn't do any of the data collection; I'm just developing the pipeline for the expression analysis. I believe my PI is looking into that, just to make sure everything was done correctly, but I'm not sure when, or if, I'll find out.

Do you have any suggestions on alternative ways to handle large values? The only thing I can think of is some type of rescaling into the 1 to 2,147,483,647 range, but I know DESeq2 specifically requires raw counts, because it handles all of the normalization internally.


Maybe use a log-linear approach like limma.

I do not know how one gene could have a count of a billion. Rather than trying to push this data through, I would take a step back and check whether anything went wrong in the processing.
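(For reference, a minimal limma-voom sketch, not from the thread, of the kind of log-linear analysis suggested above. It does not require integer storage; `counts` and `condition` are hypothetical names for a numeric count matrix and a two-level grouping factor.)

    library(edgeR)   # DGEList, calcNormFactors
    library(limma)   # voom, lmFit, eBayes, topTable

    dge <- DGEList(counts = counts)      # counts can remain double precision
    dge <- calcNormFactors(dge)          # TMM normalization factors
    design <- model.matrix(~ condition)  # simple two-group design
    v <- voom(dge, design)               # log2-CPM values with precision weights
    fit <- eBayes(lmFit(v, design))
    topTable(fit, coef = 2)              # top-ranked genes for the condition effect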


Oh, it's around 140 genes out of the roughly 16,000 that have counts over 2.1 billion. The number of massive values per gene ranges from 1 to 285, which is how many individuals we have sequencing data for. Everything else seems fine; it's just this subgroup of genes that is massively expressed. I'll bring up limma in my next meeting and see how it goes. Thanks for the help!


I recommend solving the underlying problem. RNA-seq does not produce values like these, so something must be wrong in the preprocessing pipeline. Maybe one sample in thousands would show odd artifacts, but across your entire cohort? That is indicative of a general processing problem. How did you do the gene quantification?


Hi, I am facing a similar problem with my DESeq2 analysis. Large count values were introduced into the matrix when correcting for batch effects using ComBat-seq. I cannot create the DESeqDataSet object and am getting an error due to NAs introduced by the large counts in the count matrix. How can I rectify this situation?


I don't actually recommend modifying the count values for batch correction, but instead including RUV or SVA factors in the design (see the workflow).

The fact that count values over a billion were created isn't reassuring.
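(A minimal sketch of the design-based approach suggested above, using svaseq as in the RNA-seq workflow; it is not from the thread. `counts`, `coldata`, and `condition` are hypothetical names, and using two surrogate variables is an arbitrary choice for illustration.)

    library(DESeq2)
    library(sva)

    # Build the DESeqDataSet from unmodified raw counts (no ComBat-seq adjustment).
    dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData   = coldata,
                                  design    = ~ condition)
    dds <- estimateSizeFactors(dds)
    dat <- counts(dds, normalized = TRUE)
    dat <- dat[rowMeans(dat) > 1, ]                  # drop very low-count genes

    mod  <- model.matrix(~ condition, colData(dds))  # full model
    mod0 <- model.matrix(~ 1, colData(dds))          # null model
    svseq <- svaseq(dat, mod, mod0, n.sv = 2)        # estimate 2 surrogate variables

    # Put the surrogate variables in the design instead of altering the counts.
    dds$SV1 <- svseq$sv[, 1]
    dds$SV2 <- svseq$sv[, 2]
    design(dds) <- ~ SV1 + SV2 + condition
    dds <- DESeq(dds)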
