Question

Using DESeq2 on CEL-Seq data

0

solgakar@bi.technion.ac.il ▴ 90

@solgakarbitechnionacil-6453

Last seen 7.1 years ago

European Union

Hello all,

I am working on CEL-Seq data, which is a protocol that allows working with small starting amounts of RNA.

Most of the CEL-Seq pipeline is similar to the RNA-Seq pipeline. The big difference comes after the counting step, which uses HTSeq-count or a modified version of it that relies on Unique Molecular Identifiers (UMIs) to collapse reads originating from a single transcript: the number of reads counted to features is much lower.
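As a rough illustration of the collapsing step described above, the sketch below counts distinct UMIs per gene, so that reads sharing a gene and a UMI are counted once. The function name `collapse_umis` and the toy reads are hypothetical; real pipelines typically also account for UMI sequencing errors and mapping position, which this simplification ignores.

```python
from collections import defaultdict


def collapse_umis(reads):
    """Count distinct UMIs per gene.

    `reads` is an iterable of (gene_id, umi) pairs, one per aligned read.
    Reads sharing a gene and a UMI are treated as copies of one molecule.
    """
    umis_per_gene = defaultdict(set)
    for gene_id, umi in reads:
        umis_per_gene[gene_id].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Hypothetical toy input: five reads that come from three distinct molecules.
reads = [
    ("geneA", "ACGT"),
    ("geneA", "ACGT"),  # same gene and UMI as the read above: a duplicate
    ("geneA", "TTGA"),
    ("geneB", "ACGT"),
    ("geneB", "ACGT"),  # duplicate
]
umi_counts = collapse_umis(reads)
print(umi_counts)
```

This is why the per-sample totals drop so sharply after collapsing: the count reflects molecules rather than reads.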

After collapsing, we might end up with around 100,000 reads per sample or even fewer, which is of course much lower than the number of reads usually counted in regular RNA-Seq data.

I wanted to ask whether this kind of data, with such a low number of reads, can be used for differential gene expression testing with DESeq2. If so, are there any modifications to the normalization method, or to any other steps in the DESeq2 workflow, that I should consider?

Thank you very much,

Olga Karinky.

DESeq2 • 2.1k views

updated 9.5 years ago by Michael Love 41k • written 9.5 years ago by solgakar@bi.technion.ac.il ▴ 90

score 0 · Answer 1 · 2014-10-22

0

Michael Love 41k

@mikelove

Last seen 3 hours ago

United States

Yes, the inference automatically adjusts when counts are low. I've looked at UMI counts, which tend to have lower dispersion than RNA-Seq counts, and I think the Negative Binomial GLM can be used on these.
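One way to picture why low counts are handled by the model itself: under the Negative Binomial parameterization with mean mu and dispersion alpha, the variance is mu + alpha * mu^2. The sketch below is only an illustration of that formula, with an assumed dispersion value and hypothetical helper names; it shows that at low means the Poisson (sampling) term makes up most of the variance.

```python
def nb_variance(mu, alpha):
    """Variance of a Negative Binomial count with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


def poisson_share(mu, alpha):
    """Fraction of the NB variance contributed by the Poisson (sampling) term."""
    return mu / nb_variance(mu, alpha)


alpha = 0.1  # assumed dispersion, chosen for illustration only
for mu in (5, 50, 5000):
    share = poisson_share(mu, alpha)
    print(f"mean={mu:>5}  variance={nb_variance(mu, alpha):>12.1f}  poisson share={share:.3f}")
```

With a lower dispersion, as suggested for UMI counts, the sampling term dominates over an even wider range of means.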

9.5 years ago Michael Love 41k

0

Dear Michael,

I am trying to use DESeq2 on UMI count data. In my case, the counts are extremely low. Due to the design of the experiment, I end up with around 30,000 total counts per sample (coming from around 2 million reads). If I use the UMI counts instead of the reads as input, this causes problems because the counts per DNA fragment of interest are so low (range 1-100).

My idea was to divide each UMI count by the total number of counts and multiply this by the total number of reads for the sample before inputting to DESeq2. Do you think this is a valid approach?

8.4 years ago a.koe • 0

0

The counts being low is not a problem. If the differences across conditions rise above the expected sampling variance and the estimated extra variance (overdispersion), then you will have the sensitivity to detect changes.
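To make the point about sampling variance concrete, the sketch below looks at what the rescaling proposed in the question does to a count. It assumes, purely for illustration, a Poisson-distributed UMI count with a hypothetical mean of 10 and uses the approximate per-sample totals quoted in the question. Multiplying a count by a constant multiplies its mean by that constant but its variance by the constant squared, so the rescaled value no longer has the mean-variance relationship of a count.

```python
def scaled_moments(mu, scale):
    """Mean and variance of scale * X when X ~ Poisson(mu)."""
    return scale * mu, scale ** 2 * mu


total_umis = 30_000      # approximate per-sample UMI total quoted in the question
total_reads = 2_000_000  # approximate per-sample read total quoted in the question
scale = total_reads / total_umis

# mu=10 is a hypothetical gene-level mean, used only for illustration.
mean, var = scaled_moments(mu=10, scale=scale)
print(f"scale factor: {scale:.1f}")
print(f"scaled mean: {mean:.0f}, scaled variance: {var:.0f}")
print(f"variance / mean: {var / mean:.1f} (an unscaled Poisson count gives 1.0)")
```

Under these assumptions the rescaled value looks like a large count but carries the relative noise of a small one, which is one reason to pass the raw UMI counts to the model.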

8.4 years ago Michael Love 41k

0

Is it possible to combine raw counts from CEL-Seq and raw counts from a "regular RNA-Seq" experiment in DESeq2? I guess the CEL-Seq counts will be scaled up a lot?

8.1 years ago Jon Bråte ▴ 250

1

When looking for differences in a ratio across conditions, scaling doesn't matter; see my other comment in this thread.
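One possible reading of "scaling doesn't matter" is that a sample-wide scale is absorbed by the per-sample size factor. The sketch below is a minimal stdlib version of the median-of-ratios idea behind DESeq2's default normalization, not the package's own code. The function name `size_factors` and the toy samples are hypothetical, and one sample is constructed to be exactly 100 times the other.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps sample name -> list of per-gene counts (same gene order).
    Genes with a zero in any sample are skipped, since their geometric
    mean across samples is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical toy data: the second sample is the first scaled by 100.
counts = {
    "celseq_1": [2, 10, 4, 20],
    "rnaseq_1": [200, 1000, 400, 2000],
}
sf = size_factors(counts)
normalized = {s: [c / sf[s] for c in counts[s]] for s in counts}
print(sf)
print(normalized)
```

In this toy case the size factors differ by the same factor of 100, so the normalized values coincide. It only illustrates the arithmetic of the scaling and says nothing about other differences between protocols.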

8.1 years ago Michael Love 41k