Question

Can DESeq2 be used to determine DE from targeted RNA-Seq?

Seanna.McTaggart • 0

@seannamctaggart-10864

Last seen 7.9 years ago

Hello;

I am running a 50-gene assay using targeted RNA-seq to verify results from microarray data. Thus, all of the genes are expected to be differentially expressed between samples from two conditions, and will have a lower range of expression than one would normally see in whole-transcriptome sequencing. We can of course include genes that are not differentially expressed as well, although I am not clear on how many (if any) would be necessary to normalize for library size. I am in the process of setting up the analysis workflow and was wondering whether DESeq2 would be appropriate.

Many thanks for any suggestions.

Seanna

deseq2 • 857 views

ADD COMMENT • link 7.9 years ago Seanna.McTaggart • 0

Thank you, Michael, for your quick reply and for your tips on how to specify the control genes to guide the library size normalization and fit the model. It will be interesting to test this method against the BaseSpace app.

Cheers!

Seanna

ADD REPLY • link 7.9 years ago Seanna.McTaggart • 0

score 1 · Answer 1 · 2016-06-09

Michael Love 41k

@mikelove

Last seen 9 hours ago

United States

You can use DESeq2, but you'll need to come up with a way to normalize for library size, either using a set of genes that are not expected to be differentially expressed or some other external controls. If you provide DESeq2 with only the set of 50 known DE genes, it is likely that the DE signal will be absorbed by the size factor normalization (how much depends on the distribution of up/down fold changes); in any case, such an input cannot be trusted to give reasonable results.
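
To illustrate how the DE signal can get absorbed, here is a minimal base-R sketch of the median-of-ratios idea behind DESeq2's default size factors. It is not DESeq2's own code, and the count matrix, gene names, and the `median_of_ratios` helper are invented for illustration. Every panel gene is exactly twofold higher in condition B and sequencing depth is assumed equal, yet size factors computed from the panel genes alone remove the fold change, while size factors from two hypothetical unchanged control genes preserve it.

```r
# Toy counts, invented for illustration: 5 panel genes x 4 samples.
# A1, A2 are condition A; B1, B2 are condition B.
# Every panel gene is exactly 2x higher in B; equal depth is assumed.
base <- c(10, 20, 40, 80, 160)
counts <- cbind(A1 = base, A2 = base, B1 = 2 * base, B2 = 2 * base)
rownames(counts) <- paste0("gene", 1:5)

# Median-of-ratios size factors (sketch of the idea, hypothetical helper):
# per-gene geometric mean across samples, then the per-sample median
# of count / geometric mean.
median_of_ratios <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  apply(m, 2, function(col) exp(median(log(col) - log_geo_mean)))
}

# log2 fold change, B vs A, after dividing each sample by its size factor
lfc_after <- function(m, sf) {
  norm <- sweep(m, 2, sf, "/")
  log2(rowMeans(norm[, 3:4]) / rowMeans(norm[, 1:2]))
}

# Size factors from the DE panel genes only: the fold change is absorbed
sf_panel <- median_of_ratios(counts)
lfc_panel <- lfc_after(counts, sf_panel)

# Size factors from two hypothetical unchanged control genes:
# the fold change survives
controls <- rbind(ctrl1 = rep(50, 4), ctrl2 = rep(300, 4))
sf_ctrl <- median_of_ratios(controls)
lfc_ctrl <- lfc_after(counts, sf_ctrl)

print(round(rbind(lfc_panel, lfc_ctrl), 3))
```

In real data the fold changes are a mix of up and down, so the absorption is partial rather than total, which is the "depends on the distribution" caveat.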

You can use estimateSizeFactors() before running DESeq() and specify controlGenes.

Or you can estimate size factors using some custom method and supply these via sizeFactors(dds) <- x, before running DESeq().

With so few rows, I would suggest fitType="mean" when running DESeq() rather than the other fit types, as it is a simpler dispersion~mean relationship to fit.
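
Putting those steps together, a rough sketch might look like the following. It uses simulated counts from `makeExampleDESeqDataSet()` as a stand-in for a real panel, and the choice of rows 51 to 60 as "control genes" is purely an assumption for the example; in practice you would pass the indices of your own housekeeping or spike-in rows.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a small targeted panel: 60 genes, 12 samples,
# two conditions. Replace with your own DESeqDataSet.
dds <- makeExampleDESeqDataSet(n = 60, m = 12,
                               interceptMean = 8, interceptSD = 1)

# ASSUMPTION for this example: rows 51-60 are the non-DE control genes.
control_idx <- 51:60

# Option 1: estimate size factors from the control genes only
dds <- estimateSizeFactors(dds, controlGenes = control_idx)

# Option 2 (alternative): supply size factors from a custom method,
# where x is a hypothetical vector with one positive value per sample
# sizeFactors(dds) <- x

# DESeq() uses the pre-existing size factors rather than re-estimating
# them; fitType = "mean" fits a single mean dispersion instead of a curve
dds <- DESeq(dds, fitType = "mean")
res <- results(dds)
head(res)
```

Because the size factors are set before `DESeq()` is called, the DE genes on the panel do not influence the normalization.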

ADD COMMENT • link 7.9 years ago Michael Love 41k

Hi Seanna and Michael.

I am facing a similar issue at the moment. My dataset is an RNA-Seq panel of 400 genes, 11 of which are housekeeping genes, and I have about 200 samples.

Michael, do you consider passing the indices of those 11 genes to controlGenes a good idea? Also, should I set fitType="mean", or can I proceed with the default parameters? I know 400 genes are not many, but that is still quite a few more than 50...

One important issue I have noticed is that those 11 genes are not always the lowest in variance; in fact, some other genes look more consistent in expression. Should I pass those to controlGenes instead?

Thank you,

David.

ADD REPLY • link 4.1 years ago dagsbio • 0

We have used DESeq2 with a similar setup in my lab, with Nanostring counts for ~400 genes on a panel plus a dozen "housekeeping" genes. I'd prefer more genes that could be evaluated for "housekeeping" potential, but the above procedure is what I'd recommend.

Re: fitType, you can try both and examine the plotDispEsts plot. DESeq2's default is "parametric", but the concern is that there may be too few genes to estimate the parametric curve reliably, whereas "mean" is trivial for DESeq2 to compute.
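
As a rough sketch of that comparison, the snippet below fits both dispersion trends on simulated data and draws the two plots side by side. The 400-gene simulated dataset is only a stand-in; with your own object you would skip the simulation and keep whatever size factors you have already set.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a ~400-gene panel; replace with your own data.
dds <- makeExampleDESeqDataSet(n = 400, m = 12)
dds <- estimateSizeFactors(dds)

# Fit the dispersion trend both ways on copies of the same object
dds_mean <- estimateDispersions(dds, fitType = "mean")
dds_par  <- estimateDispersions(dds, fitType = "parametric")

# Side-by-side dispersion plots: check whether the fitted line follows
# the cloud of gene-wise estimates across the range of mean counts
par(mfrow = c(1, 2))
plotDispEsts(dds_mean)
title(main = "fitType = 'mean'")
plotDispEsts(dds_par)
title(main = "fitType = 'parametric'")
```

If the parametric curve cannot be fit, DESeq2 prints a note and substitutes a local regression fit, which is itself a hint that the simpler "mean" fit may be the safer choice.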

Re: low variance, this is tricky. The true variance depends on normalization, which depends on the definition of housekeeping. This is not a trivial problem, and it is not necessarily identifiable from the data without prior knowledge of which genes may not change much across the biological condition.

ADD REPLY • link 4.1 years ago Michael Love 41k