Question: DESeq2/SAMSeq - analysing subset of genes from RNASeq dataset
aylj wrote (3.6 years ago):

Hello all -

We've recently been looking at some of the abundant RNA-seq data from the NIH's Cancer Genome Atlas (TCGA) to find differentially expressed genes in biological replicates from paired normal/tumour samples, different tumours, etc. We've had success using DESeq2 and SAMSeq on raw count data, and the large number of samples has been giving low FDRs/q-values.

We're only interested in whether a small set (~30) of genes of interest vary between conditions. I was wondering, then, what are the statistical pitfalls of excluding the other genes before input to DESeq2, SAMSeq, etc.? I'm aware that these tools apply normalisation that takes into account reads across all genes. Is there any other part of the DE analysis that might be thrown off by this? On the other hand, is there anything to be said for reducing the number of multiple comparisons being performed? Or is this just generally a bad idea?

(There are some peripheral benefits to excluding the genes, including easier data extraction and shorter processing time over hundreds of samples, at least in DESeq2.)

Thanks in advance! Jon

Answer: DESeq2/SAMSeq - analysing subset of genes from RNASeq dataset
Michael Love (United States) wrote (3.6 years ago):

For DESeq2, the normalization, the dispersion estimation, and the width of the prior on log fold changes (LFC) are all designed with the expectation that you provide all the genes (thousands of rows over which to learn parameters).

When it comes to testing, if you are truly setting aside only 30 genes for testing beforehand, and you can set this list in stone, you could correct the p-values for just these genes by subsetting the results and manually calling p.adjust(). If you instead look at the p-values and then change your mind about which genes to test, this kind of "adaptive" testing regime would lose type I error control.
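A minimal sketch of the approach described above, under stated assumptions: the counts are simulated with DESeq2's makeExampleDESeqDataSet() as a stand-in for the real count matrix, and genes_of_interest is a hypothetical pre-specified list (here simply the first 30 simulated genes). The model is fitted on all genes, and only the multiple-testing correction is restricted to the subset.

```r
library(DESeq2)

# Simulated counts stand in for the real count matrix (assumption):
# 1000 genes, 12 samples, two conditions.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Fit on ALL genes so that size factors, dispersions and the LFC prior
# are learned from the full dataset.
dds <- DESeq(dds)
res <- results(dds)

# Hypothetical gene list, fixed BEFORE looking at any p-values.
genes_of_interest <- rownames(dds)[1:30]

# Subset the results, then adjust only those 30 raw p-values
# (Benjamini-Hochberg). NA p-values, if any, stay NA and are not counted.
res_sub <- res[genes_of_interest, ]
res_sub$padj_subset <- p.adjust(res_sub$pvalue, method = "BH")

# Show the subset ordered by the subset-specific adjusted p-value.
res_sub[order(res_sub$padj_subset), c("log2FoldChange", "pvalue", "padj_subset")]
```

Note that the padj column returned by results() is computed over all genes that pass independent filtering, so it should not be confused with padj_subset, which corrects for the 30 pre-specified tests only.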


Hi Michael, thanks for clearing that up. OK, no problem, that does make sense: since the data is available, there doesn't seem to be a good reason to exclude it, and I can see why it would be necessary in order to estimate those parameters. Thanks!

(Reply written 3.6 years ago by aylj)