Question

DESeq2 showing little if none significant DE genes (none if considering p-adj)

0

Entering edit mode

wariobrega • 0

@wariobrega-9755

Last seen 6.8 years ago

Hello everyone,

I'm Daniele, a junior researcher and former participant of the CSAMA event in 2015. After some month of waiting, I finally received my first RNA-Seq data, so I wanted to test DESeq2 to perform my DGE analysis.

What I'm trying to do is to find differentially expressed features at gene level on a series of tumor cell lines expressing two isoforms of a particular gene, and no isoforms (control). Moreover, I also have knock-out tumor cell lines for these isoforms. Finally, all these lines where exposed to a drug, and the relative controls (no treatment) where sampled. I have at least 2 biological replicate for each cell condition.

I am trying to compare all the combinations of these cell lines in order to obtain all possible information regarding the significant DEGenes in this experiment (e.g.: cells with isoform1 treated vs cell with isoform 1 untreated, all tumor cell treated vs all tumor cells untreated, all knockdown untreated cells for isoform 1 vs all knockdown untreated cells for isoform 2, and so on). So, since there are MUCH more factors than the ones the "contrast" function in the results() function allows to, I builded my DESeqDataset using a combination of these factors.

(Briefly I paste()d the columns specifying treatment, knockdown and so on and then used them to model the dataset. eg:

sampleinfo$contrasts <- as.factor(paste(sampleinfo$isoform, sampleinfo$knockout, sampleinfo$treatment ,sep = "_")

#create a factor like isoform1_noknockout_untreated)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleinfo, directory = ".",design = ~contrasts)

And then I proceeded with my DESeq2 analysis following the RNA-seq workflow found in Bioconductor.

What I found is that little or no significant DEgenes where found using DeSeq2 with default parameters. Moreover, when looking at the adjusted po-valuue, ALL values are set to 1 Just to output some of the DeSeq2 results:

gene_name   baseMean   log2FoldChange   lfcSE   stat   pvalue   padj
A1BG   32.6061798000319   -0.233823005166891   0.233034876711654   -1.00338201931984   0.315676576201699   1
A1BG-AS1   0.701182193651673   -0.036636625854256   0.0819481333750818   -0.447070901378166   0.654823868155052   1
A1CF   1475.624054965   -0.0140027561993027   0.218784550709677   -0.0640024908243367   0.948968243485277   1
A2M   51012.7388481331   -0.0452381081136272   0.264470450676274   -0.171051654345313   0.864183149758649   1

So, several questions arises on how these findings should be interpreted and I am asking for your support in order to solve them. The questions are:

does the choice of "pasting" factors influence the DEseq analysis?
Should I use different parameters for my DESeq analysis?
Should I use the LR method for statistical testing rather than Wald? what are the pros and cons of the two approaches?
Are these findings maybe dependant from the low number of replicates of my samples? What would you suggest to restrict the use of DESeq to "supergroups" rather than these small comparisons?

I have to mention that this is my first RNA-SEq analysis so I apologise in advance for any mistake I could have committed.

Thanks a lot for your time!

Daniele

deseq2 multiple factor design • 2.0k views

ADD COMMENT • link updated 8.2 years ago by Michael Love 41k • written 8.2 years ago by wariobrega • 0

score 1 · Answer 1 · 2016-02-25

hi Daniele,

1-2 There are not limitations from the factor(paste()) approach to generating unique groups (combinations of several variables) to compare across using contrasts.

3 - There is not much of a difference. Wald in DESeq2 is used to compare groups, while the LRT is used to compare two model (basically two design formula). So Wald is appropriate here.

4 - Yes, 2 is likely too few replicates to observe statistically significant differences. You can however rank the genes by res$log2FoldChange for exploration, but you will probably need more replicates to find statistical significant differences. Typically 3 is the minimum suggested sample size, with 3-5 being a typical number of biological replicates.