DESeq2 showing few, if any, significant DE genes (none if considering p-adj)
wariobrega • 0
@wariobrega-9755

Hello everyone,

I'm Daniele, a junior researcher and former participant of the CSAMA event in 2015. After some months of waiting, I finally received my first RNA-Seq data, so I wanted to test DESeq2 to perform my DGE analysis.

What I'm trying to do is to find differentially expressed features at the gene level in a series of tumor cell lines expressing two isoforms of a particular gene, and no isoforms (control). Moreover, I also have knock-out tumor cell lines for these isoforms. Finally, all these lines were exposed to a drug, and the corresponding controls (no treatment) were sampled. I have at least 2 biological replicates for each cell condition.

I am trying to compare all the combinations of these cell lines in order to obtain all possible information regarding the significant DE genes in this experiment (e.g. cells with isoform 1 treated vs cells with isoform 1 untreated, all tumor cells treated vs all tumor cells untreated, all untreated knockdown cells for isoform 1 vs all untreated knockdown cells for isoform 2, and so on). Since there are many more factors than the "contrast" argument of the results() function allows, I built my DESeqDataSet using a combination of these factors.

Briefly, I paste()d the columns specifying isoform, knockout and treatment, and then used the result to model the dataset, e.g.:

# create a factor with levels like isoform1_noknockout_untreated
sampleinfo$contrasts <- as.factor(paste(sampleinfo$isoform, sampleinfo$knockout, sampleinfo$treatment, sep = "_"))

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleinfo, directory = ".", design = ~ contrasts)


And then I proceeded with my DESeq2 analysis following the RNA-seq workflow found in Bioconductor.

What I found is that few or no significant DE genes were found using DESeq2 with default parameters. Moreover, when looking at the adjusted p-values, ALL values are set to 1. Just to show some of the DESeq2 results:

gene_name    baseMean    log2FoldChange    lfcSE    stat    pvalue    padj
A1BG    32.6061798000319    -0.233823005166891    0.233034876711654    -1.00338201931984    0.315676576201699    1
A1BG-AS1    0.701182193651673    -0.036636625854256    0.0819481333750818    -0.447070901378166    0.654823868155052    1
A1CF    1475.624054965    -0.0140027561993027    0.218784550709677    -0.0640024908243367    0.948968243485277    1
A2M    51012.7388481331    -0.0452381081136272    0.264470450676274    -0.171051654345313    0.864183149758649    1

 

So, several questions arise about how these findings should be interpreted, and I am asking for your help in answering them. The questions are:
 

  1. Does the choice of "pasting" factors influence the DESeq2 analysis?
  2. Should I use different parameters for my DESeq2 analysis?
  3. Should I use the LRT rather than the Wald test for statistical testing? What are the pros and cons of the two approaches?
  4. Could these findings be due to the low number of replicates in my samples? Would you suggest restricting the use of DESeq2 to "supergroups" rather than these small comparisons?

I should mention that this is my first RNA-Seq analysis, so I apologise in advance for any mistakes I may have made.

Thanks a lot for your time!

Daniele

 

deseq2 multiple factor design • 2.1k views
@mikelove

Hi Daniele,

1-2 - There are no limitations from the factor(paste()) approach to generating unique groups (combinations of several variables) to compare using contrasts.
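
As an illustration of the grouping approach, here is a minimal sketch on simulated counts. The sample annotation (three isoform states, two treatments, two replicates each) and the level names such as "iso1_treated" are assumptions made up for the example, not the actual experiment; makeExampleDESeqDataSet() only stands in for the real count data.

```r
library(DESeq2)

# Simulated counts stand in for the real data: 1000 genes, 12 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical annotation: 3 isoform states x 2 treatments, 2 replicates each.
dds$isoform   <- factor(rep(c("none", "iso1", "iso2"), each = 4))
dds$treatment <- factor(rep(c("untreated", "treated"), times = 6))

# One factor level per unique combination, e.g. "iso1_treated".
dds$group <- factor(paste(dds$isoform, dds$treatment, sep = "_"))
table(dds$group)  # number of replicates in each group

# Fit the model once on the combined factor.
design(dds) <- ~ group
dds <- DESeq(dds)

# Any pair of groups can then be compared: c(factor name, numerator, denominator).
res <- results(dds, contrast = c("group", "iso1_treated", "iso1_untreated"))
summary(res)
```

The model is fit once, and each further pairwise comparison is just another results() call with different level names in the contrast.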

3 - There is not much of a difference. The Wald test in DESeq2 is used to compare groups, while the LRT is used to compare two models (basically two design formulas). So Wald is appropriate here.
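
To make the distinction concrete, here is a rough sketch of both tests on simulated data. The two-factor design and the factor levels are hypothetical and chosen only for illustration.

```r
library(DESeq2)

# Simulated counts with a hypothetical two-factor annotation.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$isoform   <- factor(rep(c("none", "iso1", "iso2"), each = 4),
                        levels = c("none", "iso1", "iso2"))
dds$treatment <- factor(rep(c("untreated", "treated"), times = 6),
                        levels = c("untreated", "treated"))
design(dds) <- ~ isoform + treatment

# Wald test: one specific comparison between two levels of a factor.
dds_wald <- DESeq(dds)
res_wald <- results(dds_wald, contrast = c("treatment", "treated", "untreated"))

# LRT: full model vs a reduced model without 'isoform'. This asks whether
# isoform explains any variation at all, across all of its levels at once.
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ treatment)
res_lrt <- results(dds_lrt)

# The LRT p-values come from the full-vs-reduced comparison; the
# log2FoldChange column still refers to a single coefficient.
head(res_lrt)
```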

4 - Yes, 2 is likely too few replicates to observe statistically significant differences. You can still rank the genes by res$log2FoldChange for exploration, but you will probably need more replicates to find statistically significant differences. Typically 3 is the minimum suggested sample size, with 3-5 being a typical number of biological replicates.
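
A small sketch of that exploratory ranking, using the four rows posted in the question (values rounded) in a plain data frame. With a real DESeqResults object the same order() call should apply.

```r
# The four genes from the question, values rounded to four decimals.
res <- data.frame(
  gene_name      = c("A1BG", "A1BG-AS1", "A1CF", "A2M"),
  log2FoldChange = c(-0.2338, -0.0366, -0.0140, -0.0452),
  pvalue         = c(0.3157, 0.6548, 0.9490, 0.8642),
  stringsAsFactors = FALSE
)

# Rank by absolute log2 fold change, largest effect first.
ranked <- res[order(abs(res$log2FoldChange), decreasing = TRUE), ]
ranked$gene_name
```

This ranking is for exploration only; it says nothing about statistical significance.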
