Normalisation of data using DESeq2 with multiple groups
Changsuk ▴ 10
@849f1a02
Last seen 19 months ago
South Korea

Dear bioinformaticians,

I am using DESeq2 to normalize data from 100 samples from 100 patients. The 100 samples are classified into 3 groups (group A: 30 samples, group B: 30 samples, group C: 40 samples). I want to test for significant differences in gene expression among the groups. In classical statistics we would use ANOVA, but in DESeq2 there is no option to compare gene expression among three or more groups.

  1. My first question: is it OK to 1) obtain the normalization scale factors using all 100 samples, 2) multiply each sample by its normalisation scale factor, and 3) compare gene expression among the three groups? If not, is there a recommended way to do a multi-group comparison?
  2. To compare group A and group B, is it OK to use the counts normalised using all 100 samples? Or should I run DESeq2 with only group A and group B (30 + 30 = 60 samples)? Given how DESeq2 works, I would expect the two approaches (calculating scale factors from 100 samples or from 60 samples) not to differ much.

Thank you for your comments and advice.


DEseq2 • 2.9k views
ATpoint ★ 4.5k
@atpoint-13662
Last seen 2 days ago
Germany

The 100 samples are classified into 3 groups (group A: 30 samples, group B: 30 samples, group C: 40 samples). I want to test for significant differences in gene expression among the groups.

If that is the case, then analyse them together, unless one of the arguments in the vignette FAQ gives you a reason not to:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups

In classical statistics we would use ANOVA, but in DESeq2 there is no option to compare gene expression among three or more groups.

Not true. You can compare any number of groups. Either use the likelihood ratio test (LRT), for example with a full design ~group compared against a reduced design ~1, to find genes that change in any group. Or use the default Wald test for pairwise comparisons or for other contrasts, for example a given group versus the average of the rest, as described here:

DESeq2: Comparing one sample with the mean of all
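
The two options can be sketched roughly as follows. This is only an illustration on simulated counts: the group sizes (4/4/5 instead of 30/30/40), the object names and the simulation itself are assumptions made for the example, and the numeric contrast at the end is one possible way to express "group A versus the average of B and C", not necessarily the approach taken in the linked thread.

```r
library(DESeq2)

set.seed(1)
n_genes <- 200
# Small stand-in for the 30/30/40 layout (assumption for the example)
group <- factor(rep(c("A", "B", "C"), times = c(4, 4, 5)))
n_samples <- length(group)

# Simulated negative binomial counts; replace with your own count matrix
base_mean <- 2^runif(n_genes, 5, 10)
counts <- matrix(rnbinom(n_genes * n_samples, mu = base_mean, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
coldata <- data.frame(group = group, row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)

# Option 1: LRT, full design ~group against reduced design ~1.
# One p-value per gene for "any difference among the groups" (ANOVA-like).
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_lrt <- results(dds_lrt)

# Option 2: default Wald test, then pick a pairwise contrast (B versus A).
dds_wald <- DESeq(dds)
res_B_vs_A <- results(dds_wald, contrast = c("group", "B", "A"))

# One group versus the average of the rest, using a design without
# intercept and a numeric contrast with one entry per coefficient.
dds0 <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                               design = ~ 0 + group)
dds0 <- DESeq(dds0)
resultsNames(dds0)  # check the coefficient order before building the contrast
res_A_vs_rest <- results(dds0, contrast = c(1, -1/2, -1/2))
```

Note that all three fits use the full set of samples; only the test or contrast changes.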

My first question: is it OK to 1) obtain the normalization scale factors using all 100 samples, 2) multiply each sample by its normalisation scale factor, and 3) compare gene expression among the three groups?

I do not understand what 2) means in particular. If you want to normalize the counts with DESeq2, run estimateSizeFactors for the normalization and extract the normalized counts with counts(dds, normalized=TRUE) on the natural scale or with normTransform(dds) on the log2 scale. The normalization is also part of the DESeq() function, so you can extract the normalized counts after running that as well. The testing is done internally (see the vignette); for DE analysis there is never any need to fiddle with the normalized counts explicitly. The vignette covers all of this. Don't do custom manipulation of counts unless you know precisely what you are doing and what the implications are.
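
As a minimal sketch of the extraction step, assuming simulated counts and made-up object names: the normalized counts returned by counts(dds, normalized=TRUE) are the raw counts divided by the per-sample size factors, so no manual scaling is needed.

```r
library(DESeq2)

set.seed(1)
n_genes <- 500
group <- factor(rep(c("A", "B", "C"), times = c(3, 3, 4)))
n_samples <- length(group)

# Simulated counts with different sequencing depths per sample (assumption)
base_mean <- 2^runif(n_genes, 5, 10)
depth <- runif(n_samples, 0.5, 2)
counts <- matrix(rnbinom(n_genes * n_samples,
                         mu = outer(base_mean, depth), size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
coldata <- data.frame(group = group, row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)

# Normalization only; DESeq() would run this step internally as well
dds <- estimateSizeFactors(dds)
sf <- sizeFactors(dds)

# Normalized counts on the natural scale
norm_counts <- counts(dds, normalized = TRUE)

# Same thing by hand: each column divided by its size factor
manual <- sweep(counts(dds), 2, sf, "/")

# Normalized counts on the log2 scale, log2(normalized count + 1)
log_norm <- assay(normTransform(dds))
```

These matrices are useful for plots and exploration; the differential expression tests themselves work from the raw counts and the size factors stored in dds.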

To compare group A and group B, is it OK to use the counts normalised using all 100 samples? Or should I run DESeq2 with only group A and group B (30 + 30 = 60 samples)? Given how DESeq2 works, I would expect the two approaches (calculating scale factors from 100 samples or from 60 samples) not to differ much.

That again relates to the FAQ in the vignette (first link above). The results will probably be similar, and unless there is a good argument (see the FAQ) for separating the groups, I would run them together to keep it simple.
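
If you want to check this on your own data, one rough way is to estimate size factors once on all samples and once on the A and B subset, then compare them. The sketch below uses simulated counts and smaller groups (6/6/8) as stand-ins, so every number and object name in it is an assumption. The two sets of size factors may differ by an overall constant, which is why they are compared by correlation on the log scale rather than for equality.

```r
library(DESeq2)

set.seed(42)
n_genes <- 1000
group <- factor(rep(c("A", "B", "C"), times = c(6, 6, 8)))
n_samples <- length(group)

# Simulated counts with sample-specific sequencing depth (assumption)
base_mean <- 2^runif(n_genes, 5, 12)
depth <- runif(n_samples, 0.5, 2)
counts <- matrix(rnbinom(n_genes * n_samples,
                         mu = outer(base_mean, depth), size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
coldata <- data.frame(group = group, row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)

# Subset to groups A and B before any estimation, and drop the unused level
dds_ab <- dds[, dds$group %in% c("A", "B")]
dds_ab$group <- droplevels(dds_ab$group)

# Size factors from all samples versus from the A/B subset only
sf_all <- sizeFactors(estimateSizeFactors(dds))
sf_ab <- sizeFactors(estimateSizeFactors(dds_ab))

# Compare the factors for the shared samples on the log scale
shared <- colnames(dds_ab)
sf_cor <- cor(log(sf_all[shared]), log(sf_ab))
sf_cor
```

A correlation close to 1 would support the expectation that normalization is not where the two approaches differ; the FAQ arguments about splitting concern the dispersion estimates instead.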

I appreciate your kind reply. All my questions were solved. Thank you very much again.
