Search
Question: Pairwise Comparison between A Large Number of Groups
0
gravatar for john.ouyang
10 months ago by
Singapore
john.ouyang0 wrote:

Hi all,

I am trying to identify genes that are specific to a cell type of interest by comparison with a large number of other cell types. To this end, I performed differential gene expression using DESeq2 using the following:

dds = DESeqDataSetFromMatrix(countData = as.matrix(dtCountsDDS),
                             colData = DataFrame(dtSampleDDS),
                             design = ~group)

where "group" goes from 0-48 with 0 is the group with the cell type of interest and 1-48 gives 48 other cell types which I am comparing to. Thus, the coldata looks like this:

description group
cell_type_of_interest_rep1 0
cell_type_of_interest_rep2 0
cell_type_of_interest_rep3 0
other_cell_type01_rep1 1
other_cell_type01_rep2 1
other_cell_type01_rep3 1
other_cell_type02_rep1 2
other_cell_type02_rep2 1
...
other_cell_type48_rep1 48
other_cell_type48_rep2 48
other_cell_type48_rep3 48

I then performed DESeq2 as follows:

dds = DESeq(dds, quiet = FALSE, betaPrior = FALSE, parallel = TRUE)

After which, I retrived the pairwise Wald comparison results for group0-vs-group1, group0-vs-group2, ..., group0-vs-group48 to find genes that are differentially expressed in the cell type of interest by the following:

res = results(dds, alpha = 0.05, independentFiltering = TRUE, contrast = c("group", 0, i))

where "i" goes from 1 to 48.

From the plotCounts, I am expecting the gene NANOG to be differentially expressed against most cell types, which is evident from the plotCounts.

https://imgur.com/5RPgDos

However, the DESeq2 results only gives an FDR of 0.5-0.6 in all cases. 

groupID geneID baseMean LFC lfcSE stat pvalue FDR
1 ENSG00000111704 9.399662 10.6784175 14.00173 0.76265003 0.4456721 0.5687196
2 ENSG00000111704 9.399662 8.7716355 14.00027 0.62653347 0.5309651 0.6534532
3 ENSG00000111704 9.399662 9.4506731 14.00173 0.67496482 0.4996981 0.633392
4 ENSG00000111704 9.399662 9.3446924 14.00173 0.66739571 0.5045194 0.6249148
5 ENSG00000111704 9.399662 10.3797534 14.00173 0.74131952 0.4584997 0.592654
6 ENSG00000111704 9.399662 10.6885146 14.00173 0.76337116 0.4452421 0.5774063
7 ENSG00000111704 9.399662 11.5510343 14.00173 0.82497212 0.4093874 0.565289
8 ENSG00000111704 9.399662 10.5822928 14.00173 0.75578484 0.4497782 0.5608032
9 ENSG00000111704 9.399662 9.5290049 14.00173 0.68055926 0.4961504 0.6209838
10 ENSG00000111704 9.399662 9.0412122 15.67196 0.57690359 0.5640046 0.6806634
11 ENSG00000111704 9.399662 10.8099388 14.00173 0.77204326 0.4400888 0.5565791
12 ENSG00000111704 9.399662 10.5248998 14.00173 0.75168584 0.45224 0.5566365
13 ENSG00000111704 9.399662 10.1456759 14.00173 0.72460176 0.4686964 0.5728484
14 ENSG00000111704 9.399662 8.7660278 14.00173 0.62606762 0.5312706 0.6353354
15 ENSG00000111704 9.399662 10.7080225 14.00173 0.76476442 0.4444118 0.5620308
16 ENSG00000111704 9.399662 9.3813843 14.00173 0.67001624 0.5028474 0.6240379
17 ENSG00000111704 9.399662 10.4812734 15.67196 0.66879132 0.5036286 0.6294616
18 ENSG00000111704 9.399662 10.9714755 14.00173 0.78358017 0.4332865 0.5349624
19 ENSG00000111704 9.399662 9.6212037 13.99186 0.68762874 0.4916866 0.6054409
20 ENSG00000111704 9.399662 8.0895122 13.97886 0.57869604 0.5627943 0.670864
21 ENSG00000111704 9.399662 9.6592287 14.00173 0.68985981 0.4902824 0.5920907
22 ENSG00000111704 9.399662 10.9413107 14.00173 0.7814258 0.4345521 0.5801206
23 ENSG00000111704 9.399662 9.354374 13.99711 0.66830755 0.5039373 0.6267258
24 ENSG00000111704 9.399662 7.7500671 13.98033 0.55435526 0.5793357 0.6769946
25 ENSG00000111704 9.399662 9.9603945 14.00173 0.711369 0.4768556 0.6080437
26 ENSG00000111704 9.399662 10.2005031 14.00173 0.7285175 0.4662969 0.6046647
27 ENSG00000111704 9.399662 9.1613179 15.67196 0.58456731 0.5588387 0.6709067
28 ENSG00000111704 9.399662 10.3497801 14.00173 0.73917883 0.4597984 0.5887615
29 ENSG00000111704 9.399662 8.929098 14.00173 0.63771405 0.5236598 0.6456075
30 ENSG00000111704 9.399662 11.1910169 14.00173 0.79925976 0.4241398 0.5865799
31 ENSG00000111704 9.399662 9.8669264 13.99732 0.70491541 0.4808629 0.6312367
32 ENSG00000111704 9.399662 10.4773122 14.00173 0.74828715 0.454287 0.7951142
33 ENSG00000111704 9.399662 -0.1575527 13.95228 -0.01129226 0.9909903 0.9973648
34 ENSG00000111704 9.399662 8.293104 14.00173 0.59229151 0.5536554 0.6722364
35 ENSG00000111704 9.399662 8.7458279 13.99514 0.62491901 0.5320242 0.6486637
36 ENSG00000111704 9.399662 7.8522695 14.00099 0.56083671 0.5749089 0.6777484
37 ENSG00000111704 9.399662 10.8459836 15.67196 0.69206282 0.4888979 0.6175993
38 ENSG00000111704 9.399662 10.390056 14.00173 0.74205533 0.4580538 0.5674383
39 ENSG00000111704 9.399662 8.6865408 14.00173 0.62039068 0.5350006 0.6595885
40 ENSG00000111704 9.399662 10.4439862 14.00173 0.74590701 0.4557236 0.5772359
41 ENSG00000111704 9.399662 9.8352453 14.00173 0.70243088 0.4824105 0.6273037
42 ENSG00000111704 9.399662 9.3964659 14.00173 0.67109335 0.5021611 0.629528
43 ENSG00000111704 9.399662 10.3623496 14.00173 0.74007654 0.4592536 0.5941857
44 ENSG00000111704 9.399662 8.3683002 15.67196 0.53396627 0.5933649 0.7084019
45 ENSG00000111704 9.399662 9.8391147 15.67196 0.62781632 0.5301243 0.6496046
46 ENSG00000111704 9.399662 10.4257006 14.00173 0.74460106 0.4565129 0.7593986
47 ENSG00000111704 9.399662 6.2449039 13.95703 0.44743778 0.654559 0.7649179
48 ENSG00000111704 9.399662 10.3231834 14.00173 0.7372793 0.4609525 0.5941485

Throughout this entire DESeq2 run, there is only a warning on the mean-dispersion fitting (reproduced below) and I think that is causing the lfcSE to be very large and consequently giving an insignificant FDR.

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

Thus, I have two questions:

  1. Is there anyway to remedy this apparently strange result?

  2. Is it sensible to do all the comparisons in one DESeq2 pass? Or should I perform 48 separate individual DESeq2 runs to compare the cell type of interest with each of the other cell types?

ADD COMMENTlink modified 10 months ago by Michael Love20k • written 10 months ago by john.ouyang0
1
gravatar for Michael Love
10 months ago by
Michael Love20k
United States
Michael Love20k wrote:

There's an issue with a default parameter here which disrupts estimation on this type of large design matrix and genes with many groups with all 0s. I'll work on a cleaner fix, but you should be able to use the following code instead of DESeq():

dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds, minmu=1e-3)
dds <- estimateDispersionsFit(dds)
dds <- estimateDispersionsMAP(dds)
dds <- nbinomWaldTest(dds)

 

ADD COMMENTlink modified 10 months ago • written 10 months ago by Michael Love20k

Hi Michael,

Thanks for the prompt response. I ran my dataset using your code and NANOG is now DE with a FDR of 10^-8 to 10^-16.

However, I have two further questions:

  1. The calculation took a few hours as it is ran in serial. Is it possible to run this code in parallel instead?

  2. I took a close look at the results. For group 1-10, the counts of NANOG is 0. So I was expecting the comparison group0-vs-group1, group0-vs-group2, ..., group0-vs-group10 to yield similar results. However, the FDR still varies quite a bit. Is there any reason why?

bgID geneID baseMean LFC lfcSE stat pvalue FDR 
1 ENSG00000111704 9.399662 10.7242676 1.2364163 8.6736707 4.18E-18 1.06E-16 
2 ENSG00000111704 9.399662 8.8605577 1.224673 7.2350393 4.65E-13 8.55E-12 
3 ENSG00000111704 9.399662 9.4964033 1.2364163 7.6805874 1.58E-14 2.70E-13 
4 ENSG00000111704 9.399662 9.3904122 1.2364163 7.594863 3.08E-14 6.54E-13 
5 ENSG00000111704 9.399662 10.4255744 1.2364163 8.4320909 3.40E-17 1.11E-15 
6 ENSG00000111704 9.399662 10.7343657 1.2364163 8.681838 3.89E-18 1.07E-16 
7 ENSG00000111704 9.399662 11.5969698 1.2364163 9.3795027 6.63E-21 5.34E-19 
8 ENSG00000111704 9.399662 10.6281336 1.2364163 8.5959186 8.26E-18 1.08E-16 
9 ENSG00000111704 9.399662 9.5747427 1.2364163 7.7439475 9.64E-15 2.17E-13 
10 ENSG00000111704 9.399662 9.0873443 1.5024909 6.048186 1.46E-09 1.52E-08 
11 ENSG00000111704 9.399662 10.8558018 1.2364163 8.7800542 1.63E-18 2.86E-17 
12 ENSG00000111704 9.399662 10.570735 1.2364163 8.5494952 1.24E-17 1.24E-16 
13 ENSG00000111704 9.399662 10.191474 1.2364163 8.2427531 1.68E-16 1.42E-15 
14 ENSG00000111704 9.399662 8.8116911 1.2364163 7.1267997 1.03E-12 6.19E-12 
15 ENSG00000111704 9.399662 10.7538756 1.2364163 8.6976173 3.39E-18 5.16E-17 
16 ENSG00000111704 9.399662 9.4271077 1.2364163 7.624542 2.45E-14 4.02E-13 
17 ENSG00000111704 9.399662 10.5276166 1.5024909 7.0067758 2.44E-12 3.59E-11 
18 ENSG00000111704 9.399662 11.0173544 1.2364163 8.9107161 5.07E-19 7.92E-18 
19 ENSG00000111704 9.399662 9.6018307 1.0890851 8.8164195 1.18E-18 2.28E-17 
20 ENSG00000111704 9.399662 8.0471108 0.9162048 8.7830914 1.59E-18 3.18E-17 
21 ENSG00000111704 9.399662 9.7049792 1.2364163 7.8492814 4.18E-15 7.02E-14 
22 ENSG00000111704 9.399662 10.9871866 1.2364163 8.8863167 6.32E-19 3.26E-17 
23 ENSG00000111704 9.399662 9.3334518 1.1616469 8.0346718 9.38E-16 2.38E-14 
24 ENSG00000111704 9.399662 7.7116913 0.9169178 8.4104504 4.08E-17 6.72E-16 
25 ENSG00000111704 9.399662 10.0061745 1.2364163 8.0928848 5.83E-16 1.55E-14 
26 ENSG00000111704 9.399662 10.2463065 1.2364163 8.287101 1.16E-16 3.53E-15 
27 ENSG00000111704 9.399662 9.2074675 1.5024909 6.1281354 8.89E-10 4.37E-09 
28 ENSG00000111704 9.399662 10.3955982 1.2364163 8.4078465 4.18E-17 1.41E-15 
29 ENSG00000111704 9.399662 8.9747772 1.2364163 7.2587019 3.91E-13 7.30E-12 
30 ENSG00000111704 9.399662 11.2369172 1.2364163 9.0882961 1.01E-19 6.80E-18 
31 ENSG00000111704 9.399662 9.6574678 1.0941152 8.8267378 1.08E-18 4.20E-17 
32 ENSG00000111704 9.399662 10.5231428 1.2364163 8.5110031 1.72E-17 6.24E-16 
33 ENSG00000111704 9.399662 -0.1115104 0.3811911 -0.2925314 7.70E-01 9.00E-01 
34 ENSG00000111704 9.399662 8.338721 1.2364163 6.7442666 1.54E-11 2.39E-10 
35 ENSG00000111704 9.399662 9.0490757 1.1711452 7.7266895 1.10E-14 1.92E-13 
36 ENSG00000111704 9.399662 7.8792998 1.2254494 6.4297227 1.28E-10 1.25E-09 
37 ENSG00000111704 9.399662 10.8923803 1.5024909 7.2495484 4.18E-13 7.21E-12 
38 ENSG00000111704 9.399662 10.435878 1.2364163 8.4404244 3.16E-17 4.18E-16 
39 ENSG00000111704 9.399662 8.7321963 1.2364163 7.0625051 1.64E-12 2.67E-11 
40 ENSG00000111704 9.399662 10.4898135 1.2364163 8.4840468 2.17E-17 6.22E-16 
41 ENSG00000111704 9.399662 9.8810131 1.2364163 7.9916557 1.33E-15 7.55E-14 
42 ENSG00000111704 9.399662 9.4421907 1.2364163 7.6367409 2.23E-14 4.92E-13 
43 ENSG00000111704 9.399662 10.4081689 1.2364163 8.4180135 3.83E-17 1.12E-15 
44 ENSG00000111704 9.399662 8.4143335 1.5024909 5.600256 2.14E-08 1.59E-07 
45 ENSG00000111704 9.399662 9.8853638 1.5024909 6.579317 4.73E-11 5.35E-10 
46 ENSG00000111704 9.399662 10.4715261 1.2364163 8.4692561 2.47E-17 4.28E-16 
47 ENSG00000111704 9.399662 6.3246837 0.5284475 11.968423 5.20E-33 4.34E-31 
48 ENSG00000111704 9.399662 10.3689989 1.2364163 8.3863332 5.02E-17 1.83E-15
ADD REPLYlink modified 10 months ago • written 10 months ago by john.ouyang0

The differences in p-values are due to differences in the size factor. A "0" is not exactly a "0" due to the offset from sequencing depth. You shouldn't really interpret a difference between FDR of 1e-10 or 1e-12, this isn't giving you extra information. The question is really just, for an 'alpha' of interest to you, is padj < alpha or not.

In the next version of DESeq2 (1.20), you can just specify minmu directly to DESeq().

For now you can use a simple wrapper that uses BiocParallel directly:

dds <- makeExampleDESeqDataSet()

library(BiocParallel)
nworkers <- 3
register(MulticoreParam(nworkers))
# this makes a partition of the genes to send to diff workers
idx <- factor(sort(rep(seq_len(nworkers),length.out=nrow(dds))))

dds <- estimateSizeFactors(dds)

minmu <- 1e-3

# split and then bind together along partition
dds <- do.call(rbind, bplapply(levels(idx), function(l) {
    estimateDispersionsGeneEst(dds[idx == l,], minmu=minmu)
}))

# this cannot be run in parallel, needs to see all genes
dds <- estimateDispersionsFit(dds)

# these two steps can be combined into one split and bind operation
dds <- do.call(rbind, bplapply(levels(idx), function(l) {
  d <- estimateDispersionsMAP(dds[idx == l,])
  nbinomWaldTest(d)
}))

# this is fast, does not benefit from parallel, 
# if you only compare to reference level
res <- results(dds)
ADD REPLYlink modified 10 months ago • written 10 months ago by Michael Love20k

Thanks for the quick wrapper solution. And I am looking forward to the next version of DESeq2. :)

ADD REPLYlink written 10 months ago by john.ouyang0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 289 users visited in the last hour