Entering edit mode
                    Stefan Dentro
        
    
        ▴
    
    70
        @stefan-dentro-5471
        Last seen 11.2 years ago
        
    Hello,
I'm using DEXSeq to obtain differentially expressed exons between my
15
samples and 8 controls. But for some reason most exons are not
testable and
are not assigned a p-value. Do you know what I'm doing wrong?
First I've created a conservative list of exons per gene in the human
genome through the canonical transcripts table in UCSC (194511 in
total).
Next I've obtained read counts for each exon in each sample which are
read
into one big data.frame. Then the following list of commands are
executed:
metadata = data.frame(
  row.names=colnames(dat)[7:29],
  condition=c(rep("control", 8), rep("data", 15)),
  replicate=c(c(1:8), c(1:15)),
  libType=rep("paired-end", 23))
## Add strand information that is not available in the read counts
data.frame
dat = cbind(dat, strand=rep(NA, nrow(dat)))
cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata,
                       geneIDs = dat[,5], exonIDs = dat[,6],
                       exonIntervals=dat[,c(2,3,4, 30)])
cds = estimateSizeFactors(cds)
cds = estimateDispersions(cds, minCount=2, maxExon=90)
cds = fitDispersionFunction(cds)
cds = testForDEU(cds)
cds = estimatelog2FoldChanges(cds)
Now lets see how many exons are expressed significantly different
between
data and controls. Surprisingly only 1850 exons are described here,
not the
full 194511:
res1 = DEUresultTable(cds)
table(res1$padjust < 0.05)
FALSE  TRUE
 1424   426
So I've zoomed in to an example gene. Below it can be seen that indeed
all
exons are marked as not testable. But looking at the individual read
counts
per sample (see further below) it does not make sense that they are
not
testable. Some exons indeed have a low read count and are excluded
rightfully, but others have enough reads to base dispersion on I would
think.
If I have understood correctly testable can be false in either of
these
cases:
- Gene has only one exon, dispersion cannot be calculated.
- Readcounts for an exon are too low.
- A combination of the above.
But all do not apply here. Any ideas?
fData for the gene:
                geneID exonID testable dispBeforeSharing dispFitted
dispersion pvalue padjust chr    start      end strand transcripts
136687 ENSG00000009780      1    FALSE                NA  0.2026585
0.2026585     NA      NA   1 28052568 28052672   <na> <na>
108973 ENSG00000009780      2    FALSE                NA  0.9549670
0.9549670     NA      NA   1 28053983 28054047   <na> <na>
1100   ENSG00000009780      3    FALSE                NA  1.9560912
1.9560912     NA      NA   1 28056743 28056844   <na> <na>
1      ENSG00000009780      4    FALSE                NA  0.4722995
0.4722995     NA      NA   1 28059114 28059168   <na> <na>
22096  ENSG00000009780      5    FALSE                NA  0.1146813
0.1146813     NA      NA   1 28060542 28060694   <na> <na>
13670  ENSG00000009780      6    FALSE                NA  0.1127563
0.1127563     NA      NA   1 28071165 28071322   <na> <na>
13666  ENSG00000009780      7    FALSE                NA  0.1450013
0.1450013     NA      NA   1 28075579 28075665   <na> <na>
22095  ENSG00000009780      8    FALSE                NA  0.1160550
0.1160550     NA      NA   1 28081706 28081841   <na> <na>
13665  ENSG00000009780      9    FALSE                NA  0.1129432
0.1129432     NA      NA   1 28086037 28086138   <na> <na>
138629 ENSG00000009780     10    FALSE                NA  0.1112855
0.1112855     NA      NA   1 28087006 28087092   <na> <na>
Read counts for the gene per sample:
               exon_id chr    start      end         gene_id exon_nr
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
1      ENSE00000327880   1 28059114 28059168 ENSG00000009780 4
11       8       4       7       3       1       8       3
1100   ENSE00000571221   1 28056743 28056844 ENSG00000009780 3
1       0       1       1       0       0       0       1
13665  ENSE00000761751   1 28086037 28086138 ENSG00000009780 9      91
145      86      47     169      36     231     136
13666  ENSE00000761753   1 28075579 28075665 ENSG00000009780 7      19
62      32      18      44      23      61      38
13670  ENSE00000761780   1 28071165 28071322 ENSG00000009780 6      77
139      73      66      78      30     168      88
22095  ENSE00000866714   1 28081706 28081841 ENSG00000009780 8      78
94      52      47      40      41     121      60
22096  ENSE00000866716   1 28060542 28060694 ENSG00000009780 5      36
61      50      63      67      10     129      69
108973 ENSE00001625313   1 28053983 28054047 ENSG00000009780 2
3       5       3       0       1       0       4       2
136687 ENSE00001909353   1 28052568 28052672 ENSG00000009780 1       2
12       5       0      19       0      34      10
138629 ENSE00001955917   1 28087006 28087092 ENSG00000009780 10
42      96      50      24      98      23     117      79
       sample9 sample10 sample11 sample12 sample13 sample14 sample15
sample16 sample17 sample18 sample19 sample20 sample21 sample22
1            1        1        0        3        4        1 1
0        3        7        0        0        1        0
1100         2        0        0        0        0        1 1
0        0        3        1        0        0        0
13665       78       64      130       55       66       27 75
53       49      126       36       37       48       53
13666        7       20       51        6       43        8 15
19       11       26        7       15       11        5
13670       75       85      186       79       86       58 70
86       47      149       82       49       66       38
22095       64       64      149       51       91       37 67
54       58      148       54       41       64       43
22096       90       79      287       66       97       29 75
92       58      126       67       55       71       51
108973       1        0        0        1        1        0 1
1        0        4        0        0        0        0
136687       5       28       31        6       27        5 4
12        5       32        6        7        9       16
138629     119      111      233      100      129       59 141
103       70      219       92       60       74       67
       sample23 strand
1             0     NA
1100          0     NA
13665        24     NA
13666         7     NA
13670        27     NA
22095        25     NA
22096        21     NA
108973        1     NA
136687        1     NA
138629       54     NA
Design of the experiment:
         condition replicate    libType
sample1    control         1 paired-end
sample2    control         2 paired-end
sample3    control         3 paired-end
sample4    control         4 paired-end
sample5    control         5 paired-end
sample6    control         6 paired-end
sample7    control         7 paired-end
sample8    control         8 paired-end
sample9       data         1 paired-end
sample10      data         2 paired-end
sample11      data         3 paired-end
sample12      data         4 paired-end
sample13      data         5 paired-end
sample14      data         6 paired-end
sample15      data         7 paired-end
sample16      data         8 paired-end
sample17      data         9 paired-end
sample18      data        10 paired-end
sample19      data        11 paired-end
sample20      data        12 paired-end
sample21      data        13 paired-end
sample22      data        14 paired-end
sample23      data        15 paired-end
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i686-pc-linux-gnu (32-bit)
locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
 [6] LC_MESSAGES=C              LC_PAPER=C LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base
other attached packages:
[1] plyr_1.7.1          DEXSeq_1.2.1        BiocInstaller_1.4.3
DESeq_1.8.3         locfit_1.5-7        Biobase_2.16.0
[7] BiocGenerics_0.2.0
loaded via a namespace (and not attached):
 [1] annotate_1.34.0      AnnotationDbi_1.18.0 biomaRt_2.12.0
DBI_0.2-5            genefilter_1.38.0    geneplotter_1.34.0
 [7] grid_2.15.0          hwriter_1.3          IRanges_1.14.2
lattice_0.20-6       RColorBrewer_1.0-5   RCurl_1.91-1
[13] RSQLite_0.11.1       splines_2.15.0       statmod_1.4.15
stats4_2.15.0        stringr_0.6          survival_2.36-12
[19] tools_2.15.0         XML_3.9-4            xtable_1.7-0
        [[alternative HTML version deleted]]
                    
                
                