Search
Question: DEXSeq many exons marked as not testable
0
gravatar for Stefan Dentro
5.2 years ago by
Stefan Dentro70 wrote:
Hello, I'm using DEXSeq to obtain differentially expressed exons between my 15 samples and 8 controls. But for some reason most exons are not testable and are not assigned a p-value. Do you know what I'm doing wrong? First I've created a conservative list of exons per gene in the human genome through the canonical transcripts table in UCSC (194511 in total). Next I've obtained read counts for each exon in each sample which are read into one big data.frame. Then the following list of commands are executed: metadata = data.frame( row.names=colnames(dat)[7:29], condition=c(rep("control", 8), rep("data", 15)), replicate=c(c(1:8), c(1:15)), libType=rep("paired-end", 23)) ## Add strand information that is not available in the read counts data.frame dat = cbind(dat, strand=rep(NA, nrow(dat))) cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata, geneIDs = dat[,5], exonIDs = dat[,6], exonIntervals=dat[,c(2,3,4, 30)]) cds = estimateSizeFactors(cds) cds = estimateDispersions(cds, minCount=2, maxExon=90) cds = fitDispersionFunction(cds) cds = testForDEU(cds) cds = estimatelog2FoldChanges(cds) Now lets see how many exons are expressed significantly different between data and controls. Surprisingly only 1850 exons are described here, not the full 194511: res1 = DEUresultTable(cds) table(res1$padjust < 0.05) FALSE TRUE 1424 426 So I've zoomed in to an example gene. Below it can be seen that indeed all exons are marked as not testable. But looking at the individual read counts per sample (see further below) it does not make sense that they are not testable. Some exons indeed have a low read count and are excluded rightfully, but others have enough reads to base dispersion on I would think. If I have understood correctly testable can be false in either of these cases: - Gene has only one exon, dispersion cannot be calculated. - Readcounts for an exon are too low. - A combination of the above. But all do not apply here. Any ideas? fData for the gene: geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end strand transcripts 136687 ENSG00000009780 1 FALSE NA 0.2026585 0.2026585 NA NA 1 28052568 28052672 <na> <na> 108973 ENSG00000009780 2 FALSE NA 0.9549670 0.9549670 NA NA 1 28053983 28054047 <na> <na> 1100 ENSG00000009780 3 FALSE NA 1.9560912 1.9560912 NA NA 1 28056743 28056844 <na> <na> 1 ENSG00000009780 4 FALSE NA 0.4722995 0.4722995 NA NA 1 28059114 28059168 <na> <na> 22096 ENSG00000009780 5 FALSE NA 0.1146813 0.1146813 NA NA 1 28060542 28060694 <na> <na> 13670 ENSG00000009780 6 FALSE NA 0.1127563 0.1127563 NA NA 1 28071165 28071322 <na> <na> 13666 ENSG00000009780 7 FALSE NA 0.1450013 0.1450013 NA NA 1 28075579 28075665 <na> <na> 22095 ENSG00000009780 8 FALSE NA 0.1160550 0.1160550 NA NA 1 28081706 28081841 <na> <na> 13665 ENSG00000009780 9 FALSE NA 0.1129432 0.1129432 NA NA 1 28086037 28086138 <na> <na> 138629 ENSG00000009780 10 FALSE NA 0.1112855 0.1112855 NA NA 1 28087006 28087092 <na> <na> Read counts for the gene per sample: exon_id chr start end gene_id exon_nr sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 11 8 4 7 3 1 8 3 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 1 0 1 1 0 0 0 1 13665 ENSE00000761751 1 28086037 28086138 ENSG00000009780 9 91 145 86 47 169 36 231 136 13666 ENSE00000761753 1 28075579 28075665 ENSG00000009780 7 19 62 32 18 44 23 61 38 13670 ENSE00000761780 1 28071165 28071322 ENSG00000009780 6 77 139 73 66 78 30 168 88 22095 ENSE00000866714 1 28081706 28081841 ENSG00000009780 8 78 94 52 47 40 41 121 60 22096 ENSE00000866716 1 28060542 28060694 ENSG00000009780 5 36 61 50 63 67 10 129 69 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 3 5 3 0 1 0 4 2 136687 ENSE00001909353 1 28052568 28052672 ENSG00000009780 1 2 12 5 0 19 0 34 10 138629 ENSE00001955917 1 28087006 28087092 ENSG00000009780 10 42 96 50 24 98 23 117 79 sample9 sample10 sample11 sample12 sample13 sample14 sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 1 1 1 0 3 4 1 1 0 3 7 0 0 1 0 1100 2 0 0 0 0 1 1 0 0 3 1 0 0 0 13665 78 64 130 55 66 27 75 53 49 126 36 37 48 53 13666 7 20 51 6 43 8 15 19 11 26 7 15 11 5 13670 75 85 186 79 86 58 70 86 47 149 82 49 66 38 22095 64 64 149 51 91 37 67 54 58 148 54 41 64 43 22096 90 79 287 66 97 29 75 92 58 126 67 55 71 51 108973 1 0 0 1 1 0 1 1 0 4 0 0 0 0 136687 5 28 31 6 27 5 4 12 5 32 6 7 9 16 138629 119 111 233 100 129 59 141 103 70 219 92 60 74 67 sample23 strand 1 0 NA 1100 0 NA 13665 24 NA 13666 7 NA 13670 27 NA 22095 25 NA 22096 21 NA 108973 1 NA 136687 1 NA 138629 54 NA Design of the experiment: condition replicate libType sample1 control 1 paired-end sample2 control 2 paired-end sample3 control 3 paired-end sample4 control 4 paired-end sample5 control 5 paired-end sample6 control 6 paired-end sample7 control 7 paired-end sample8 control 8 paired-end sample9 data 1 paired-end sample10 data 2 paired-end sample11 data 3 paired-end sample12 data 4 paired-end sample13 data 5 paired-end sample14 data 6 paired-end sample15 data 7 paired-end sample16 data 8 paired-end sample17 data 9 paired-end sample18 data 10 paired-end sample19 data 11 paired-end sample20 data 12 paired-end sample21 data 13 paired-end sample22 data 14 paired-end sample23 data 15 paired-end > sessionInfo() R version 2.15.0 (2012-03-30) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 [7] BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 stats4_2.15.0 stringr_0.6 survival_2.36-12 [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 [[alternative HTML version deleted]]
ADD COMMENTlink modified 5.2 years ago by Stefan Dentro10 • written 5.2 years ago by Stefan Dentro70
0
gravatar for Stefan Dentro
5.2 years ago by
Stefan Dentro10 wrote:
Hello, I'm using DEXSeq to obtain differentially expressed exons between my 15 samples and 8 controls. But for some reason most exons are not testable and are not assigned a p-value. Do you know what I'm doing wrong? First I've created a conservative list of exons per gene in the human genome through the canonical transcripts table in UCSC (194511 in total). Next I've obtained read counts for each exon in each sample which are read into one big data.frame. Then the following list of commands are executed: metadata = data.frame( row.names=colnames(dat)[7:29], condition=c(rep("control", 8), rep("data", 15)), replicate=c(c(1:8), c(1:15)), libType=rep("paired-end", 23)) ## Add strand information that is not available in the read counts data.frame dat = cbind(dat, strand=rep(NA, nrow(dat))) cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata, geneIDs = dat[,5], exonIDs = dat[,6], exonIntervals=dat[,c(2,3,4, 30)]) cds = estimateSizeFactors(cds) cds = estimateDispersions(cds, minCount=2, maxExon=90) cds = fitDispersionFunction(cds) cds = testForDEU(cds) cds = estimatelog2FoldChanges(cds) Now lets see how many exons are expressed significantly different between data and controls. Surprisingly only 1850 exons are described here, not the full 194511: res1 = DEUresultTable(cds) table(res1$padjust < 0.05) FALSE TRUE 1424 426 So I've zoomed in to an example gene. Below it can be seen that indeed all exons are marked as not testable. But looking at the individual read counts per sample (see further below) it does not make sense that they are not testable. Some exons indeed have a low read count and are excluded rightfully, but others have enough reads to base dispersion on I would think. If I have understood correctly testable can be false in either of these cases: - Gene has only one exon, dispersion cannot be calculated. - Readcounts for an exon are too low. - A combination of the above. But all do not apply here. Any ideas? fData for the gene: geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end strand transcripts 136687 ENSG00000009780 1 FALSE NA 0.2026585 0.2026585 NA NA 1 28052568 28052672 <na> <na> 108973 ENSG00000009780 2 FALSE NA 0.9549670 0.9549670 NA NA 1 28053983 28054047 <na> <na> 1100 ENSG00000009780 3 FALSE NA 1.9560912 1.9560912 NA NA 1 28056743 28056844 <na> <na> 1 ENSG00000009780 4 FALSE NA 0.4722995 0.4722995 NA NA 1 28059114 28059168 <na> <na> 22096 ENSG00000009780 5 FALSE NA 0.1146813 0.1146813 NA NA 1 28060542 28060694 <na> <na> 13670 ENSG00000009780 6 FALSE NA 0.1127563 0.1127563 NA NA 1 28071165 28071322 <na> <na> 13666 ENSG00000009780 7 FALSE NA 0.1450013 0.1450013 NA NA 1 28075579 28075665 <na> <na> 22095 ENSG00000009780 8 FALSE NA 0.1160550 0.1160550 NA NA 1 28081706 28081841 <na> <na> 13665 ENSG00000009780 9 FALSE NA 0.1129432 0.1129432 NA NA 1 28086037 28086138 <na> <na> 138629 ENSG00000009780 10 FALSE NA 0.1112855 0.1112855 NA NA 1 28087006 28087092 <na> <na> Read counts for the gene per sample: exon_id chr start end gene_id exon_nr sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 11 8 4 7 3 1 8 3 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 1 0 1 1 0 0 0 1 13665 ENSE00000761751 1 28086037 28086138 ENSG00000009780 9 91 145 86 47 169 36 231 136 13666 ENSE00000761753 1 28075579 28075665 ENSG00000009780 7 19 62 32 18 44 23 61 38 13670 ENSE00000761780 1 28071165 28071322 ENSG00000009780 6 77 139 73 66 78 30 168 88 22095 ENSE00000866714 1 28081706 28081841 ENSG00000009780 8 78 94 52 47 40 41 121 60 22096 ENSE00000866716 1 28060542 28060694 ENSG00000009780 5 36 61 50 63 67 10 129 69 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 3 5 3 0 1 0 4 2 136687 ENSE00001909353 1 28052568 28052672 ENSG00000009780 1 2 12 5 0 19 0 34 10 138629 ENSE00001955917 1 28087006 28087092 ENSG00000009780 10 42 96 50 24 98 23 117 79 sample9 sample10 sample11 sample12 sample13 sample14 sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 1 1 1 0 3 4 1 1 0 3 7 0 0 1 0 1100 2 0 0 0 0 1 1 0 0 3 1 0 0 0 13665 78 64 130 55 66 27 75 53 49 126 36 37 48 53 13666 7 20 51 6 43 8 15 19 11 26 7 15 11 5 13670 75 85 186 79 86 58 70 86 47 149 82 49 66 38 22095 64 64 149 51 91 37 67 54 58 148 54 41 64 43 22096 90 79 287 66 97 29 75 92 58 126 67 55 71 51 108973 1 0 0 1 1 0 1 1 0 4 0 0 0 0 136687 5 28 31 6 27 5 4 12 5 32 6 7 9 16 138629 119 111 233 100 129 59 141 103 70 219 92 60 74 67 sample23 strand 1 0 NA 1100 0 NA 13665 24 NA 13666 7 NA 13670 27 NA 22095 25 NA 22096 21 NA 108973 1 NA 136687 1 NA 138629 54 NA Design of the experiment: condition replicate libType sample1 control 1 paired-end sample2 control 2 paired-end sample3 control 3 paired-end sample4 control 4 paired-end sample5 control 5 paired-end sample6 control 6 paired-end sample7 control 7 paired-end sample8 control 8 paired-end sample9 data 1 paired-end sample10 data 2 paired-end sample11 data 3 paired-end sample12 data 4 paired-end sample13 data 5 paired-end sample14 data 6 paired-end sample15 data 7 paired-end sample16 data 8 paired-end sample17 data 9 paired-end sample18 data 10 paired-end sample19 data 11 paired-end sample20 data 12 paired-end sample21 data 13 paired-end sample22 data 14 paired-end sample23 data 15 paired-end > sessionInfo() R version 2.15.0 (2012-03-30) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 [7] BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 stats4_2.15.0 stringr_0.6 survival_2.36-12 [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
ADD COMMENTlink written 5.2 years ago by Stefan Dentro10
0
gravatar for Alejandro Reyes
5.2 years ago by
Alejandro Reyes1.5k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.5k wrote:
Dear Stefan Dentro, Thanks for your email. It actually looks strange, could you send the output of the function "counts", and subset for this specific gene? Could you also do it for the "fData" function?? Alejandro > Hello, > > I'm using DEXSeq to obtain differentially expressed exons between my 15 > samples and 8 controls. But for some reason most exons are not testable and > are not assigned a p-value. Do you know what I'm doing wrong? > > First I've created a conservative list of exons per gene in the human > genome through the canonical transcripts table in UCSC (194511 in total). > Next I've obtained read counts for each exon in each sample which are read > into one big data.frame. Then the following list of commands are executed: > > metadata = data.frame( > row.names=colnames(dat)[7:29], > condition=c(rep("control", 8), rep("data", 15)), > replicate=c(c(1:8), c(1:15)), > libType=rep("paired-end", 23)) > > ## Add strand information that is not available in the read counts > data.frame > dat = cbind(dat, strand=rep(NA, nrow(dat))) > > cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata, > geneIDs = dat[,5], exonIDs = dat[,6], > exonIntervals=dat[,c(2,3,4, 30)]) > > cds = estimateSizeFactors(cds) > cds = estimateDispersions(cds, minCount=2, maxExon=90) > cds = fitDispersionFunction(cds) > cds = testForDEU(cds) > cds = estimatelog2FoldChanges(cds) > > Now lets see how many exons are expressed significantly different between > data and controls. Surprisingly only 1850 exons are described here, not the > full 194511: > > res1 = DEUresultTable(cds) > table(res1$padjust < 0.05) > > FALSE TRUE > 1424 426 > > So I've zoomed in to an example gene. Below it can be seen that indeed all > exons are marked as not testable. But looking at the individual read counts > per sample (see further below) it does not make sense that they are not > testable. Some exons indeed have a low read count and are excluded > rightfully, but others have enough reads to base dispersion on I would > think. > > If I have understood correctly testable can be false in either of these > cases: > - Gene has only one exon, dispersion cannot be calculated. > - Readcounts for an exon are too low. > - A combination of the above. > > But all do not apply here. Any ideas? > > > fData for the gene: > geneID exonID testable dispBeforeSharing dispFitted > dispersion pvalue padjust chr start end strand transcripts > 136687 ENSG00000009780 1 FALSE NA 0.2026585 > 0.2026585 NA NA 1 28052568 28052672 <na> <na> > 108973 ENSG00000009780 2 FALSE NA 0.9549670 > 0.9549670 NA NA 1 28053983 28054047 <na> <na> > 1100 ENSG00000009780 3 FALSE NA 1.9560912 > 1.9560912 NA NA 1 28056743 28056844 <na> <na> > 1 ENSG00000009780 4 FALSE NA 0.4722995 > 0.4722995 NA NA 1 28059114 28059168 <na> <na> > 22096 ENSG00000009780 5 FALSE NA 0.1146813 > 0.1146813 NA NA 1 28060542 28060694 <na> <na> > 13670 ENSG00000009780 6 FALSE NA 0.1127563 > 0.1127563 NA NA 1 28071165 28071322 <na> <na> > 13666 ENSG00000009780 7 FALSE NA 0.1450013 > 0.1450013 NA NA 1 28075579 28075665 <na> <na> > 22095 ENSG00000009780 8 FALSE NA 0.1160550 > 0.1160550 NA NA 1 28081706 28081841 <na> <na> > 13665 ENSG00000009780 9 FALSE NA 0.1129432 > 0.1129432 NA NA 1 28086037 28086138 <na> <na> > 138629 ENSG00000009780 10 FALSE NA 0.1112855 > 0.1112855 NA NA 1 28087006 28087092 <na> <na> > > Read counts for the gene per sample: > exon_id chr start end gene_id exon_nr > sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 > 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 > 11 8 4 7 3 1 8 3 > 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 > 1 0 1 1 0 0 0 1 > 13665 ENSE00000761751 1 28086037 28086138 ENSG00000009780 9 91 > 145 86 47 169 36 231 136 > 13666 ENSE00000761753 1 28075579 28075665 ENSG00000009780 7 19 > 62 32 18 44 23 61 38 > 13670 ENSE00000761780 1 28071165 28071322 ENSG00000009780 6 77 > 139 73 66 78 30 168 88 > 22095 ENSE00000866714 1 28081706 28081841 ENSG00000009780 8 78 > 94 52 47 40 41 121 60 > 22096 ENSE00000866716 1 28060542 28060694 ENSG00000009780 5 36 > 61 50 63 67 10 129 69 > 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 > 3 5 3 0 1 0 4 2 > 136687 ENSE00001909353 1 28052568 28052672 ENSG00000009780 1 2 > 12 5 0 19 0 34 10 > 138629 ENSE00001955917 1 28087006 28087092 ENSG00000009780 10 > 42 96 50 24 98 23 117 79 > sample9 sample10 sample11 sample12 sample13 sample14 sample15 > sample16 sample17 sample18 sample19 sample20 sample21 sample22 > 1 1 1 0 3 4 1 1 > 0 3 7 0 0 1 0 > 1100 2 0 0 0 0 1 1 > 0 0 3 1 0 0 0 > 13665 78 64 130 55 66 27 75 > 53 49 126 36 37 48 53 > 13666 7 20 51 6 43 8 15 > 19 11 26 7 15 11 5 > 13670 75 85 186 79 86 58 70 > 86 47 149 82 49 66 38 > 22095 64 64 149 51 91 37 67 > 54 58 148 54 41 64 43 > 22096 90 79 287 66 97 29 75 > 92 58 126 67 55 71 51 > 108973 1 0 0 1 1 0 1 > 1 0 4 0 0 0 0 > 136687 5 28 31 6 27 5 4 > 12 5 32 6 7 9 16 > 138629 119 111 233 100 129 59 141 > 103 70 219 92 60 74 67 > sample23 strand > 1 0 NA > 1100 0 NA > 13665 24 NA > 13666 7 NA > 13670 27 NA > 22095 25 NA > 22096 21 NA > 108973 1 NA > 136687 1 NA > 138629 54 NA > > Design of the experiment: > condition replicate libType > sample1 control 1 paired-end > sample2 control 2 paired-end > sample3 control 3 paired-end > sample4 control 4 paired-end > sample5 control 5 paired-end > sample6 control 6 paired-end > sample7 control 7 paired-end > sample8 control 8 paired-end > sample9 data 1 paired-end > sample10 data 2 paired-end > sample11 data 3 paired-end > sample12 data 4 paired-end > sample13 data 5 paired-end > sample14 data 6 paired-end > sample15 data 7 paired-end > sample16 data 8 paired-end > sample17 data 9 paired-end > sample18 data 10 paired-end > sample19 data 11 paired-end > sample20 data 12 paired-end > sample21 data 13 paired-end > sample22 data 14 paired-end > sample23 data 15 paired-end > >> sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 > LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 > [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C > LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 > DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 > [7] BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 > DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 > [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 > lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 > [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 > stats4_2.15.0 stringr_0.6 survival_2.36-12 > [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENTlink written 5.2 years ago by Alejandro Reyes1.5k
Hi Alejandro, I just reran the code with only ENSG00000009780 in the dataset. The odd thing is that this time it worked perfectly. Regards, Stefan ## Only ENSG00000009780 > counts(cds) sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 1 11 8 4 7 3 1 8 3 1 1 0 3 4 1 1100 1 0 1 1 0 0 0 1 2 0 0 0 0 1 13665 91 145 86 47 169 36 231 136 78 64 130 55 66 27 13666 19 62 32 18 44 23 61 38 7 20 51 6 43 8 13670 77 139 73 66 78 30 168 88 75 85 186 79 86 58 22095 78 94 52 47 40 41 121 60 64 64 149 51 91 37 22096 36 61 50 63 67 10 129 69 90 79 287 66 97 29 108973 3 5 3 0 1 0 4 2 1 0 0 1 1 0 136687 2 12 5 0 19 0 34 10 5 28 31 6 27 5 138629 42 96 50 24 98 23 117 79 119 111 233 100 129 59 sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 1 1 0 3 7 0 0 1 0 0 1100 1 0 0 3 1 0 0 0 0 13665 75 53 49 126 36 37 48 53 24 13666 15 19 11 26 7 15 11 5 7 13670 70 86 47 149 82 49 66 38 27 22095 67 54 58 148 54 41 64 43 25 22096 75 92 58 126 67 55 71 51 21 108973 1 1 0 4 0 0 0 0 1 136687 4 12 5 32 6 7 9 16 1 138629 141 103 70 219 92 60 74 67 54 > fData(cds) geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end 1 ENSG00000009780 4 TRUE 0.40135710 0.40580351 0.40580351 4.256220e-04 8.512441e-04 1 28059114 28059168 1100 ENSG00000009780 3 TRUE 0.57127358 1.98755125 1.98755125 9.715722e-01 9.715722e-01 1 28056743 28056844 13665 ENSG00000009780 9 TRUE 0.01991574 0.03806432 0.03806432 2.537419e-09 1.268709e-08 1 28086037 28086138 13666 ENSG00000009780 7 TRUE 0.09257216 0.07072531 0.09257216 6.740254e-07 2.246751e-06 1 28075579 28075665 13670 ENSG00000009780 6 TRUE 0.01517262 0.03752086 0.03752086 7.138709e-01 7.931899e-01 1 28071165 28071322 22095 ENSG00000009780 8 TRUE 0.03299040 0.04028325 0.04028325 5.589838e-01 6.987297e-01 1 28081706 28081841 22096 ENSG00000009780 5 TRUE 0.04659993 0.03930375 0.04659993 1.632722e-04 4.081804e-04 1 28060542 28060694 108973 ENSG00000009780 2 TRUE 0.01382188 0.99934256 0.99934256 6.159293e-02 1.026549e-01 1 28053983 28054047 136687 ENSG00000009780 1 TRUE 0.38138204 0.12609379 0.38138204 1.302911e-01 1.861302e-01 1 28052568 28052672 138629 ENSG00000009780 10 TRUE 0.01782266 0.03578012 0.03578012 7.407408e-12 7.407408e-11 1 28087006 28087092 strand transcripts log2fold(data/control) 1 <na> <na> -1.89528866 1100 <na> <na> -0.01821944 13665 <na> <na> -0.79494559 13666 <na> <na> -1.13738728 13670 <na> <na> -0.06639675 22095 <na> <na> 0.07349758 22096 <na> <na> 0.52608193 108973 <na> <na> -1.59373125 136687 <na> <na> 0.56880558 138629 <na> <na> 0.85158758 On Wed, Sep 19, 2012 at 5:20 PM, Alejandro Reyes <alejandro.reyes@embl.de>wrote: > Dear Stefan Dentro, > > Thanks for your email. It actually looks strange, could you send the > output of the function "counts", and subset for this specific gene? Could > you also do it for the "fData" function?? > > Alejandro > > > Hello, >> >> I'm using DEXSeq to obtain differentially expressed exons between my 15 >> samples and 8 controls. But for some reason most exons are not testable >> and >> are not assigned a p-value. Do you know what I'm doing wrong? >> >> First I've created a conservative list of exons per gene in the human >> genome through the canonical transcripts table in UCSC (194511 in total). >> Next I've obtained read counts for each exon in each sample which are read >> into one big data.frame. Then the following list of commands are executed: >> >> metadata = data.frame( >> row.names=colnames(dat)[7:29], >> condition=c(rep("control", 8), rep("data", 15)), >> replicate=c(c(1:8), c(1:15)), >> libType=rep("paired-end", 23)) >> >> ## Add strand information that is not available in the read counts >> data.frame >> dat = cbind(dat, strand=rep(NA, nrow(dat))) >> >> cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata, >> geneIDs = dat[,5], exonIDs = dat[,6], >> exonIntervals=dat[,c(2,3,4, 30)]) >> >> cds = estimateSizeFactors(cds) >> cds = estimateDispersions(cds, minCount=2, maxExon=90) >> cds = fitDispersionFunction(cds) >> cds = testForDEU(cds) >> cds = estimatelog2FoldChanges(cds) >> >> Now lets see how many exons are expressed significantly different between >> data and controls. Surprisingly only 1850 exons are described here, not >> the >> full 194511: >> >> res1 = DEUresultTable(cds) >> table(res1$padjust < 0.05) >> >> FALSE TRUE >> 1424 426 >> >> So I've zoomed in to an example gene. Below it can be seen that indeed all >> exons are marked as not testable. But looking at the individual read >> counts >> per sample (see further below) it does not make sense that they are not >> testable. Some exons indeed have a low read count and are excluded >> rightfully, but others have enough reads to base dispersion on I would >> think. >> >> If I have understood correctly testable can be false in either of these >> cases: >> - Gene has only one exon, dispersion cannot be calculated. >> - Readcounts for an exon are too low. >> - A combination of the above. >> >> But all do not apply here. Any ideas? >> >> >> fData for the gene: >> geneID exonID testable dispBeforeSharing dispFitted >> dispersion pvalue padjust chr start end strand transcripts >> 136687 ENSG00000009780 1 FALSE NA 0.2026585 >> 0.2026585 NA NA 1 28052568 28052672 <na> <na> >> 108973 ENSG00000009780 2 FALSE NA 0.9549670 >> 0.9549670 NA NA 1 28053983 28054047 <na> <na> >> 1100 ENSG00000009780 3 FALSE NA 1.9560912 >> 1.9560912 NA NA 1 28056743 28056844 <na> <na> >> 1 ENSG00000009780 4 FALSE NA 0.4722995 >> 0.4722995 NA NA 1 28059114 28059168 <na> <na> >> 22096 ENSG00000009780 5 FALSE NA 0.1146813 >> 0.1146813 NA NA 1 28060542 28060694 <na> <na> >> 13670 ENSG00000009780 6 FALSE NA 0.1127563 >> 0.1127563 NA NA 1 28071165 28071322 <na> <na> >> 13666 ENSG00000009780 7 FALSE NA 0.1450013 >> 0.1450013 NA NA 1 28075579 28075665 <na> <na> >> 22095 ENSG00000009780 8 FALSE NA 0.1160550 >> 0.1160550 NA NA 1 28081706 28081841 <na> <na> >> 13665 ENSG00000009780 9 FALSE NA 0.1129432 >> 0.1129432 NA NA 1 28086037 28086138 <na> <na> >> 138629 ENSG00000009780 10 FALSE NA 0.1112855 >> 0.1112855 NA NA 1 28087006 28087092 <na> <na> >> >> Read counts for the gene per sample: >> exon_id chr start end gene_id exon_nr >> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 >> 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 >> 11 8 4 7 3 1 8 3 >> 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 >> 1 0 1 1 0 0 0 1 >> 13665 ENSE00000761751 1 28086037 28086138 ENSG00000009780 9 91 >> 145 86 47 169 36 231 136 >> 13666 ENSE00000761753 1 28075579 28075665 ENSG00000009780 7 19 >> 62 32 18 44 23 61 38 >> 13670 ENSE00000761780 1 28071165 28071322 ENSG00000009780 6 77 >> 139 73 66 78 30 168 88 >> 22095 ENSE00000866714 1 28081706 28081841 ENSG00000009780 8 78 >> 94 52 47 40 41 121 60 >> 22096 ENSE00000866716 1 28060542 28060694 ENSG00000009780 5 36 >> 61 50 63 67 10 129 69 >> 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 >> 3 5 3 0 1 0 4 2 >> 136687 ENSE00001909353 1 28052568 28052672 ENSG00000009780 1 2 >> 12 5 0 19 0 34 10 >> 138629 ENSE00001955917 1 28087006 28087092 ENSG00000009780 10 >> 42 96 50 24 98 23 117 79 >> sample9 sample10 sample11 sample12 sample13 sample14 sample15 >> sample16 sample17 sample18 sample19 sample20 sample21 sample22 >> 1 1 1 0 3 4 1 1 >> 0 3 7 0 0 1 0 >> 1100 2 0 0 0 0 1 1 >> 0 0 3 1 0 0 0 >> 13665 78 64 130 55 66 27 75 >> 53 49 126 36 37 48 53 >> 13666 7 20 51 6 43 8 15 >> 19 11 26 7 15 11 5 >> 13670 75 85 186 79 86 58 70 >> 86 47 149 82 49 66 38 >> 22095 64 64 149 51 91 37 67 >> 54 58 148 54 41 64 43 >> 22096 90 79 287 66 97 29 75 >> 92 58 126 67 55 71 51 >> 108973 1 0 0 1 1 0 1 >> 1 0 4 0 0 0 0 >> 136687 5 28 31 6 27 5 4 >> 12 5 32 6 7 9 16 >> 138629 119 111 233 100 129 59 141 >> 103 70 219 92 60 74 67 >> sample23 strand >> 1 0 NA >> 1100 0 NA >> 13665 24 NA >> 13666 7 NA >> 13670 27 NA >> 22095 25 NA >> 22096 21 NA >> 108973 1 NA >> 136687 1 NA >> 138629 54 NA >> >> Design of the experiment: >> condition replicate libType >> sample1 control 1 paired-end >> sample2 control 2 paired-end >> sample3 control 3 paired-end >> sample4 control 4 paired-end >> sample5 control 5 paired-end >> sample6 control 6 paired-end >> sample7 control 7 paired-end >> sample8 control 8 paired-end >> sample9 data 1 paired-end >> sample10 data 2 paired-end >> sample11 data 3 paired-end >> sample12 data 4 paired-end >> sample13 data 5 paired-end >> sample14 data 6 paired-end >> sample15 data 7 paired-end >> sample16 data 8 paired-end >> sample17 data 9 paired-end >> sample18 data 10 paired-end >> sample19 data 11 paired-end >> sample20 data 12 paired-end >> sample21 data 13 paired-end >> sample22 data 14 paired-end >> sample23 data 15 paired-end >> >> sessionInfo() >>> >> R version 2.15.0 (2012-03-30) >> Platform: i686-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 >> LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 >> [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C >> LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 >> DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 >> [7] BiocGenerics_0.2.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 >> DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 >> [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 >> lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 >> [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 >> stats4_2.15.0 stringr_0.6 survival_2.36-12 >> [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > [[alternative HTML version deleted]]
ADD REPLYlink written 5.2 years ago by Stefan Dentro70
Perhaps a clue: If I add a second gene to the small dataset I get the following error message: > genes[1] "ENSG00000009780" "ENSG00000034533" ... > cds = estimateDispersions(cds, minCount=2, maxExon=90) Dispersion estimation. (Progress report: one dot per 100 genes)Error in which(sapply(muhats, inherits, "try-error")) : argument to 'which' is not logical If I select two other genes, no error message, but one of the two genes is still marked as not testable, whilst there is enough data: > genes[1] "ENSG00000069345" "ENSG00000099889" ... > fData(cds) geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end 3 ENSG00000069345 4 FALSE NA 0.03745547 0.03745547 NA NA 16 47001996 47002076 4 ENSG00000099889 16 TRUE 0.106713856 0.12728475 0.12728475 9.533455e-01 9.533455e-01 22 19960260 19960350 1381 ENSG00000099889 18 TRUE 0.039759265 0.11401613 0.11401613 8.020906e-05 3.809930e-04 22 19959409 19959494 1382 ENSG00000099889 15 TRUE 0.015147460 0.07255594 0.07255594 4.753684e-01 6.021333e-01 22 19960448 19960559 1383 ENSG00000099889 14 TRUE 0.020525216 0.05938316 0.05938316 5.829357e-03 1.107578e-02 22 19960642 19960840 1384 ENSG00000099889 13 TRUE 0.008016871 0.06746289 0.06746289 3.700721e-01 5.408746e-01 22 19961166 19961316 1385 ENSG00000099889 12 TRUE 0.030236839 0.07790661 0.07790661 1.964000e-01 3.109666e-01 22 19961635 19961762 1386 ENSG00000099889 9 TRUE 0.025364371 0.05520343 0.05520343 4.359563e-01 5.916549e-01 22 19964938 19965109 1387 ENSG00000099889 8 TRUE 0.064383214 0.07631387 0.07631387 2.412871e-09 2.292227e-08 22 19965481 19965598 6388 ENSG00000069345 8 FALSE NA 0.02990952 0.02990952 NA NA 16 46992915 46993042 6389 ENSG00000069345 7 FALSE NA 0.02943529 0.02943529 NA NA 16 46993187 46993331 6390 ENSG00000069345 6 FALSE NA 0.02756900 0.02756900 NA NA 16 46998523 46998719 6391 ENSG00000069345 5 FALSE NA 0.02919581 0.02919581 NA NA 16 47001425 47001558 6393 ENSG00000069345 3 FALSE NA 0.02784982 0.02784982 NA NA 16 47005261 47005484 6394 ENSG00000069345 2 FALSE NA 0.06403618 0.06403618 NA NA 16 47005808 47005867 23101 ENSG00000099889 11 TRUE 0.176647935 0.16264956 0.17664794 0.000000e+00 0.000000e+00 22 19963209 19963280 76702 ENSG00000099889 17 TRUE 0.042072149 0.55244196 0.55244196 8.569994e-07 5.427663e-06 22 19959881 19959934 79101 ENSG00000099889 2 FALSE NA 40.14086829 40.14086829 NA NA 22 19997978 19998031 79190 ENSG00000099889 19 TRUE 0.347541116 0.09043853 0.34754112 4.506187e-03 9.513061e-03 22 19958739 19958858 79566 ENSG00000069345 9 FALSE NA 0.02639237 0.02639237 NA NA 16 46989299 46991132 111562 ENSG00000099889 6 TRUE 0.036879156 0.03886331 0.03886331 2.370001e-04 6.432859e-04 22 19967266 19967765 122467 ENSG00000099889 5 TRUE 0.046550209 0.04180913 0.04655021 5.771771e-01 6.853978e-01 22 19968734 19969260 126533 ENSG00000099889 7 TRUE 0.023668263 0.05572503 0.05572503 1.439509e-04 5.470135e-04 22 19966420 19966603 127478 ENSG00000099889 10 TRUE 0.482851293 1.21253452 1.21253452 3.875210e-02 6.693545e-02 22 19964229 19964246 131502 ENSG00000099889 4 TRUE 0.064691260 0.07631174 0.07631174 2.174826e-04 6.432859e-04 22 19969456 19969614 135581 ENSG00000099889 20 TRUE 0.059913261 0.03024754 0.05991326 8.767254e-01 9.254324e-01 22 19957419 19958266 137473 ENSG00000099889 1 TRUE 0.623260704 0.39612292 0.62326070 8.554939e-01 9.254324e-01 22 20004112 20004331 147158 ENSG00000099889 3 TRUE 0.546816624 0.16681316 0.54681662 3.335978e-03 7.922947e-03 22 19978108 19978335 186433 ENSG00000069345 1 FALSE NA 0.03292932 0.03292932 NA NA 16 47007406 47007699 strand transcripts log2fold(data/control) 3 <na> <na> NA 4 <na> <na> -0.03257363 1381 <na> <na> -0.99913247 1382 <na> <na> -0.16326931 1383 <na> <na> 0.46834227 1384 <na> <na> -0.19311802 1385 <na> <na> 0.24176762 1386 <na> <na> -0.14237513 1387 <na> <na> -1.20469634 6388 <na> <na> NA 6389 <na> <na> NA 6390 <na> <na> NA 6391 <na> <na> NA 6393 <na> <na> NA 6394 <na> <na> NA 23101 <na> <na> -2.68320588 76702 <na> <na> -2.91182540 79101 <na> <na> 21.32540529 79190 <na> <na> 1.08400450 79566 <na> <na> NA 111562 <na> <na> 0.46147010 122467 <na> <na> 0.05583688 126533 <na> <na> 0.64272965 127478 <na> <na> -1.83381112 131502 <na> <na> 0.73838679 135581 <na> <na> -0.04311782 137473 <na> <na> 0.17841783 147158 <na> <na> 1.62180140 186433 <na> <na> NA > counts(cds) sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 3 247 161 259 144 186 191 226 147 107 200 302 69 133 62 4 14 10 9 15 7 7 38 6 23 20 57 23 24 6 1381 24 42 23 8 24 15 41 14 18 22 102 15 40 5 1382 36 44 31 17 30 13 65 13 34 65 148 40 57 13 1383 29 40 33 5 25 37 73 13 53 115 296 46 93 19 1384 24 41 31 9 33 38 99 20 40 88 165 40 95 10 1385 9 31 29 21 20 9 61 7 42 63 132 48 77 10 1386 28 45 60 29 37 36 120 38 73 82 231 69 103 24 1387 15 81 42 6 43 43 122 30 32 51 91 34 38 6 6388 434 528 611 331 414 414 732 415 427 455 484 143 445 197 6389 505 828 1061 402 692 830 1096 633 396 472 583 174 415 200 6390 1008 1213 1696 991 959 1093 1533 884 1254 1325 3025 637 1164 672 6391 439 780 1151 437 692 612 1151 700 459 654 1134 241 435 222 6393 737 918 1301 683 786 940 1377 773 1069 1541 2603 613 1211 512 6394 49 64 145 32 106 99 159 64 47 35 41 15 46 6 23101 25 26 27 5 18 11 66 30 11 6 21 7 4 0 76702 6 10 5 0 6 1 19 9 2 0 3 2 4 0 79101 0 0 0 0 0 0 0 0 0 0 0 0 1 0 79190 23 1 8 0 9 1 62 5 44 46 160 31 41 15 79566 7133 6826 7143 6365 6600 6431 7394 4947 6323 7007 10031 1949 7597 4639 111562 50 96 65 51 107 50 184 47 136 365 675 174 235 69 122467 82 120 60 53 62 45 190 51 104 309 530 123 253 46 126533 24 49 28 19 19 21 73 22 88 131 221 96 101 25 127478 0 2 0 0 4 0 11 4 0 0 6 1 2 0 131502 24 24 22 12 6 3 60 3 33 84 208 48 85 15 135581 389 477 257 93 305 279 773 141 459 611 1830 361 766 173 137473 7 0 1 8 3 0 1 0 1 16 17 5 10 6 147158 5 3 4 0 3 6 10 0 0 52 27 8 23 21 186433 65 132 241 134 128 147 227 136 231 955 989 152 606 142 sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 3 78 168 129 217 149 75 104 73 28 4 41 2 35 125 7 51 16 15 13 1381 14 4 27 82 4 61 20 34 7 1382 56 8 64 270 12 102 37 86 28 1383 68 25 103 393 20 159 55 105 51 1384 51 12 83 260 10 119 42 77 35 1385 40 19 82 222 7 115 41 54 17 1386 96 21 104 425 17 195 72 93 28 1387 28 11 40 163 9 88 30 63 16 6388 332 660 374 867 648 272 327 73 285 6389 253 556 353 785 471 268 292 152 205 6390 950 1687 1139 2477 1607 838 924 554 508 6391 337 646 437 1022 525 314 426 303 155 6393 815 1476 912 2385 1225 764 921 774 288 6394 15 21 28 16 10 21 11 14 8 23101 4 2 16 38 0 23 11 2 2 76702 1 0 1 6 0 3 2 2 3 79101 0 0 0 0 0 1 0 0 0 79190 26 6 85 79 8 133 37 71 29 79566 6589 10656 8383 11925 10680 3772 4360 2309 5646 111562 146 63 275 1046 42 423 186 330 56 122467 115 37 157 826 31 376 162 217 31 126533 85 19 97 421 26 215 99 98 31 127478 1 0 0 5 0 4 1 3 2 131502 35 21 56 266 13 136 56 60 16 135581 655 208 890 2407 176 953 358 577 413 137473 2 1 10 16 2 18 3 5 3 147158 14 1 16 97 8 98 17 18 6 186433 179 389 253 608 276 231 236 400 62 On Thu, Sep 20, 2012 at 9:31 AM, Stefan Dentro <sdentro@gmail.com> wrote: > Hi Alejandro, > > I just reran the code with only ENSG00000009780 in the dataset. The odd > thing is that this time it worked perfectly. > > Regards, > Stefan > > > ## Only ENSG00000009780 > > > counts(cds) sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 > 1 11 8 4 7 3 1 8 3 1 1 0 3 4 1 > 1100 1 0 1 1 0 0 0 1 2 0 0 0 0 1 > 13665 91 145 86 47 169 36 231 136 78 64 130 55 66 27 > 13666 19 62 32 18 44 23 61 38 7 20 51 6 43 8 > 13670 77 139 73 66 78 30 168 88 75 85 186 79 86 58 > 22095 78 94 52 47 40 41 121 60 64 64 149 51 91 37 > 22096 36 61 50 63 67 10 129 69 90 79 287 66 97 29 > 108973 3 5 3 0 1 0 4 2 1 0 0 1 1 0 > 136687 2 12 5 0 19 0 34 10 5 28 31 6 27 5 > 138629 42 96 50 24 98 23 117 79 119 111 233 100 129 59 > sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 > 1 1 0 3 7 0 0 1 0 0 > 1100 1 0 0 3 1 0 0 0 0 > 13665 75 53 49 126 36 37 48 53 24 > 13666 15 19 11 26 7 15 11 5 7 > 13670 70 86 47 149 82 49 66 38 27 > 22095 67 54 58 148 54 41 64 43 25 > 22096 75 92 58 126 67 55 71 51 21 > 108973 1 1 0 4 0 0 0 0 1 > 136687 4 12 5 32 6 7 9 16 1 > 138629 141 103 70 219 92 60 74 67 54 > > > > fData(cds) geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end > 1 ENSG00000009780 4 TRUE 0.40135710 0.40580351 0.40580351 4.256220e-04 8.512441e-04 1 28059114 28059168 > 1100 ENSG00000009780 3 TRUE 0.57127358 1.98755125 1.98755125 9.715722e-01 9.715722e-01 1 28056743 28056844 > 13665 ENSG00000009780 9 TRUE 0.01991574 0.03806432 0.03806432 2.537419e-09 1.268709e-08 1 28086037 28086138 > 13666 ENSG00000009780 7 TRUE 0.09257216 0.07072531 0.09257216 6.740254e-07 2.246751e-06 1 28075579 28075665 > 13670 ENSG00000009780 6 TRUE 0.01517262 0.03752086 0.03752086 7.138709e-01 7.931899e-01 1 28071165 28071322 > 22095 ENSG00000009780 8 TRUE 0.03299040 0.04028325 0.04028325 5.589838e-01 6.987297e-01 1 28081706 28081841 > 22096 ENSG00000009780 5 TRUE 0.04659993 0.03930375 0.04659993 1.632722e-04 4.081804e-04 1 28060542 28060694 > 108973 ENSG00000009780 2 TRUE 0.01382188 0.99934256 0.99934256 6.159293e-02 1.026549e-01 1 28053983 28054047 > 136687 ENSG00000009780 1 TRUE 0.38138204 0.12609379 0.38138204 1.302911e-01 1.861302e-01 1 28052568 28052672 > 138629 ENSG00000009780 10 TRUE 0.01782266 0.03578012 0.03578012 7.407408e-12 7.407408e-11 1 28087006 28087092 > strand transcripts log2fold(data/control) > 1 <na> <na> -1.89528866 > 1100 <na> <na> -0.01821944 > 13665 <na> <na> -0.79494559 > 13666 <na> <na> -1.13738728 > 13670 <na> <na> -0.06639675 > 22095 <na> <na> 0.07349758 > 22096 <na> <na> 0.52608193 > 108973 <na> <na> -1.59373125 > 136687 <na> <na> 0.56880558 > 138629 <na> <na> 0.85158758 > > > > On Wed, Sep 19, 2012 at 5:20 PM, Alejandro Reyes <alejandro.reyes@embl.de>wrote: > >> Dear Stefan Dentro, >> >> Thanks for your email. It actually looks strange, could you send the >> output of the function "counts", and subset for this specific gene? Could >> you also do it for the "fData" function?? >> >> Alejandro >> >> >> Hello, >>> >>> I'm using DEXSeq to obtain differentially expressed exons between my 15 >>> samples and 8 controls. But for some reason most exons are not testable >>> and >>> are not assigned a p-value. Do you know what I'm doing wrong? >>> >>> First I've created a conservative list of exons per gene in the human >>> genome through the canonical transcripts table in UCSC (194511 in total). >>> Next I've obtained read counts for each exon in each sample which are >>> read >>> into one big data.frame. Then the following list of commands are >>> executed: >>> >>> metadata = data.frame( >>> row.names=colnames(dat)[7:29], >>> condition=c(rep("control", 8), rep("data", 15)), >>> replicate=c(c(1:8), c(1:15)), >>> libType=rep("paired-end", 23)) >>> >>> ## Add strand information that is not available in the read counts >>> data.frame >>> dat = cbind(dat, strand=rep(NA, nrow(dat))) >>> >>> cds = newExonCountSet(countData = dat[,c(7:29)], design = metadata, >>> geneIDs = dat[,5], exonIDs = dat[,6], >>> exonIntervals=dat[,c(2,3,4, 30)]) >>> >>> cds = estimateSizeFactors(cds) >>> cds = estimateDispersions(cds, minCount=2, maxExon=90) >>> cds = fitDispersionFunction(cds) >>> cds = testForDEU(cds) >>> cds = estimatelog2FoldChanges(cds) >>> >>> Now lets see how many exons are expressed significantly different between >>> data and controls. Surprisingly only 1850 exons are described here, not >>> the >>> full 194511: >>> >>> res1 = DEUresultTable(cds) >>> table(res1$padjust < 0.05) >>> >>> FALSE TRUE >>> 1424 426 >>> >>> So I've zoomed in to an example gene. Below it can be seen that indeed >>> all >>> exons are marked as not testable. But looking at the individual read >>> counts >>> per sample (see further below) it does not make sense that they are not >>> testable. Some exons indeed have a low read count and are excluded >>> rightfully, but others have enough reads to base dispersion on I would >>> think. >>> >>> If I have understood correctly testable can be false in either of these >>> cases: >>> - Gene has only one exon, dispersion cannot be calculated. >>> - Readcounts for an exon are too low. >>> - A combination of the above. >>> >>> But all do not apply here. Any ideas? >>> >>> >>> fData for the gene: >>> geneID exonID testable dispBeforeSharing dispFitted >>> dispersion pvalue padjust chr start end strand transcripts >>> 136687 ENSG00000009780 1 FALSE NA 0.2026585 >>> 0.2026585 NA NA 1 28052568 28052672 <na> <na> >>> 108973 ENSG00000009780 2 FALSE NA 0.9549670 >>> 0.9549670 NA NA 1 28053983 28054047 <na> <na> >>> 1100 ENSG00000009780 3 FALSE NA 1.9560912 >>> 1.9560912 NA NA 1 28056743 28056844 <na> <na> >>> 1 ENSG00000009780 4 FALSE NA 0.4722995 >>> 0.4722995 NA NA 1 28059114 28059168 <na> <na> >>> 22096 ENSG00000009780 5 FALSE NA 0.1146813 >>> 0.1146813 NA NA 1 28060542 28060694 <na> <na> >>> 13670 ENSG00000009780 6 FALSE NA 0.1127563 >>> 0.1127563 NA NA 1 28071165 28071322 <na> <na> >>> 13666 ENSG00000009780 7 FALSE NA 0.1450013 >>> 0.1450013 NA NA 1 28075579 28075665 <na> <na> >>> 22095 ENSG00000009780 8 FALSE NA 0.1160550 >>> 0.1160550 NA NA 1 28081706 28081841 <na> <na> >>> 13665 ENSG00000009780 9 FALSE NA 0.1129432 >>> 0.1129432 NA NA 1 28086037 28086138 <na> <na> >>> 138629 ENSG00000009780 10 FALSE NA 0.1112855 >>> 0.1112855 NA NA 1 28087006 28087092 <na> <na> >>> >>> Read counts for the gene per sample: >>> exon_id chr start end gene_id exon_nr >>> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 >>> 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 >>> 11 8 4 7 3 1 8 3 >>> 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 >>> 1 0 1 1 0 0 0 1 >>> 13665 ENSE00000761751 1 28086037 28086138 ENSG00000009780 9 91 >>> 145 86 47 169 36 231 136 >>> 13666 ENSE00000761753 1 28075579 28075665 ENSG00000009780 7 19 >>> 62 32 18 44 23 61 38 >>> 13670 ENSE00000761780 1 28071165 28071322 ENSG00000009780 6 77 >>> 139 73 66 78 30 168 88 >>> 22095 ENSE00000866714 1 28081706 28081841 ENSG00000009780 8 78 >>> 94 52 47 40 41 121 60 >>> 22096 ENSE00000866716 1 28060542 28060694 ENSG00000009780 5 36 >>> 61 50 63 67 10 129 69 >>> 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 >>> 3 5 3 0 1 0 4 2 >>> 136687 ENSE00001909353 1 28052568 28052672 ENSG00000009780 1 2 >>> 12 5 0 19 0 34 10 >>> 138629 ENSE00001955917 1 28087006 28087092 ENSG00000009780 10 >>> 42 96 50 24 98 23 117 79 >>> sample9 sample10 sample11 sample12 sample13 sample14 sample15 >>> sample16 sample17 sample18 sample19 sample20 sample21 sample22 >>> 1 1 1 0 3 4 1 1 >>> 0 3 7 0 0 1 0 >>> 1100 2 0 0 0 0 1 1 >>> 0 0 3 1 0 0 0 >>> 13665 78 64 130 55 66 27 75 >>> 53 49 126 36 37 48 53 >>> 13666 7 20 51 6 43 8 15 >>> 19 11 26 7 15 11 5 >>> 13670 75 85 186 79 86 58 70 >>> 86 47 149 82 49 66 38 >>> 22095 64 64 149 51 91 37 67 >>> 54 58 148 54 41 64 43 >>> 22096 90 79 287 66 97 29 75 >>> 92 58 126 67 55 71 51 >>> 108973 1 0 0 1 1 0 1 >>> 1 0 4 0 0 0 0 >>> 136687 5 28 31 6 27 5 4 >>> 12 5 32 6 7 9 16 >>> 138629 119 111 233 100 129 59 141 >>> 103 70 219 92 60 74 67 >>> sample23 strand >>> 1 0 NA >>> 1100 0 NA >>> 13665 24 NA >>> 13666 7 NA >>> 13670 27 NA >>> 22095 25 NA >>> 22096 21 NA >>> 108973 1 NA >>> 136687 1 NA >>> 138629 54 NA >>> >>> Design of the experiment: >>> condition replicate libType >>> sample1 control 1 paired-end >>> sample2 control 2 paired-end >>> sample3 control 3 paired-end >>> sample4 control 4 paired-end >>> sample5 control 5 paired-end >>> sample6 control 6 paired-end >>> sample7 control 7 paired-end >>> sample8 control 8 paired-end >>> sample9 data 1 paired-end >>> sample10 data 2 paired-end >>> sample11 data 3 paired-end >>> sample12 data 4 paired-end >>> sample13 data 5 paired-end >>> sample14 data 6 paired-end >>> sample15 data 7 paired-end >>> sample16 data 8 paired-end >>> sample17 data 9 paired-end >>> sample18 data 10 paired-end >>> sample19 data 11 paired-end >>> sample20 data 12 paired-end >>> sample21 data 13 paired-end >>> sample22 data 14 paired-end >>> sample23 data 15 paired-end >>> >>> sessionInfo() >>>> >>> R version 2.15.0 (2012-03-30) >>> Platform: i686-pc-linux-gnu (32-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 >>> LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 >>> [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C >>> LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 >>> DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 >>> [7] BiocGenerics_0.2.0 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 >>> DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 >>> [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 >>> lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 >>> [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 >>> stats4_2.15.0 stringr_0.6 survival_2.36-12 >>> [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >> >> > [[alternative HTML version deleted]]
ADD REPLYlink written 5.2 years ago by Stefan Dentro70
Dear Stefano Dentro, Thanks for the information. It does not seem obvious to me to determine what is the problem. Would it be possible for you to send me your ExonCountSet object to give it a closer look? Of course, it would be treated with full confidentiality and deleted after detecting/solving the error. Alejandro Reyes > Perhaps a clue: > > If I add a second gene to the small dataset I get the following error > message: > > >genes > [1] "ENSG00000009780" "ENSG00000034533" > ... > >cds = estimateDispersions(cds, minCount=2, maxExon=90) > Dispersion estimation. (Progress report: one dot per 100 genes) > Error in which(sapply(muhats, inherits, "try-error")) : > argument to 'which' is not logical > > > If I select two other genes, no error message, but one of the two > genes is still marked as not testable, whilst there is enough data: > >genes > [1] "ENSG00000069345" "ENSG00000099889" > ... > >fData(cds) > geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end > 3 ENSG00000069345 4 FALSE NA 0.03745547 0.03745547 NA NA 16 47001996 47002076 > 4 ENSG00000099889 16 TRUE 0.106713856 0.12728475 0.12728475 9.533455e-01 9.533455e-01 22 19960260 19960350 > 1381 ENSG00000099889 18 TRUE 0.039759265 0.11401613 0.11401613 8.020906e-05 3.809930e-04 22 19959409 19959494 > 1382 ENSG00000099889 15 TRUE 0.015147460 0.07255594 0.07255594 4.753684e-01 6.021333e-01 22 19960448 19960559 > 1383 ENSG00000099889 14 TRUE 0.020525216 0.05938316 0.05938316 5.829357e-03 1.107578e-02 22 19960642 19960840 > 1384 ENSG00000099889 13 TRUE 0.008016871 0.06746289 0.06746289 3.700721e-01 5.408746e-01 22 19961166 19961316 > 1385 ENSG00000099889 12 TRUE 0.030236839 0.07790661 0.07790661 1.964000e-01 3.109666e-01 22 19961635 19961762 > 1386 ENSG00000099889 9 TRUE 0.025364371 0.05520343 0.05520343 4.359563e-01 5.916549e-01 22 19964938 19965109 > 1387 ENSG00000099889 8 TRUE 0.064383214 0.07631387 0.07631387 2.412871e-09 2.292227e-08 22 19965481 19965598 > 6388 ENSG00000069345 8 FALSE NA 0.02990952 0.02990952 NA NA 16 46992915 46993042 > 6389 ENSG00000069345 7 FALSE NA 0.02943529 0.02943529 NA NA 16 46993187 46993331 > 6390 ENSG00000069345 6 FALSE NA 0.02756900 0.02756900 NA NA 16 46998523 46998719 > 6391 ENSG00000069345 5 FALSE NA 0.02919581 0.02919581 NA NA 16 47001425 47001558 > 6393 ENSG00000069345 3 FALSE NA 0.02784982 0.02784982 NA NA 16 47005261 47005484 > 6394 ENSG00000069345 2 FALSE NA 0.06403618 0.06403618 NA NA 16 47005808 47005867 > 23101 ENSG00000099889 11 TRUE 0.176647935 0.16264956 0.17664794 0.000000e+00 0.000000e+00 22 19963209 19963280 > 76702 ENSG00000099889 17 TRUE 0.042072149 0.55244196 0.55244196 8.569994e-07 5.427663e-06 22 19959881 19959934 > 79101 ENSG00000099889 2 FALSE NA 40.14086829 40.14086829 NA NA 22 19997978 19998031 > 79190 ENSG00000099889 19 TRUE 0.347541116 0.09043853 0.34754112 4.506187e-03 9.513061e-03 22 19958739 19958858 > 79566 ENSG00000069345 9 FALSE NA 0.02639237 0.02639237 NA NA 16 46989299 46991132 > 111562 ENSG00000099889 6 TRUE 0.036879156 0.03886331 0.03886331 2.370001e-04 6.432859e-04 22 19967266 19967765 > 122467 ENSG00000099889 5 TRUE 0.046550209 0.04180913 0.04655021 5.771771e-01 6.853978e-01 22 19968734 19969260 > 126533 ENSG00000099889 7 TRUE 0.023668263 0.05572503 0.05572503 1.439509e-04 5.470135e-04 22 19966420 19966603 > 127478 ENSG00000099889 10 TRUE 0.482851293 1.21253452 1.21253452 3.875210e-02 6.693545e-02 22 19964229 19964246 > 131502 ENSG00000099889 4 TRUE 0.064691260 0.07631174 0.07631174 2.174826e-04 6.432859e-04 22 19969456 19969614 > 135581 ENSG00000099889 20 TRUE 0.059913261 0.03024754 0.05991326 8.767254e-01 9.254324e-01 22 19957419 19958266 > 137473 ENSG00000099889 1 TRUE 0.623260704 0.39612292 0.62326070 8.554939e-01 9.254324e-01 22 20004112 20004331 > 147158 ENSG00000099889 3 TRUE 0.546816624 0.16681316 0.54681662 3.335978e-03 7.922947e-03 22 19978108 19978335 > 186433 ENSG00000069345 1 FALSE NA 0.03292932 0.03292932 NA NA 16 47007406 47007699 > strand transcripts log2fold(data/control) > 3 <na> <na> NA > 4 <na> <na> -0.03257363 > 1381 <na> <na> -0.99913247 > 1382 <na> <na> -0.16326931 > 1383 <na> <na> 0.46834227 > 1384 <na> <na> -0.19311802 > 1385 <na> <na> 0.24176762 > 1386 <na> <na> -0.14237513 > 1387 <na> <na> -1.20469634 > 6388 <na> <na> NA > 6389 <na> <na> NA > 6390 <na> <na> NA > 6391 <na> <na> NA > 6393 <na> <na> NA > 6394 <na> <na> NA > 23101 <na> <na> -2.68320588 > 76702 <na> <na> -2.91182540 > 79101 <na> <na> 21.32540529 > 79190 <na> <na> 1.08400450 > 79566 <na> <na> NA > 111562 <na> <na> 0.46147010 > 122467 <na> <na> 0.05583688 > 126533 <na> <na> 0.64272965 > 127478 <na> <na> -1.83381112 > 131502 <na> <na> 0.73838679 > 135581 <na> <na> -0.04311782 > 137473 <na> <na> 0.17841783 > 147158 <na> <na> 1.62180140 > 186433 <na> <na> NA > > >counts(cds) > sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 > 3 247 161 259 144 186 191 226 147 107 200 302 69 133 62 > 4 14 10 9 15 7 7 38 6 23 20 57 23 24 6 > 1381 24 42 23 8 24 15 41 14 18 22 102 15 40 5 > 1382 36 44 31 17 30 13 65 13 34 65 148 40 57 13 > 1383 29 40 33 5 25 37 73 13 53 115 296 46 93 19 > 1384 24 41 31 9 33 38 99 20 40 88 165 40 95 10 > 1385 9 31 29 21 20 9 61 7 42 63 132 48 77 10 > 1386 28 45 60 29 37 36 120 38 73 82 231 69 103 24 > 1387 15 81 42 6 43 43 122 30 32 51 91 34 38 6 > 6388 434 528 611 331 414 414 732 415 427 455 484 143 445 197 > 6389 505 828 1061 402 692 830 1096 633 396 472 583 174 415 200 > 6390 1008 1213 1696 991 959 1093 1533 884 1254 1325 3025 637 1164 672 > 6391 439 780 1151 437 692 612 1151 700 459 654 1134 241 435 222 > 6393 737 918 1301 683 786 940 1377 773 1069 1541 2603 613 1211 512 > 6394 49 64 145 32 106 99 159 64 47 35 41 15 46 6 > 23101 25 26 27 5 18 11 66 30 11 6 21 7 4 0 > 76702 6 10 5 0 6 1 19 9 2 0 3 2 4 0 > 79101 0 0 0 0 0 0 0 0 0 0 0 0 1 0 > 79190 23 1 8 0 9 1 62 5 44 46 160 31 41 15 > 79566 7133 6826 7143 6365 6600 6431 7394 4947 6323 7007 10031 1949 7597 4639 > 111562 50 96 65 51 107 50 184 47 136 365 675 174 235 69 > 122467 82 120 60 53 62 45 190 51 104 309 530 123 253 46 > 126533 24 49 28 19 19 21 73 22 88 131 221 96 101 25 > 127478 0 2 0 0 4 0 11 4 0 0 6 1 2 0 > 131502 24 24 22 12 6 3 60 3 33 84 208 48 85 15 > 135581 389 477 257 93 305 279 773 141 459 611 1830 361 766 173 > 137473 7 0 1 8 3 0 1 0 1 16 17 5 10 6 > 147158 5 3 4 0 3 6 10 0 0 52 27 8 23 21 > 186433 65 132 241 134 128 147 227 136 231 955 989 152 606 142 > sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 > 3 78 168 129 217 149 75 104 73 28 > 4 41 2 35 125 7 51 16 15 13 > 1381 14 4 27 82 4 61 20 34 7 > 1382 56 8 64 270 12 102 37 86 28 > 1383 68 25 103 393 20 159 55 105 51 > 1384 51 12 83 260 10 119 42 77 35 > 1385 40 19 82 222 7 115 41 54 17 > 1386 96 21 104 425 17 195 72 93 28 > 1387 28 11 40 163 9 88 30 63 16 > 6388 332 660 374 867 648 272 327 73 285 > 6389 253 556 353 785 471 268 292 152 205 > 6390 950 1687 1139 2477 1607 838 924 554 508 > 6391 337 646 437 1022 525 314 426 303 155 > 6393 815 1476 912 2385 1225 764 921 774 288 > 6394 15 21 28 16 10 21 11 14 8 > 23101 4 2 16 38 0 23 11 2 2 > 76702 1 0 1 6 0 3 2 2 3 > 79101 0 0 0 0 0 1 0 0 0 > 79190 26 6 85 79 8 133 37 71 29 > 79566 6589 10656 8383 11925 10680 3772 4360 2309 5646 > 111562 146 63 275 1046 42 423 186 330 56 > 122467 115 37 157 826 31 376 162 217 31 > 126533 85 19 97 421 26 215 99 98 31 > 127478 1 0 0 5 0 4 1 3 2 > 131502 35 21 56 266 13 136 56 60 16 > 135581 655 208 890 2407 176 953 358 577 413 > 137473 2 1 10 16 2 18 3 5 3 > 147158 14 1 16 97 8 98 17 18 6 > 186433 179 389 253 608 276 231 236 400 62 > > On Thu, Sep 20, 2012 at 9:31 AM, Stefan Dentro <sdentro@gmail.com> <mailto:sdentro@gmail.com>> wrote: > > Hi Alejandro, > > I just reran the code with only ENSG00000009780 in the dataset. > The odd thing is that this time it worked perfectly. > > Regards, > Stefan > > > ## Only ENSG00000009780 > > >counts(cds) > sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 > 1 11 8 4 7 3 1 8 3 1 1 0 3 4 1 > 1100 1 0 1 1 0 0 0 1 2 0 0 0 0 1 > 13665 91 145 86 47 169 36 231 136 78 64 130 55 66 27 > 13666 19 62 32 18 44 23 61 38 7 20 51 6 43 8 > 13670 77 139 73 66 78 30 168 88 75 85 186 79 86 58 > 22095 78 94 52 47 40 41 121 60 64 64 149 51 91 37 > 22096 36 61 50 63 67 10 129 69 90 79 287 66 97 29 > 108973 3 5 3 0 1 0 4 2 1 0 0 1 1 0 > 136687 2 12 5 0 19 0 34 10 5 28 31 6 27 5 > 138629 42 96 50 24 98 23 117 79 119 111 233 100 129 59 > sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 > 1 1 0 3 7 0 0 1 0 0 > 1100 1 0 0 3 1 0 0 0 0 > 13665 75 53 49 126 36 37 48 53 24 > 13666 15 19 11 26 7 15 11 5 7 > 13670 70 86 47 149 82 49 66 38 27 > 22095 67 54 58 148 54 41 64 43 25 > 22096 75 92 58 126 67 55 71 51 21 > 108973 1 1 0 4 0 0 0 0 1 > 136687 4 12 5 32 6 7 9 16 1 > 138629 141 103 70 219 92 60 74 67 54 > > > >fData(cds) > geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end > 1 ENSG00000009780 4 TRUE 0.40135710 0.40580351 0.40580351 4.256220e-04 8.512441e-04 1 28059114 28059168 > 1100 ENSG00000009780 3 TRUE 0.57127358 1.98755125 1.98755125 9.715722e-01 9.715722e-01 1 28056743 28056844 > 13665 ENSG00000009780 9 TRUE 0.01991574 0.03806432 0.03806432 2.537419e-09 1.268709e-08 1 28086037 28086138 > 13666 ENSG00000009780 7 TRUE 0.09257216 0.07072531 0.09257216 6.740254e-07 2.246751e-06 1 28075579 28075665 > 13670 ENSG00000009780 6 TRUE 0.01517262 0.03752086 0.03752086 7.138709e-01 7.931899e-01 1 28071165 28071322 > 22095 ENSG00000009780 8 TRUE 0.03299040 0.04028325 0.04028325 5.589838e-01 6.987297e-01 1 28081706 28081841 > 22096 ENSG00000009780 5 TRUE 0.04659993 0.03930375 0.04659993 1.632722e-04 4.081804e-04 1 28060542 28060694 > 108973 ENSG00000009780 2 TRUE 0.01382188 0.99934256 0.99934256 6.159293e-02 1.026549e-01 1 28053983 28054047 > 136687 ENSG00000009780 1 TRUE 0.38138204 0.12609379 0.38138204 1.302911e-01 1.861302e-01 1 28052568 28052672 > 138629 ENSG00000009780 10 TRUE 0.01782266 0.03578012 0.03578012 7.407408e-12 7.407408e-11 1 28087006 28087092 > strand transcripts log2fold(data/control) > 1 <na> <na> -1.89528866 > 1100 <na> <na> -0.01821944 > 13665 <na> <na> -0.79494559 > 13666 <na> <na> -1.13738728 > 13670 <na> <na> -0.06639675 > 22095 <na> <na> 0.07349758 > 22096 <na> <na> 0.52608193 > 108973 <na> <na> -1.59373125 > 136687 <na> <na> 0.56880558 > 138629 <na> <na> 0.85158758 > > > > On Wed, Sep 19, 2012 at 5:20 PM, Alejandro Reyes > <alejandro.reyes@embl.de <mailto:alejandro.reyes@embl.de="">> wrote: > > Dear Stefan Dentro, > > Thanks for your email. It actually looks strange, could you > send the output of the function "counts", and subset for this > specific gene? Could you also do it for the "fData" function?? > > Alejandro > > > Hello, > > I'm using DEXSeq to obtain differentially expressed exons > between my 15 > samples and 8 controls. But for some reason most exons are > not testable and > are not assigned a p-value. Do you know what I'm doing wrong? > > First I've created a conservative list of exons per gene > in the human > genome through the canonical transcripts table in UCSC > (194511 in total). > Next I've obtained read counts for each exon in each > sample which are read > into one big data.frame. Then the following list of > commands are executed: > > metadata = data.frame( > row.names=colnames(dat)[7:29], > condition=c(rep("control", 8), rep("data", 15)), > replicate=c(c(1:8), c(1:15)), > libType=rep("paired-end", 23)) > > ## Add strand information that is not available in the > read counts > data.frame > dat = cbind(dat, strand=rep(NA, nrow(dat))) > > cds = newExonCountSet(countData = dat[,c(7:29)], design = > metadata, > geneIDs = dat[,5], exonIDs = dat[,6], > exonIntervals=dat[,c(2,3,4, 30)]) > > cds = estimateSizeFactors(cds) > cds = estimateDispersions(cds, minCount=2, maxExon=90) > cds = fitDispersionFunction(cds) > cds = testForDEU(cds) > cds = estimatelog2FoldChanges(cds) > > Now lets see how many exons are expressed significantly > different between > data and controls. Surprisingly only 1850 exons are > described here, not the > full 194511: > > res1 = DEUresultTable(cds) > table(res1$padjust < 0.05) > > FALSE TRUE > 1424 426 > > So I've zoomed in to an example gene. Below it can be seen > that indeed all > exons are marked as not testable. But looking at the > individual read counts > per sample (see further below) it does not make sense that > they are not > testable. Some exons indeed have a low read count and are > excluded > rightfully, but others have enough reads to base > dispersion on I would > think. > > If I have understood correctly testable can be false in > either of these > cases: > - Gene has only one exon, dispersion cannot be calculated. > - Readcounts for an exon are too low. > - A combination of the above. > > But all do not apply here. Any ideas? > > > fData for the gene: > geneID exonID testable dispBeforeSharing > dispFitted > dispersion pvalue padjust chr start end strand transcripts > 136687 ENSG00000009780 1 FALSE NA 0.2026585 > 0.2026585 NA NA 1 28052568 28052672 <na> <na> > 108973 ENSG00000009780 2 FALSE NA 0.9549670 > 0.9549670 NA NA 1 28053983 28054047 <na> <na> > 1100 ENSG00000009780 3 FALSE NA 1.9560912 > 1.9560912 NA NA 1 28056743 28056844 <na> <na> > 1 ENSG00000009780 4 FALSE NA 0.4722995 > 0.4722995 NA NA 1 28059114 28059168 <na> <na> > 22096 ENSG00000009780 5 FALSE NA 0.1146813 > 0.1146813 NA NA 1 28060542 28060694 <na> <na> > 13670 ENSG00000009780 6 FALSE NA 0.1127563 > 0.1127563 NA NA 1 28071165 28071322 <na> <na> > 13666 ENSG00000009780 7 FALSE NA 0.1450013 > 0.1450013 NA NA 1 28075579 28075665 <na> <na> > 22095 ENSG00000009780 8 FALSE NA 0.1160550 > 0.1160550 NA NA 1 28081706 28081841 <na> <na> > 13665 ENSG00000009780 9 FALSE NA 0.1129432 > 0.1129432 NA NA 1 28086037 28086138 <na> <na> > 138629 ENSG00000009780 10 FALSE NA 0.1112855 > 0.1112855 NA NA 1 28087006 28087092 <na> <na> > > Read counts for the gene per sample: > exon_id chr start end gene_id > exon_nr > sample1 sample2 sample3 sample4 sample5 sample6 sample7 > sample8 > 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 > 11 8 4 7 3 1 8 3 > 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 > 1 0 1 1 0 0 0 1 > 13665 ENSE00000761751 1 28086037 28086138 > ENSG00000009780 9 91 > 145 86 47 169 36 231 136 > 13666 ENSE00000761753 1 28075579 28075665 > ENSG00000009780 7 19 > 62 32 18 44 23 61 38 > 13670 ENSE00000761780 1 28071165 28071322 > ENSG00000009780 6 77 > 139 73 66 78 30 168 88 > 22095 ENSE00000866714 1 28081706 28081841 > ENSG00000009780 8 78 > 94 52 47 40 41 121 60 > 22096 ENSE00000866716 1 28060542 28060694 > ENSG00000009780 5 36 > 61 50 63 67 10 129 69 > 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 > 3 5 3 0 1 0 4 2 > 136687 ENSE00001909353 1 28052568 28052672 > ENSG00000009780 1 2 > 12 5 0 19 0 34 10 > 138629 ENSE00001955917 1 28087006 28087092 > ENSG00000009780 10 > 42 96 50 24 98 23 117 79 > sample9 sample10 sample11 sample12 sample13 > sample14 sample15 > sample16 sample17 sample18 sample19 sample20 sample21 sample22 > 1 1 1 0 3 4 1 1 > 0 3 7 0 0 1 0 > 1100 2 0 0 0 0 1 1 > 0 0 3 1 0 0 0 > 13665 78 64 130 55 66 27 75 > 53 49 126 36 37 48 53 > 13666 7 20 51 6 43 8 15 > 19 11 26 7 15 11 5 > 13670 75 85 186 79 86 58 70 > 86 47 149 82 49 66 38 > 22095 64 64 149 51 91 37 67 > 54 58 148 54 41 64 43 > 22096 90 79 287 66 97 29 75 > 92 58 126 67 55 71 51 > 108973 1 0 0 1 1 0 1 > 1 0 4 0 0 0 0 > 136687 5 28 31 6 27 5 4 > 12 5 32 6 7 9 16 > 138629 119 111 233 100 129 59 141 > 103 70 219 92 60 74 67 > sample23 strand > 1 0 NA > 1100 0 NA > 13665 24 NA > 13666 7 NA > 13670 27 NA > 22095 25 NA > 22096 21 NA > 108973 1 NA > 136687 1 NA > 138629 54 NA > > Design of the experiment: > condition replicate libType > sample1 control 1 paired-end > sample2 control 2 paired-end > sample3 control 3 paired-end > sample4 control 4 paired-end > sample5 control 5 paired-end > sample6 control 6 paired-end > sample7 control 7 paired-end > sample8 control 8 paired-end > sample9 data 1 paired-end > sample10 data 2 paired-end > sample11 data 3 paired-end > sample12 data 4 paired-end > sample13 data 5 paired-end > sample14 data 6 paired-end > sample15 data 7 paired-end > sample16 data 8 paired-end > sample17 data 9 paired-end > sample18 data 10 paired-end > sample19 data 11 paired-end > sample20 data 12 paired-end > sample21 data 13 paired-end > sample22 data 14 paired-end > sample23 data 15 paired-end > > sessionInfo() > > R version 2.15.0 (2012-03-30) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 > LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 > [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C > LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 > DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 > [7] BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 > DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 > [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 > lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 > [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 > stats4_2.15.0 stringr_0.6 survival_2.36-12 > [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]]
ADD REPLYlink written 5.2 years ago by Alejandro Reyes1.5k
Dear Stefan Dentro, Thank you very much for sending me your ExonCountSet object. It was a small bug related to the order of your genes, since they are not grouped by geneID in your ExonCountSet object. Could you please update to DEXSeq 1.3.20? It is fixed there! Alejandro > Dear Stefano Dentro, > > Thanks for the information. It does not seem obvious to me to determine > what is the problem. Would it be possible for you to send me your > ExonCountSet object to give it a closer look? Of course, it would be > treated with full confidentiality and deleted after detecting/solving > the error. > > Alejandro Reyes > > >> Perhaps a clue: >> >> If I add a second gene to the small dataset I get the following error >> message: >> >>> genes >> [1] "ENSG00000009780" "ENSG00000034533" >> ... >>> cds = estimateDispersions(cds, minCount=2, maxExon=90) >> Dispersion estimation. (Progress report: one dot per 100 genes) >> Error in which(sapply(muhats, inherits, "try-error")) : >> argument to 'which' is not logical >> >> >> If I select two other genes, no error message, but one of the two >> genes is still marked as not testable, whilst there is enough data: >>> genes >> [1] "ENSG00000069345" "ENSG00000099889" >> ... >>> fData(cds) >> geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end >> 3 ENSG00000069345 4 FALSE NA 0.03745547 0.03745547 NA NA 16 47001996 47002076 >> 4 ENSG00000099889 16 TRUE 0.106713856 0.12728475 0.12728475 9.533455e-01 9.533455e-01 22 19960260 19960350 >> 1381 ENSG00000099889 18 TRUE 0.039759265 0.11401613 0.11401613 8.020906e-05 3.809930e-04 22 19959409 19959494 >> 1382 ENSG00000099889 15 TRUE 0.015147460 0.07255594 0.07255594 4.753684e-01 6.021333e-01 22 19960448 19960559 >> 1383 ENSG00000099889 14 TRUE 0.020525216 0.05938316 0.05938316 5.829357e-03 1.107578e-02 22 19960642 19960840 >> 1384 ENSG00000099889 13 TRUE 0.008016871 0.06746289 0.06746289 3.700721e-01 5.408746e-01 22 19961166 19961316 >> 1385 ENSG00000099889 12 TRUE 0.030236839 0.07790661 0.07790661 1.964000e-01 3.109666e-01 22 19961635 19961762 >> 1386 ENSG00000099889 9 TRUE 0.025364371 0.05520343 0.05520343 4.359563e-01 5.916549e-01 22 19964938 19965109 >> 1387 ENSG00000099889 8 TRUE 0.064383214 0.07631387 0.07631387 2.412871e-09 2.292227e-08 22 19965481 19965598 >> 6388 ENSG00000069345 8 FALSE NA 0.02990952 0.02990952 NA NA 16 46992915 46993042 >> 6389 ENSG00000069345 7 FALSE NA 0.02943529 0.02943529 NA NA 16 46993187 46993331 >> 6390 ENSG00000069345 6 FALSE NA 0.02756900 0.02756900 NA NA 16 46998523 46998719 >> 6391 ENSG00000069345 5 FALSE NA 0.02919581 0.02919581 NA NA 16 47001425 47001558 >> 6393 ENSG00000069345 3 FALSE NA 0.02784982 0.02784982 NA NA 16 47005261 47005484 >> 6394 ENSG00000069345 2 FALSE NA 0.06403618 0.06403618 NA NA 16 47005808 47005867 >> 23101 ENSG00000099889 11 TRUE 0.176647935 0.16264956 0.17664794 0.000000e+00 0.000000e+00 22 19963209 19963280 >> 76702 ENSG00000099889 17 TRUE 0.042072149 0.55244196 0.55244196 8.569994e-07 5.427663e-06 22 19959881 19959934 >> 79101 ENSG00000099889 2 FALSE NA 40.14086829 40.14086829 NA NA 22 19997978 19998031 >> 79190 ENSG00000099889 19 TRUE 0.347541116 0.09043853 0.34754112 4.506187e-03 9.513061e-03 22 19958739 19958858 >> 79566 ENSG00000069345 9 FALSE NA 0.02639237 0.02639237 NA NA 16 46989299 46991132 >> 111562 ENSG00000099889 6 TRUE 0.036879156 0.03886331 0.03886331 2.370001e-04 6.432859e-04 22 19967266 19967765 >> 122467 ENSG00000099889 5 TRUE 0.046550209 0.04180913 0.04655021 5.771771e-01 6.853978e-01 22 19968734 19969260 >> 126533 ENSG00000099889 7 TRUE 0.023668263 0.05572503 0.05572503 1.439509e-04 5.470135e-04 22 19966420 19966603 >> 127478 ENSG00000099889 10 TRUE 0.482851293 1.21253452 1.21253452 3.875210e-02 6.693545e-02 22 19964229 19964246 >> 131502 ENSG00000099889 4 TRUE 0.064691260 0.07631174 0.07631174 2.174826e-04 6.432859e-04 22 19969456 19969614 >> 135581 ENSG00000099889 20 TRUE 0.059913261 0.03024754 0.05991326 8.767254e-01 9.254324e-01 22 19957419 19958266 >> 137473 ENSG00000099889 1 TRUE 0.623260704 0.39612292 0.62326070 8.554939e-01 9.254324e-01 22 20004112 20004331 >> 147158 ENSG00000099889 3 TRUE 0.546816624 0.16681316 0.54681662 3.335978e-03 7.922947e-03 22 19978108 19978335 >> 186433 ENSG00000069345 1 FALSE NA 0.03292932 0.03292932 NA NA 16 47007406 47007699 >> strand transcripts log2fold(data/control) >> 3 <na> <na> NA >> 4 <na> <na> -0.03257363 >> 1381 <na> <na> -0.99913247 >> 1382 <na> <na> -0.16326931 >> 1383 <na> <na> 0.46834227 >> 1384 <na> <na> -0.19311802 >> 1385 <na> <na> 0.24176762 >> 1386 <na> <na> -0.14237513 >> 1387 <na> <na> -1.20469634 >> 6388 <na> <na> NA >> 6389 <na> <na> NA >> 6390 <na> <na> NA >> 6391 <na> <na> NA >> 6393 <na> <na> NA >> 6394 <na> <na> NA >> 23101 <na> <na> -2.68320588 >> 76702 <na> <na> -2.91182540 >> 79101 <na> <na> 21.32540529 >> 79190 <na> <na> 1.08400450 >> 79566 <na> <na> NA >> 111562 <na> <na> 0.46147010 >> 122467 <na> <na> 0.05583688 >> 126533 <na> <na> 0.64272965 >> 127478 <na> <na> -1.83381112 >> 131502 <na> <na> 0.73838679 >> 135581 <na> <na> -0.04311782 >> 137473 <na> <na> 0.17841783 >> 147158 <na> <na> 1.62180140 >> 186433 <na> <na> NA >> >>> counts(cds) >> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 >> 3 247 161 259 144 186 191 226 147 107 200 302 69 133 62 >> 4 14 10 9 15 7 7 38 6 23 20 57 23 24 6 >> 1381 24 42 23 8 24 15 41 14 18 22 102 15 40 5 >> 1382 36 44 31 17 30 13 65 13 34 65 148 40 57 13 >> 1383 29 40 33 5 25 37 73 13 53 115 296 46 93 19 >> 1384 24 41 31 9 33 38 99 20 40 88 165 40 95 10 >> 1385 9 31 29 21 20 9 61 7 42 63 132 48 77 10 >> 1386 28 45 60 29 37 36 120 38 73 82 231 69 103 24 >> 1387 15 81 42 6 43 43 122 30 32 51 91 34 38 6 >> 6388 434 528 611 331 414 414 732 415 427 455 484 143 445 197 >> 6389 505 828 1061 402 692 830 1096 633 396 472 583 174 415 200 >> 6390 1008 1213 1696 991 959 1093 1533 884 1254 1325 3025 637 1164 672 >> 6391 439 780 1151 437 692 612 1151 700 459 654 1134 241 435 222 >> 6393 737 918 1301 683 786 940 1377 773 1069 1541 2603 613 1211 512 >> 6394 49 64 145 32 106 99 159 64 47 35 41 15 46 6 >> 23101 25 26 27 5 18 11 66 30 11 6 21 7 4 0 >> 76702 6 10 5 0 6 1 19 9 2 0 3 2 4 0 >> 79101 0 0 0 0 0 0 0 0 0 0 0 0 1 0 >> 79190 23 1 8 0 9 1 62 5 44 46 160 31 41 15 >> 79566 7133 6826 7143 6365 6600 6431 7394 4947 6323 7007 10031 1949 7597 4639 >> 111562 50 96 65 51 107 50 184 47 136 365 675 174 235 69 >> 122467 82 120 60 53 62 45 190 51 104 309 530 123 253 46 >> 126533 24 49 28 19 19 21 73 22 88 131 221 96 101 25 >> 127478 0 2 0 0 4 0 11 4 0 0 6 1 2 0 >> 131502 24 24 22 12 6 3 60 3 33 84 208 48 85 15 >> 135581 389 477 257 93 305 279 773 141 459 611 1830 361 766 173 >> 137473 7 0 1 8 3 0 1 0 1 16 17 5 10 6 >> 147158 5 3 4 0 3 6 10 0 0 52 27 8 23 21 >> 186433 65 132 241 134 128 147 227 136 231 955 989 152 606 142 >> sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 >> 3 78 168 129 217 149 75 104 73 28 >> 4 41 2 35 125 7 51 16 15 13 >> 1381 14 4 27 82 4 61 20 34 7 >> 1382 56 8 64 270 12 102 37 86 28 >> 1383 68 25 103 393 20 159 55 105 51 >> 1384 51 12 83 260 10 119 42 77 35 >> 1385 40 19 82 222 7 115 41 54 17 >> 1386 96 21 104 425 17 195 72 93 28 >> 1387 28 11 40 163 9 88 30 63 16 >> 6388 332 660 374 867 648 272 327 73 285 >> 6389 253 556 353 785 471 268 292 152 205 >> 6390 950 1687 1139 2477 1607 838 924 554 508 >> 6391 337 646 437 1022 525 314 426 303 155 >> 6393 815 1476 912 2385 1225 764 921 774 288 >> 6394 15 21 28 16 10 21 11 14 8 >> 23101 4 2 16 38 0 23 11 2 2 >> 76702 1 0 1 6 0 3 2 2 3 >> 79101 0 0 0 0 0 1 0 0 0 >> 79190 26 6 85 79 8 133 37 71 29 >> 79566 6589 10656 8383 11925 10680 3772 4360 2309 5646 >> 111562 146 63 275 1046 42 423 186 330 56 >> 122467 115 37 157 826 31 376 162 217 31 >> 126533 85 19 97 421 26 215 99 98 31 >> 127478 1 0 0 5 0 4 1 3 2 >> 131502 35 21 56 266 13 136 56 60 16 >> 135581 655 208 890 2407 176 953 358 577 413 >> 137473 2 1 10 16 2 18 3 5 3 >> 147158 14 1 16 97 8 98 17 18 6 >> 186433 179 389 253 608 276 231 236 400 62 >> >> On Thu, Sep 20, 2012 at 9:31 AM, Stefan Dentro <sdentro at="" gmail.com="">> <mailto:sdentro at="" gmail.com="">> wrote: >> >> Hi Alejandro, >> >> I just reran the code with only ENSG00000009780 in the dataset. >> The odd thing is that this time it worked perfectly. >> >> Regards, >> Stefan >> >> >> ## Only ENSG00000009780 >> >> >counts(cds) >> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 >> 1 11 8 4 7 3 1 8 3 1 1 0 3 4 1 >> 1100 1 0 1 1 0 0 0 1 2 0 0 0 0 1 >> 13665 91 145 86 47 169 36 231 136 78 64 130 55 66 27 >> 13666 19 62 32 18 44 23 61 38 7 20 51 6 43 8 >> 13670 77 139 73 66 78 30 168 88 75 85 186 79 86 58 >> 22095 78 94 52 47 40 41 121 60 64 64 149 51 91 37 >> 22096 36 61 50 63 67 10 129 69 90 79 287 66 97 29 >> 108973 3 5 3 0 1 0 4 2 1 0 0 1 1 0 >> 136687 2 12 5 0 19 0 34 10 5 28 31 6 27 5 >> 138629 42 96 50 24 98 23 117 79 119 111 233 100 129 59 >> sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22 sample23 >> 1 1 0 3 7 0 0 1 0 0 >> 1100 1 0 0 3 1 0 0 0 0 >> 13665 75 53 49 126 36 37 48 53 24 >> 13666 15 19 11 26 7 15 11 5 7 >> 13670 70 86 47 149 82 49 66 38 27 >> 22095 67 54 58 148 54 41 64 43 25 >> 22096 75 92 58 126 67 55 71 51 21 >> 108973 1 1 0 4 0 0 0 0 1 >> 136687 4 12 5 32 6 7 9 16 1 >> 138629 141 103 70 219 92 60 74 67 54 >> >> >> >fData(cds) >> geneID exonID testable dispBeforeSharing dispFitted dispersion pvalue padjust chr start end >> 1 ENSG00000009780 4 TRUE 0.40135710 0.40580351 0.40580351 4.256220e-04 8.512441e-04 1 28059114 28059168 >> 1100 ENSG00000009780 3 TRUE 0.57127358 1.98755125 1.98755125 9.715722e-01 9.715722e-01 1 28056743 28056844 >> 13665 ENSG00000009780 9 TRUE 0.01991574 0.03806432 0.03806432 2.537419e-09 1.268709e-08 1 28086037 28086138 >> 13666 ENSG00000009780 7 TRUE 0.09257216 0.07072531 0.09257216 6.740254e-07 2.246751e-06 1 28075579 28075665 >> 13670 ENSG00000009780 6 TRUE 0.01517262 0.03752086 0.03752086 7.138709e-01 7.931899e-01 1 28071165 28071322 >> 22095 ENSG00000009780 8 TRUE 0.03299040 0.04028325 0.04028325 5.589838e-01 6.987297e-01 1 28081706 28081841 >> 22096 ENSG00000009780 5 TRUE 0.04659993 0.03930375 0.04659993 1.632722e-04 4.081804e-04 1 28060542 28060694 >> 108973 ENSG00000009780 2 TRUE 0.01382188 0.99934256 0.99934256 6.159293e-02 1.026549e-01 1 28053983 28054047 >> 136687 ENSG00000009780 1 TRUE 0.38138204 0.12609379 0.38138204 1.302911e-01 1.861302e-01 1 28052568 28052672 >> 138629 ENSG00000009780 10 TRUE 0.01782266 0.03578012 0.03578012 7.407408e-12 7.407408e-11 1 28087006 28087092 >> strand transcripts log2fold(data/control) >> 1 <na> <na> -1.89528866 >> 1100 <na> <na> -0.01821944 >> 13665 <na> <na> -0.79494559 >> 13666 <na> <na> -1.13738728 >> 13670 <na> <na> -0.06639675 >> 22095 <na> <na> 0.07349758 >> 22096 <na> <na> 0.52608193 >> 108973 <na> <na> -1.59373125 >> 136687 <na> <na> 0.56880558 >> 138629 <na> <na> 0.85158758 >> >> >> >> On Wed, Sep 19, 2012 at 5:20 PM, Alejandro Reyes >> <alejandro.reyes at="" embl.de="" <mailto:alejandro.reyes="" at="" embl.de="">> wrote: >> >> Dear Stefan Dentro, >> >> Thanks for your email. It actually looks strange, could you >> send the output of the function "counts", and subset for this >> specific gene? Could you also do it for the "fData" function?? >> >> Alejandro >> >> >> Hello, >> >> I'm using DEXSeq to obtain differentially expressed exons >> between my 15 >> samples and 8 controls. But for some reason most exons are >> not testable and >> are not assigned a p-value. Do you know what I'm doing wrong? >> >> First I've created a conservative list of exons per gene >> in the human >> genome through the canonical transcripts table in UCSC >> (194511 in total). >> Next I've obtained read counts for each exon in each >> sample which are read >> into one big data.frame. Then the following list of >> commands are executed: >> >> metadata = data.frame( >> row.names=colnames(dat)[7:29], >> condition=c(rep("control", 8), rep("data", 15)), >> replicate=c(c(1:8), c(1:15)), >> libType=rep("paired-end", 23)) >> >> ## Add strand information that is not available in the >> read counts >> data.frame >> dat = cbind(dat, strand=rep(NA, nrow(dat))) >> >> cds = newExonCountSet(countData = dat[,c(7:29)], design = >> metadata, >> geneIDs = dat[,5], exonIDs = dat[,6], >> exonIntervals=dat[,c(2,3,4, 30)]) >> >> cds = estimateSizeFactors(cds) >> cds = estimateDispersions(cds, minCount=2, maxExon=90) >> cds = fitDispersionFunction(cds) >> cds = testForDEU(cds) >> cds = estimatelog2FoldChanges(cds) >> >> Now lets see how many exons are expressed significantly >> different between >> data and controls. Surprisingly only 1850 exons are >> described here, not the >> full 194511: >> >> res1 = DEUresultTable(cds) >> table(res1$padjust < 0.05) >> >> FALSE TRUE >> 1424 426 >> >> So I've zoomed in to an example gene. Below it can be seen >> that indeed all >> exons are marked as not testable. But looking at the >> individual read counts >> per sample (see further below) it does not make sense that >> they are not >> testable. Some exons indeed have a low read count and are >> excluded >> rightfully, but others have enough reads to base >> dispersion on I would >> think. >> >> If I have understood correctly testable can be false in >> either of these >> cases: >> - Gene has only one exon, dispersion cannot be calculated. >> - Readcounts for an exon are too low. >> - A combination of the above. >> >> But all do not apply here. Any ideas? >> >> >> fData for the gene: >> geneID exonID testable dispBeforeSharing >> dispFitted >> dispersion pvalue padjust chr start end strand transcripts >> 136687 ENSG00000009780 1 FALSE NA 0.2026585 >> 0.2026585 NA NA 1 28052568 28052672 <na> <na> >> 108973 ENSG00000009780 2 FALSE NA 0.9549670 >> 0.9549670 NA NA 1 28053983 28054047 <na> <na> >> 1100 ENSG00000009780 3 FALSE NA 1.9560912 >> 1.9560912 NA NA 1 28056743 28056844 <na> <na> >> 1 ENSG00000009780 4 FALSE NA 0.4722995 >> 0.4722995 NA NA 1 28059114 28059168 <na> <na> >> 22096 ENSG00000009780 5 FALSE NA 0.1146813 >> 0.1146813 NA NA 1 28060542 28060694 <na> <na> >> 13670 ENSG00000009780 6 FALSE NA 0.1127563 >> 0.1127563 NA NA 1 28071165 28071322 <na> <na> >> 13666 ENSG00000009780 7 FALSE NA 0.1450013 >> 0.1450013 NA NA 1 28075579 28075665 <na> <na> >> 22095 ENSG00000009780 8 FALSE NA 0.1160550 >> 0.1160550 NA NA 1 28081706 28081841 <na> <na> >> 13665 ENSG00000009780 9 FALSE NA 0.1129432 >> 0.1129432 NA NA 1 28086037 28086138 <na> <na> >> 138629 ENSG00000009780 10 FALSE NA 0.1112855 >> 0.1112855 NA NA 1 28087006 28087092 <na> <na> >> >> Read counts for the gene per sample: >> exon_id chr start end gene_id >> exon_nr >> sample1 sample2 sample3 sample4 sample5 sample6 sample7 >> sample8 >> 1 ENSE00000327880 1 28059114 28059168 ENSG00000009780 4 >> 11 8 4 7 3 1 8 3 >> 1100 ENSE00000571221 1 28056743 28056844 ENSG00000009780 3 >> 1 0 1 1 0 0 0 1 >> 13665 ENSE00000761751 1 28086037 28086138 >> ENSG00000009780 9 91 >> 145 86 47 169 36 231 136 >> 13666 ENSE00000761753 1 28075579 28075665 >> ENSG00000009780 7 19 >> 62 32 18 44 23 61 38 >> 13670 ENSE00000761780 1 28071165 28071322 >> ENSG00000009780 6 77 >> 139 73 66 78 30 168 88 >> 22095 ENSE00000866714 1 28081706 28081841 >> ENSG00000009780 8 78 >> 94 52 47 40 41 121 60 >> 22096 ENSE00000866716 1 28060542 28060694 >> ENSG00000009780 5 36 >> 61 50 63 67 10 129 69 >> 108973 ENSE00001625313 1 28053983 28054047 ENSG00000009780 2 >> 3 5 3 0 1 0 4 2 >> 136687 ENSE00001909353 1 28052568 28052672 >> ENSG00000009780 1 2 >> 12 5 0 19 0 34 10 >> 138629 ENSE00001955917 1 28087006 28087092 >> ENSG00000009780 10 >> 42 96 50 24 98 23 117 79 >> sample9 sample10 sample11 sample12 sample13 >> sample14 sample15 >> sample16 sample17 sample18 sample19 sample20 sample21 sample22 >> 1 1 1 0 3 4 1 1 >> 0 3 7 0 0 1 0 >> 1100 2 0 0 0 0 1 1 >> 0 0 3 1 0 0 0 >> 13665 78 64 130 55 66 27 75 >> 53 49 126 36 37 48 53 >> 13666 7 20 51 6 43 8 15 >> 19 11 26 7 15 11 5 >> 13670 75 85 186 79 86 58 70 >> 86 47 149 82 49 66 38 >> 22095 64 64 149 51 91 37 67 >> 54 58 148 54 41 64 43 >> 22096 90 79 287 66 97 29 75 >> 92 58 126 67 55 71 51 >> 108973 1 0 0 1 1 0 1 >> 1 0 4 0 0 0 0 >> 136687 5 28 31 6 27 5 4 >> 12 5 32 6 7 9 16 >> 138629 119 111 233 100 129 59 141 >> 103 70 219 92 60 74 67 >> sample23 strand >> 1 0 NA >> 1100 0 NA >> 13665 24 NA >> 13666 7 NA >> 13670 27 NA >> 22095 25 NA >> 22096 21 NA >> 108973 1 NA >> 136687 1 NA >> 138629 54 NA >> >> Design of the experiment: >> condition replicate libType >> sample1 control 1 paired-end >> sample2 control 2 paired-end >> sample3 control 3 paired-end >> sample4 control 4 paired-end >> sample5 control 5 paired-end >> sample6 control 6 paired-end >> sample7 control 7 paired-end >> sample8 control 8 paired-end >> sample9 data 1 paired-end >> sample10 data 2 paired-end >> sample11 data 3 paired-end >> sample12 data 4 paired-end >> sample13 data 5 paired-end >> sample14 data 6 paired-end >> sample15 data 7 paired-end >> sample16 data 8 paired-end >> sample17 data 9 paired-end >> sample18 data 10 paired-end >> sample19 data 11 paired-end >> sample20 data 12 paired-end >> sample21 data 13 paired-end >> sample22 data 14 paired-end >> sample23 data 15 paired-end >> >> sessionInfo() >> >> R version 2.15.0 (2012-03-30) >> Platform: i686-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 >> LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 >> [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C >> LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 >> DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 >> [7] BiocGenerics_0.2.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 >> DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 >> [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 >> lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 >> [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 >> stats4_2.15.0 stringr_0.6 survival_2.36-12 >> [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 5.2 years ago by Alejandro Reyes1.5k
That is great! Thank you for the quick response! 1.3.20 is not in biocLite yet and I cannot find an SVN repository to fetch it from. How can I obtain it? Stefan On Thu, Sep 20, 2012 at 12:20 PM, Alejandro Reyes <alejandro.reyes@embl.de>wrote: > Dear Stefan Dentro, > > Thank you very much for sending me your ExonCountSet object. It was a > small bug related to the order of your genes, since they are not grouped by > geneID in your ExonCountSet object. Could you please update to DEXSeq > 1.3.20? It is fixed there! > > Alejandro > > > Dear Stefano Dentro, >> >> Thanks for the information. It does not seem obvious to me to determine >> what is the problem. Would it be possible for you to send me your >> ExonCountSet object to give it a closer look? Of course, it would be >> treated with full confidentiality and deleted after detecting/solving >> the error. >> >> Alejandro Reyes >> >> >> Perhaps a clue: >>> >>> If I add a second gene to the small dataset I get the following error >>> message: >>> >>> genes >>>> >>> [1] "ENSG00000009780" "ENSG00000034533" >>> ... >>> >>>> cds = estimateDispersions(cds, minCount=2, maxExon=90) >>>> >>> Dispersion estimation. (Progress report: one dot per 100 genes) >>> Error in which(sapply(muhats, inherits, "try-error")) : >>> argument to 'which' is not logical >>> >>> >>> If I select two other genes, no error message, but one of the two >>> genes is still marked as not testable, whilst there is enough data: >>> >>>> genes >>>> >>> [1] "ENSG00000069345" "ENSG00000099889" >>> ... >>> >>>> fData(cds) >>>> >>> geneID exonID testable dispBeforeSharing dispFitted >>> dispersion pvalue padjust chr start end >>> >>> 3 ENSG00000069345 4 FALSE NA 0.03745547 >>> 0.03745547 NA NA 16 47001996 47002076 >>> 4 ENSG00000099889 16 TRUE 0.106713856 0.12728475 >>> 0.12728475 9.533455e-01 9.533455e-01 22 19960260 19960350 >>> 1381 ENSG00000099889 18 TRUE 0.039759265 0.11401613 >>> 0.11401613 8.020906e-05 3.809930e-04 22 19959409 19959494 >>> 1382 ENSG00000099889 15 TRUE 0.015147460 0.07255594 >>> 0.07255594 4.753684e-01 6.021333e-01 22 19960448 19960559 >>> 1383 ENSG00000099889 14 TRUE 0.020525216 0.05938316 >>> 0.05938316 5.829357e-03 1.107578e-02 22 19960642 19960840 >>> 1384 ENSG00000099889 13 TRUE 0.008016871 0.06746289 >>> 0.06746289 3.700721e-01 5.408746e-01 22 19961166 19961316 >>> 1385 ENSG00000099889 12 TRUE 0.030236839 0.07790661 >>> 0.07790661 1.964000e-01 3.109666e-01 22 19961635 19961762 >>> 1386 ENSG00000099889 9 TRUE 0.025364371 0.05520343 >>> 0.05520343 4.359563e-01 5.916549e-01 22 19964938 19965109 >>> 1387 ENSG00000099889 8 TRUE 0.064383214 0.07631387 >>> 0.07631387 2.412871e-09 2.292227e-08 22 19965481 19965598 >>> 6388 ENSG00000069345 8 FALSE NA 0.02990952 >>> 0.02990952 NA NA 16 46992915 46993042 >>> 6389 ENSG00000069345 7 FALSE NA 0.02943529 >>> 0.02943529 NA NA 16 46993187 46993331 >>> 6390 ENSG00000069345 6 FALSE NA 0.02756900 >>> 0.02756900 NA NA 16 46998523 46998719 >>> 6391 ENSG00000069345 5 FALSE NA 0.02919581 >>> 0.02919581 NA NA 16 47001425 47001558 >>> 6393 ENSG00000069345 3 FALSE NA 0.02784982 >>> 0.02784982 NA NA 16 47005261 47005484 >>> 6394 ENSG00000069345 2 FALSE NA 0.06403618 >>> 0.06403618 NA NA 16 47005808 47005867 >>> 23101 ENSG00000099889 11 TRUE 0.176647935 0.16264956 >>> 0.17664794 0.000000e+00 0.000000e+00 22 19963209 19963280 >>> 76702 ENSG00000099889 17 TRUE 0.042072149 0.55244196 >>> 0.55244196 8.569994e-07 5.427663e-06 22 19959881 19959934 >>> 79101 ENSG00000099889 2 FALSE NA 40.14086829 >>> 40.14086829 NA NA 22 19997978 19998031 >>> 79190 ENSG00000099889 19 TRUE 0.347541116 0.09043853 >>> 0.34754112 4.506187e-03 9.513061e-03 22 19958739 19958858 >>> 79566 ENSG00000069345 9 FALSE NA 0.02639237 >>> 0.02639237 NA NA 16 46989299 46991132 >>> 111562 ENSG00000099889 6 TRUE 0.036879156 0.03886331 >>> 0.03886331 2.370001e-04 6.432859e-04 22 19967266 19967765 >>> 122467 ENSG00000099889 5 TRUE 0.046550209 0.04180913 >>> 0.04655021 5.771771e-01 6.853978e-01 22 19968734 19969260 >>> 126533 ENSG00000099889 7 TRUE 0.023668263 0.05572503 >>> 0.05572503 1.439509e-04 5.470135e-04 22 19966420 19966603 >>> 127478 ENSG00000099889 10 TRUE 0.482851293 1.21253452 >>> 1.21253452 3.875210e-02 6.693545e-02 22 19964229 19964246 >>> 131502 ENSG00000099889 4 TRUE 0.064691260 0.07631174 >>> 0.07631174 2.174826e-04 6.432859e-04 22 19969456 19969614 >>> 135581 ENSG00000099889 20 TRUE 0.059913261 0.03024754 >>> 0.05991326 8.767254e-01 9.254324e-01 22 19957419 19958266 >>> 137473 ENSG00000099889 1 TRUE 0.623260704 0.39612292 >>> 0.62326070 8.554939e-01 9.254324e-01 22 20004112 20004331 >>> 147158 ENSG00000099889 3 TRUE 0.546816624 0.16681316 >>> 0.54681662 3.335978e-03 7.922947e-03 22 19978108 19978335 >>> 186433 ENSG00000069345 1 FALSE NA 0.03292932 >>> 0.03292932 NA NA 16 47007406 47007699 >>> strand transcripts log2fold(data/control) >>> 3 <na> <na> NA >>> 4 <na> <na> -0.03257363 >>> 1381 <na> <na> -0.99913247 >>> 1382 <na> <na> -0.16326931 >>> 1383 <na> <na> 0.46834227 >>> 1384 <na> <na> -0.19311802 >>> 1385 <na> <na> 0.24176762 >>> 1386 <na> <na> -0.14237513 >>> 1387 <na> <na> -1.20469634 >>> 6388 <na> <na> NA >>> 6389 <na> <na> NA >>> 6390 <na> <na> NA >>> 6391 <na> <na> NA >>> 6393 <na> <na> NA >>> 6394 <na> <na> NA >>> 23101 <na> <na> -2.68320588 >>> 76702 <na> <na> -2.91182540 >>> 79101 <na> <na> 21.32540529 >>> 79190 <na> <na> 1.08400450 >>> 79566 <na> <na> NA >>> 111562 <na> <na> 0.46147010 >>> 122467 <na> <na> 0.05583688 >>> 126533 <na> <na> 0.64272965 >>> 127478 <na> <na> -1.83381112 >>> 131502 <na> <na> 0.73838679 >>> 135581 <na> <na> -0.04311782 >>> 137473 <na> <na> 0.17841783 >>> 147158 <na> <na> 1.62180140 >>> 186433 <na> <na> NA >>> >>> counts(cds) >>>> >>> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 >>> sample9 sample10 sample11 sample12 sample13 sample14 >>> 3 247 161 259 144 186 191 226 147 >>> 107 200 302 69 133 62 >>> 4 14 10 9 15 7 7 38 6 >>> 23 20 57 23 24 6 >>> 1381 24 42 23 8 24 15 41 14 >>> 18 22 102 15 40 5 >>> 1382 36 44 31 17 30 13 65 13 >>> 34 65 148 40 57 13 >>> 1383 29 40 33 5 25 37 73 13 >>> 53 115 296 46 93 19 >>> 1384 24 41 31 9 33 38 99 20 >>> 40 88 165 40 95 10 >>> 1385 9 31 29 21 20 9 61 7 >>> 42 63 132 48 77 10 >>> 1386 28 45 60 29 37 36 120 38 >>> 73 82 231 69 103 24 >>> 1387 15 81 42 6 43 43 122 30 >>> 32 51 91 34 38 6 >>> 6388 434 528 611 331 414 414 732 415 >>> 427 455 484 143 445 197 >>> 6389 505 828 1061 402 692 830 1096 633 >>> 396 472 583 174 415 200 >>> 6390 1008 1213 1696 991 959 1093 1533 884 >>> 1254 1325 3025 637 1164 672 >>> 6391 439 780 1151 437 692 612 1151 700 >>> 459 654 1134 241 435 222 >>> 6393 737 918 1301 683 786 940 1377 773 >>> 1069 1541 2603 613 1211 512 >>> 6394 49 64 145 32 106 99 159 64 >>> 47 35 41 15 46 6 >>> 23101 25 26 27 5 18 11 66 30 >>> 11 6 21 7 4 0 >>> 76702 6 10 5 0 6 1 19 9 >>> 2 0 3 2 4 0 >>> 79101 0 0 0 0 0 0 0 0 >>> 0 0 0 0 1 0 >>> 79190 23 1 8 0 9 1 62 5 >>> 44 46 160 31 41 15 >>> 79566 7133 6826 7143 6365 6600 6431 7394 4947 >>> 6323 7007 10031 1949 7597 4639 >>> 111562 50 96 65 51 107 50 184 47 >>> 136 365 675 174 235 69 >>> 122467 82 120 60 53 62 45 190 51 >>> 104 309 530 123 253 46 >>> 126533 24 49 28 19 19 21 73 22 >>> 88 131 221 96 101 25 >>> 127478 0 2 0 0 4 0 11 4 >>> 0 0 6 1 2 0 >>> 131502 24 24 22 12 6 3 60 3 >>> 33 84 208 48 85 15 >>> 135581 389 477 257 93 305 279 773 141 >>> 459 611 1830 361 766 173 >>> 137473 7 0 1 8 3 0 1 0 >>> 1 16 17 5 10 6 >>> 147158 5 3 4 0 3 6 10 0 >>> 0 52 27 8 23 21 >>> 186433 65 132 241 134 128 147 227 136 >>> 231 955 989 152 606 142 >>> sample15 sample16 sample17 sample18 sample19 sample20 sample21 >>> sample22 sample23 >>> 3 78 168 129 217 149 75 104 >>> 73 28 >>> 4 41 2 35 125 7 51 16 >>> 15 13 >>> 1381 14 4 27 82 4 61 20 >>> 34 7 >>> 1382 56 8 64 270 12 102 37 >>> 86 28 >>> 1383 68 25 103 393 20 159 55 >>> 105 51 >>> 1384 51 12 83 260 10 119 42 >>> 77 35 >>> 1385 40 19 82 222 7 115 41 >>> 54 17 >>> 1386 96 21 104 425 17 195 72 >>> 93 28 >>> 1387 28 11 40 163 9 88 30 >>> 63 16 >>> 6388 332 660 374 867 648 272 327 >>> 73 285 >>> 6389 253 556 353 785 471 268 292 >>> 152 205 >>> 6390 950 1687 1139 2477 1607 838 924 >>> 554 508 >>> 6391 337 646 437 1022 525 314 426 >>> 303 155 >>> 6393 815 1476 912 2385 1225 764 921 >>> 774 288 >>> 6394 15 21 28 16 10 21 11 >>> 14 8 >>> 23101 4 2 16 38 0 23 11 >>> 2 2 >>> 76702 1 0 1 6 0 3 2 >>> 2 3 >>> 79101 0 0 0 0 0 1 0 >>> 0 0 >>> 79190 26 6 85 79 8 133 37 >>> 71 29 >>> 79566 6589 10656 8383 11925 10680 3772 4360 >>> 2309 5646 >>> 111562 146 63 275 1046 42 423 186 >>> 330 56 >>> 122467 115 37 157 826 31 376 162 >>> 217 31 >>> 126533 85 19 97 421 26 215 99 >>> 98 31 >>> 127478 1 0 0 5 0 4 1 >>> 3 2 >>> 131502 35 21 56 266 13 136 56 >>> 60 16 >>> 135581 655 208 890 2407 176 953 358 >>> 577 413 >>> 137473 2 1 10 16 2 18 3 >>> 5 3 >>> 147158 14 1 16 97 8 98 17 >>> 18 6 >>> 186433 179 389 253 608 276 231 236 >>> 400 62 >>> >>> On Thu, Sep 20, 2012 at 9:31 AM, Stefan Dentro <sdentro@gmail.com>>> <mailto:sdentro@gmail.com>> wrote: >>> >>> Hi Alejandro, >>> >>> I just reran the code with only ENSG00000009780 in the dataset. >>> The odd thing is that this time it worked perfectly. >>> >>> Regards, >>> Stefan >>> >>> >>> ## Only ENSG00000009780 >>> >>> >counts(cds) >>> sample1 sample2 sample3 sample4 sample5 sample6 sample7 >>> sample8 sample9 sample10 sample11 sample12 sample13 sample14 >>> 1 11 8 4 7 3 1 8 >>> 3 1 1 0 3 4 1 >>> 1100 1 0 1 1 0 0 0 >>> 1 2 0 0 0 0 1 >>> 13665 91 145 86 47 169 36 231 >>> 136 78 64 130 55 66 27 >>> 13666 19 62 32 18 44 23 61 >>> 38 7 20 51 6 43 8 >>> 13670 77 139 73 66 78 30 168 >>> 88 75 85 186 79 86 58 >>> 22095 78 94 52 47 40 41 121 >>> 60 64 64 149 51 91 37 >>> 22096 36 61 50 63 67 10 129 >>> 69 90 79 287 66 97 29 >>> 108973 3 5 3 0 1 0 4 >>> 2 1 0 0 1 1 0 >>> 136687 2 12 5 0 19 0 34 >>> 10 5 28 31 6 27 5 >>> 138629 42 96 50 24 98 23 117 >>> 79 119 111 233 100 129 59 >>> sample15 sample16 sample17 sample18 sample19 sample20 >>> sample21 sample22 sample23 >>> 1 1 0 3 7 0 0 >>> 1 0 0 >>> 1100 1 0 0 3 1 0 >>> 0 0 0 >>> 13665 75 53 49 126 36 37 >>> 48 53 24 >>> 13666 15 19 11 26 7 15 >>> 11 5 7 >>> 13670 70 86 47 149 82 49 >>> 66 38 27 >>> 22095 67 54 58 148 54 41 >>> 64 43 25 >>> 22096 75 92 58 126 67 55 >>> 71 51 21 >>> 108973 1 1 0 4 0 0 >>> 0 0 1 >>> 136687 4 12 5 32 6 7 >>> 9 16 1 >>> 138629 141 103 70 219 92 60 >>> 74 67 54 >>> >>> >>> >fData(cds) >>> geneID exonID testable dispBeforeSharing >>> dispFitted dispersion pvalue padjust chr start end >>> 1 ENSG00000009780 4 TRUE 0.40135710 0.40580351 >>> 0.40580351 4.256220e-04 8.512441e-04 1 28059114 28059168 >>> 1100 ENSG00000009780 3 TRUE 0.57127358 1.98755125 >>> 1.98755125 9.715722e-01 9.715722e-01 1 28056743 28056844 >>> 13665 ENSG00000009780 9 TRUE 0.01991574 0.03806432 >>> 0.03806432 2.537419e-09 1.268709e-08 1 28086037 28086138 >>> 13666 ENSG00000009780 7 TRUE 0.09257216 0.07072531 >>> 0.09257216 6.740254e-07 2.246751e-06 1 28075579 28075665 >>> 13670 ENSG00000009780 6 TRUE 0.01517262 0.03752086 >>> 0.03752086 7.138709e-01 7.931899e-01 1 28071165 28071322 >>> 22095 ENSG00000009780 8 TRUE 0.03299040 0.04028325 >>> 0.04028325 5.589838e-01 6.987297e-01 1 28081706 28081841 >>> 22096 ENSG00000009780 5 TRUE 0.04659993 0.03930375 >>> 0.04659993 1.632722e-04 4.081804e-04 1 28060542 28060694 >>> 108973 ENSG00000009780 2 TRUE 0.01382188 0.99934256 >>> 0.99934256 6.159293e-02 1.026549e-01 1 28053983 28054047 >>> 136687 ENSG00000009780 1 TRUE 0.38138204 0.12609379 >>> 0.38138204 1.302911e-01 1.861302e-01 1 28052568 28052672 >>> 138629 ENSG00000009780 10 TRUE 0.01782266 0.03578012 >>> 0.03578012 7.407408e-12 7.407408e-11 1 28087006 28087092 >>> strand transcripts log2fold(data/control) >>> 1 <na> <na> -1.89528866 >>> 1100 <na> <na> -0.01821944 >>> 13665 <na> <na> -0.79494559 >>> 13666 <na> <na> -1.13738728 >>> 13670 <na> <na> -0.06639675 >>> 22095 <na> <na> 0.07349758 >>> 22096 <na> <na> 0.52608193 >>> 108973 <na> <na> -1.59373125 >>> 136687 <na> <na> 0.56880558 >>> 138629 <na> <na> 0.85158758 >>> >>> >>> >>> On Wed, Sep 19, 2012 at 5:20 PM, Alejandro Reyes >>> <alejandro.reyes@embl.de <mailto:alejandro.reyes@embl.**de<alejandro.reyes@embl.de="">>> >>> wrote: >>> >>> Dear Stefan Dentro, >>> >>> Thanks for your email. It actually looks strange, could you >>> send the output of the function "counts", and subset for this >>> specific gene? Could you also do it for the "fData" function?? >>> >>> Alejandro >>> >>> >>> Hello, >>> >>> I'm using DEXSeq to obtain differentially expressed exons >>> between my 15 >>> samples and 8 controls. But for some reason most exons are >>> not testable and >>> are not assigned a p-value. Do you know what I'm doing >>> wrong? >>> >>> First I've created a conservative list of exons per gene >>> in the human >>> genome through the canonical transcripts table in UCSC >>> (194511 in total). >>> Next I've obtained read counts for each exon in each >>> sample which are read >>> into one big data.frame. Then the following list of >>> commands are executed: >>> >>> metadata = data.frame( >>> row.names=colnames(dat)[7:29], >>> condition=c(rep("control", 8), rep("data", 15)), >>> replicate=c(c(1:8), c(1:15)), >>> libType=rep("paired-end", 23)) >>> >>> ## Add strand information that is not available in the >>> read counts >>> data.frame >>> dat = cbind(dat, strand=rep(NA, nrow(dat))) >>> >>> cds = newExonCountSet(countData = dat[,c(7:29)], design = >>> metadata, >>> geneIDs = dat[,5], exonIDs = >>> dat[,6], >>> exonIntervals=dat[,c(2,3,4, 30)]) >>> >>> cds = estimateSizeFactors(cds) >>> cds = estimateDispersions(cds, minCount=2, maxExon=90) >>> cds = fitDispersionFunction(cds) >>> cds = testForDEU(cds) >>> cds = estimatelog2FoldChanges(cds) >>> >>> Now lets see how many exons are expressed significantly >>> different between >>> data and controls. Surprisingly only 1850 exons are >>> described here, not the >>> full 194511: >>> >>> res1 = DEUresultTable(cds) >>> table(res1$padjust < 0.05) >>> >>> FALSE TRUE >>> 1424 426 >>> >>> So I've zoomed in to an example gene. Below it can be seen >>> that indeed all >>> exons are marked as not testable. But looking at the >>> individual read counts >>> per sample (see further below) it does not make sense that >>> they are not >>> testable. Some exons indeed have a low read count and are >>> excluded >>> rightfully, but others have enough reads to base >>> dispersion on I would >>> think. >>> >>> If I have understood correctly testable can be false in >>> either of these >>> cases: >>> - Gene has only one exon, dispersion cannot be calculated. >>> - Readcounts for an exon are too low. >>> - A combination of the above. >>> >>> But all do not apply here. Any ideas? >>> >>> >>> fData for the gene: >>> geneID exonID testable dispBeforeSharing >>> dispFitted >>> dispersion pvalue padjust chr start end strand >>> transcripts >>> 136687 ENSG00000009780 1 FALSE NA >>> 0.2026585 >>> 0.2026585 NA NA 1 28052568 28052672 <na> <na> >>> 108973 ENSG00000009780 2 FALSE NA >>> 0.9549670 >>> 0.9549670 NA NA 1 28053983 28054047 <na> <na> >>> 1100 ENSG00000009780 3 FALSE NA >>> 1.9560912 >>> 1.9560912 NA NA 1 28056743 28056844 <na> <na> >>> 1 ENSG00000009780 4 FALSE NA >>> 0.4722995 >>> 0.4722995 NA NA 1 28059114 28059168 <na> <na> >>> 22096 ENSG00000009780 5 FALSE NA >>> 0.1146813 >>> 0.1146813 NA NA 1 28060542 28060694 <na> <na> >>> 13670 ENSG00000009780 6 FALSE NA >>> 0.1127563 >>> 0.1127563 NA NA 1 28071165 28071322 <na> <na> >>> 13666 ENSG00000009780 7 FALSE NA >>> 0.1450013 >>> 0.1450013 NA NA 1 28075579 28075665 <na> <na> >>> 22095 ENSG00000009780 8 FALSE NA >>> 0.1160550 >>> 0.1160550 NA NA 1 28081706 28081841 <na> <na> >>> 13665 ENSG00000009780 9 FALSE NA >>> 0.1129432 >>> 0.1129432 NA NA 1 28086037 28086138 <na> <na> >>> 138629 ENSG00000009780 10 FALSE NA >>> 0.1112855 >>> 0.1112855 NA NA 1 28087006 28087092 <na> <na> >>> >>> Read counts for the gene per sample: >>> exon_id chr start end gene_id >>> exon_nr >>> sample1 sample2 sample3 sample4 sample5 sample6 sample7 >>> sample8 >>> 1 ENSE00000327880 1 28059114 28059168 >>> ENSG00000009780 4 >>> 11 8 4 7 3 1 8 3 >>> 1100 ENSE00000571221 1 28056743 28056844 >>> ENSG00000009780 3 >>> 1 0 1 1 0 0 0 1 >>> 13665 ENSE00000761751 1 28086037 28086138 >>> ENSG00000009780 9 91 >>> 145 86 47 169 36 231 136 >>> 13666 ENSE00000761753 1 28075579 28075665 >>> ENSG00000009780 7 19 >>> 62 32 18 44 23 61 38 >>> 13670 ENSE00000761780 1 28071165 28071322 >>> ENSG00000009780 6 77 >>> 139 73 66 78 30 168 88 >>> 22095 ENSE00000866714 1 28081706 28081841 >>> ENSG00000009780 8 78 >>> 94 52 47 40 41 121 60 >>> 22096 ENSE00000866716 1 28060542 28060694 >>> ENSG00000009780 5 36 >>> 61 50 63 67 10 129 69 >>> 108973 ENSE00001625313 1 28053983 28054047 >>> ENSG00000009780 2 >>> 3 5 3 0 1 0 4 2 >>> 136687 ENSE00001909353 1 28052568 28052672 >>> ENSG00000009780 1 2 >>> 12 5 0 19 0 34 10 >>> 138629 ENSE00001955917 1 28087006 28087092 >>> ENSG00000009780 10 >>> 42 96 50 24 98 23 117 79 >>> sample9 sample10 sample11 sample12 sample13 >>> sample14 sample15 >>> sample16 sample17 sample18 sample19 sample20 sample21 >>> sample22 >>> 1 1 1 0 3 4 1 1 >>> 0 3 7 0 0 1 0 >>> 1100 2 0 0 0 0 1 1 >>> 0 0 3 1 0 0 0 >>> 13665 78 64 130 55 66 27 75 >>> 53 49 126 36 37 48 53 >>> 13666 7 20 51 6 43 8 15 >>> 19 11 26 7 15 11 5 >>> 13670 75 85 186 79 86 58 70 >>> 86 47 149 82 49 66 38 >>> 22095 64 64 149 51 91 37 67 >>> 54 58 148 54 41 64 43 >>> 22096 90 79 287 66 97 29 75 >>> 92 58 126 67 55 71 51 >>> 108973 1 0 0 1 1 0 1 >>> 1 0 4 0 0 0 0 >>> 136687 5 28 31 6 27 5 4 >>> 12 5 32 6 7 9 16 >>> 138629 119 111 233 100 129 59 >>> 141 >>> 103 70 219 92 60 74 67 >>> sample23 strand >>> 1 0 NA >>> 1100 0 NA >>> 13665 24 NA >>> 13666 7 NA >>> 13670 27 NA >>> 22095 25 NA >>> 22096 21 NA >>> 108973 1 NA >>> 136687 1 NA >>> 138629 54 NA >>> >>> Design of the experiment: >>> condition replicate libType >>> sample1 control 1 paired-end >>> sample2 control 2 paired-end >>> sample3 control 3 paired-end >>> sample4 control 4 paired-end >>> sample5 control 5 paired-end >>> sample6 control 6 paired-end >>> sample7 control 7 paired-end >>> sample8 control 8 paired-end >>> sample9 data 1 paired-end >>> sample10 data 2 paired-end >>> sample11 data 3 paired-end >>> sample12 data 4 paired-end >>> sample13 data 5 paired-end >>> sample14 data 6 paired-end >>> sample15 data 7 paired-end >>> sample16 data 8 paired-end >>> sample17 data 9 paired-end >>> sample18 data 10 paired-end >>> sample19 data 11 paired-end >>> sample20 data 12 paired-end >>> sample21 data 13 paired-end >>> sample22 data 14 paired-end >>> sample23 data 15 paired-end >>> >>> sessionInfo() >>> >>> R version 2.15.0 (2012-03-30) >>> Platform: i686-pc-linux-gnu (32-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 >>> LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 >>> [6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C >>> LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods >>> base >>> >>> other attached packages: >>> [1] plyr_1.7.1 DEXSeq_1.2.1 BiocInstaller_1.4.3 >>> DESeq_1.8.3 locfit_1.5-7 Biobase_2.16.0 >>> [7] BiocGenerics_0.2.0 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0 >>> DBI_0.2-5 genefilter_1.38.0 geneplotter_1.34.0 >>> [7] grid_2.15.0 hwriter_1.3 IRanges_1.14.2 >>> lattice_0.20-6 RColorBrewer_1.0-5 RCurl_1.91-1 >>> [13] RSQLite_0.11.1 splines_2.15.0 statmod_1.4.15 >>> stats4_2.15.0 stringr_0.6 survival_2.36-12 >>> [19] tools_2.15.0 XML_3.9-4 xtable_1.7-0 >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org <mailto:bioconductor@r-**>>> project.org <bioconductor@r-project.org>> >>> >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor< https://stat.ethz.ch/mailman/listinfo/bioconductor> >>> Search the archives: >>> http://news.gmane.org/gmane.**science.biology.informatics.* >>> *conductor<http: news.gmane.org="" gmane.science.biology.informatics="" .conductor=""> >>> >>> >>> >>> >>> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > [[alternative HTML version deleted]]
ADD REPLYlink written 5.2 years ago by Stefan Dentro70
Hi, On Thu, Sep 20, 2012 at 8:52 AM, Stefan Dentro <sdentro at="" gmail.com=""> wrote: > That is great! Thank you for the quick response! > > 1.3.20 is not in biocLite yet and I cannot find an SVN repository to fetch > it from. How can I obtain it? See this page for help to use bioconductor SVN: http://bioconductor.org/developers/source-control/ You need to use username/password: readonly/readonly So, I believe it would be: svn co --username=readonly https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/DEXSeq and enter the password when it asks you. HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLYlink written 5.2 years ago by Steve Lianoglou12k
Ah that worked, cheers! On Thu, Sep 20, 2012 at 4:34 PM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Hi, > > On Thu, Sep 20, 2012 at 8:52 AM, Stefan Dentro <sdentro@gmail.com> wrote: > > That is great! Thank you for the quick response! > > > > 1.3.20 is not in biocLite yet and I cannot find an SVN repository to > fetch > > it from. How can I obtain it? > > See this page for help to use bioconductor SVN: > http://bioconductor.org/developers/source-control/ > > You need to use username/password: readonly/readonly > > So, I believe it would be: > > svn co --username=readonly > https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/DEXSeq > > and enter the password when it asks you. > > HTH, > -steve > > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > [[alternative HTML version deleted]]
ADD REPLYlink written 5.2 years ago by Stefan Dentro70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 345 users visited in the last hour