Entering edit mode
Yao,Hui
▴
40
@yaohui-5209
Last seen 9.6 years ago
Dear DEXSeq authors and users,
I am using DEXSeq package (1.2.0) to do splicing analysis for human
genome. One part of results from the analysis is really confusing me.
It seems that for many genes the fitted expression, the fitted
splicing and estimated fold changes are inconsistent (actually
reversed) with its normalized counts.
To clearly explain the problem, I am showing a simple example with
only two genes with 30 samples as below.
> load("toBioConductor.RData")
> library(DEXSeq)
> ls()
[1] "testgene"
### For this data set, our interest is to investigate the
differential usage of exons between two tissue "type", X and N.
### The samples were collected from two batches, So we need to adjust
the batch effects in the following model.
> formu <- count ~ sample + (exon + from)*type
> testgene <- estimateDispersions(testgene, formula=formu)
> testgene <- fitDispersionFunction(testgene)
> f0 <- count~sample + from*exon + type
> f1 <- count~sample + from*exon + type*I(exon==exonID)
> testgene <- testForDEU(testgene,formula0=f0, formula1=f1)
> testgene <- estimatelog2FoldChanges(testgene,fitExpToVar="type" )
> res <- DEUresultTable(testgene)
n the attached figure, toBioConductor-plotDEXSeq.pdf, for E011, its
normalized counts of "N" are clearly smaller than those of "X".
However, for both Fitted splicing and Fitted expression, the level of
"N" is larger than that of "X". And if we checked the estimated fold
change as below, the fold change is consistent with fitted expression
and fitted splicing.
> res["ENSG00000001497:011",]
geneID exonID dispersion pvalue
padjust
ENSG00000001497:011 ENSG00000001497 E011 0.4096852 0.0006160211
0.01334712
meanBase log2fold(X/N)
ENSG00000001497:011 5.714405 -1.936449
We also explicitly check the normalized counts for E011 of this gene.
As below shows the median of each type. Those of "N" is clearly
smaller than "X".
> dtbl <- data.frame(counts=counts(testgene,normalized=T)["ENSG0000000
1497:011",],type=pData(testgene)$type)
> with(dtbl,tapply(counts,type,median))
X N
8.534784 2.064785
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] DEXSeq_1.2.0 Biobase_2.14.0
loaded via a namespace (and not attached):
[1] biomaRt_2.10.0 hwriter_1.3 plyr_1.7.1 RCurl_1.91-1
statmod_1.4.14
[6] stringr_0.6 tools_2.14.0 XML_3.9-4
So, have I made any mistakes in the analysis?
Many thanks in advance,
Hui
Hui Yao, Ph.D.
Principal Statistical Analyst
MD Anderson Cancer Center
-------------- next part --------------
A non-text attachment was scrubbed...
Name: toBioConductor-plotDEXSeq.pdf
Type: application/pdf
Size: 18588 bytes
Desc: toBioConductor-plotDEXSeq.pdf
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20120625="" 45c8527d="" attachment.pdf="">