Dear Fong Chun Chan,
Thank you for your interest in DEXSeq and sorry in advance for the
e-mail. We have also noticed that the computing time increases
considerably when you have a large number of samples, conditions or
exons per gene. For users in these situations, we have
implemented variants of these functions (estimateDispersionsTRT and
testForDEUTRT) in the most recent versions of DEXSeq in the svn.
The difference lies in how the model matrix is prepared. In the
"normal" functions, the model matrices used to fit the GLMs are built
for each exon, such that each exon bin is treated individually,
independently of which exon you are testing. For example, if you have
a gene with 5 exons, when testing for exon E001, you would consider
E002, E003, ... , E005 independently in the model.
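To make the per-exon treatment concrete, here is a small base-R sketch that does not call DEXSeq. The toy design (4 samples in 2 conditions), the column names and the formula are illustrative assumptions, not DEXSeq's internal code. The point is only that every exon of the gene gets its own terms in the model.

```r
# Toy long-format model frame for one gene with 5 exons and 4 samples.
# All names and the formula are illustrative assumptions, not DEXSeq internals.
samples    <- c("s1", "s2", "s3", "s4")
conditions <- c(s1 = "treated", s2 = "treated",
                s3 = "untreated", s4 = "untreated")
exons      <- sprintf("E%03d", 1:5)

# One row per (exon, sample) pair.
mf <- expand.grid(exon = exons, sample = samples, stringsAsFactors = TRUE)
mf$condition <- factor(conditions[as.character(mf$sample)])

nrow(mf)  # 5 exons * 4 samples = 20 rows

# Every exon is its own factor level, so the design matrix gains columns
# with each additional exon in the gene and each additional sample.
mm_full <- model.matrix(~ sample + exon + condition:exon, data = mf)
dim(mm_full)
```

With many samples and many exons per gene, both dimensions of such a matrix grow, which is where the computing time goes.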
In the "TRT" implementation the same model matrix is used for all the
exons. In the same example as before, you would consider E001 and the
sum of all the remaining exons of the same gene. This reduces the model
and allows DEXSeq to be used with a large number of samples. For more
clarity, you could try to compare the normal model frame of a gene with
the TRT model frame:
modelFrameForTRT( pasillaExons )
Using the same example, in the last model frame, "this" would be the
"E001" and "others" would be the sum of E002 + E003 + ... + E005.
This would be the "normal" DEXSeq analysis:
pasillaExons <- estimateSizeFactors( pasillaExons )
pasillaExons <- estimateDispersions( pasillaExons )
pasillaExons <- fitDispersionFunction( pasillaExons )
pasillaExons <- testForDEU( pasillaExons )
This would be the "TRT" analysis:
pasillaExonsTRT <- estimateSizeFactors( pasillaExons )
pasillaExonsTRT <- estimateDispersionsTRT( pasillaExonsTRT )
pasillaExonsTRT <- fitDispersionFunction( pasillaExonsTRT )
pasillaExonsTRT <- testForDEUTRT( pasillaExonsTRT )
And you can see that you get the same results:
I have tried the "TRT" functions on large cohorts with complex models,
and they work nicely and in reasonable computing times.
PS: these changes still need to be added to the vignette.
> Hi all,
> I've been trying to get DEXSeq to run on a fairly large RNA-seq
> I have. To be specific, I have 89 samples and I am attempt to
> exon usage results on > 500,000 exons.
> I've followed the latest tutorial (1.5.6) on Bioconductor and it so
> I've had relatively no problems. It just the two steps that are
> estimateDispersions and testForDEU, are taking a fairly long time.
> already attempted to parallelize this on a 48-core 256GB machine, but I get
> very little progress on the run-time of these functions.
> I was just wondering if anyone has a good way of running DEXSeq on
> large cohort. Tips on how to reduce run time? Are there way to
> these jobs across a cluster rather than rely on a single machine
> multi-cores? Any help would be greatly appreciated.