Parallel computing for IsoformSwitchAnalyzeR
Entering edit mode
maya.kappil ▴ 20
Last seen 2.4 years ago

Hello! I am interested to run the IsoformSwitchAnalyzeR package to assess differential transcript usage and related downstream analysis. However, I have a large set of RNAseq samples (200) - for the native DRIMSeq and DEXSeq packages, it is possible to add a command to indicate the number of cores to utilize to reduce the running time. For example:

  d <- dmPrecision(d, design=design_full,BPPARAM = BiocParallel::SerialParam())
  d <- dmFit(d, design=design_full,BPPARAM = BiocParallel::SerialParam())
  d <- dmTest(d, coef="condition2",BPPARAM = BiocParallel::SerialParam())

Do you have suggestions for how to accomplish this when running the isoformSwitchTestDEXSeq or isoformSwitchTestDRIMSeq commands?


IsoformSwitchAnalyzeR • 512 views
Entering edit mode
Last seen 21 hours ago
European Union

Hi Maya

Thanks for reaching out. Unfortunately there is no way of doing that from within IsoformSwitchAnalyzeR at the moment. That said the runtime of DRIMSeq mostly scales with the number of transcripts analyzed (and a little bit with the number of samples) so if you just make sure to use preFilter() first isoformSwitchTestDRIMSeq() should run in 30-60 min for ~30k transcripts.

Cheers Kristoffer

Entering edit mode

Hmm... I have somewhat more transcripts using the default preFilter setting (~80K). And, the calculated time to run DEXSeq seemed really large (17763754)... So far, I've tried running it for 15hrs (both correcting for a confounding variable and without correction) and they did not go to completion. I will try a more stringent filter to see if that helps. But just in case there is something obvious that I'm specifying incorrectly, below is the code I used:

Generate switchlist


salmonQuant <- importIsoformExpression( + parentDir = paste0(base.dir,"/quants"),addIsofomIdAsColumn=TRUE)

coldata<-read.csv("Covariates.csv") summary(coldata$BW) AGA LGA SGA 112 55 33 coldata$batch<-factor(coldata$batch)

myDesign<-data.frame(sampleID=coldata$ID, + condition=coldata$BW, + batch=coldata$batch)


aSwitchList_SGA <- importRdata( + isoformCountMatrix = salmonQuant$counts, + isoformRepExpression = salmonQuant$abundance, + designMatrix = myDesign, + comparisonsToMake=contrast, + isoformExonAnnoation = paste0(base.dir,"/gencode.v28.annotation.gtf.gz"), + showProgress = FALSE + )

aSwitchList_SGA This switchAnalyzeRlist list contains: 199338 isoforms from 57599 genes 1 comparison from 2 conditions Feature analyzed: [1] "ORFs"


aSwitchListSGA <- preFilter( + switchAnalyzeRlist = aSwitchListSGA, + geneExpressionCutoff = 1, + isoformExpressionCutoff = 0, + removeSingleIsoformGenes = TRUE + ) The fitering removed 121323 ( 60.86% of ) transcripts. There is now 78015 isoforms left


aSwitchListSGADEXSeq <- isoformSwitchTestDEXSeq( + switchAnalyzeRlist = aSwitchList_SGA, + correctForConfoundingFactors=TRUE, + reduceToSwitchingGenes=TRUE + ) Step 1 of 4: Correcting for batch effect... Step 2 of 4: Calculating Isoform Fraction matrix... Step 3 of 4: Testing each pariwise comparisons with DEXSeq (this might be a bit slow)... Estimated time (for dataset with ~30.000 isoforms): 17763754 min

Entering edit mode

You are right that DEXSeq is infeasible (it scales exponentially with number of samples) so it would need to be DRIMSeq. Your code looks good. You could try filtering a bit more strict - I would suggest setting IFcutoff=0.1 - meaning you are only considering isoforms which on average contribute with at least 10% to the parent gene expression.


Login before adding your answer.

Traffic: 269 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6