Hi all,
My name is Nandan Deshpande. I am working at the Systems Biology Initiative, UNSW, Sydney.
I have started using DEXSeq recently. I have a few questions about using the tool ...I will appreciate if someone can assist me.
My experiment design (for cancer data-set) includes 9 patients, 2 condition states for each patient ('C1D1' & 'EOT') and 2 batches ('BACTH1' & 'BATCH2') in which the RNASeq data was generated.
Our main question is to check if any of the exons show Differential exon usage across conditions C1D1 and EOT for the nine patients.
I wanted to confirm with the experts in the blog if the steps I have used for my analysis are sufficient to produce a correct result.
My script (which I have tried to derive from the DEXSeq vignette) is as below --->
---------------------
library("DEXSeq")
library("BiocParallel")
#Using multiple cores
BPPARAM=MulticoreParam(workers=12)flattenedFile <- "XXX_DEXSeq.gff"
sampleTable= data.frame(row.names=c("AW_C1D1","NR_C1D1","DD_C1D1","EL_C1D1","DG_C1D1","EC_C1D1","GR_C1D1","LA_C1D1","RL_C1D1","AW_EOT","NR_EOT","EL_EOT","DD_EOT","EC_EOT","GR_EOT","LA_EOT","RL_EOT","DG_EOT"),condition=c("START","START","START","START","START","START","START","START","START","END","END","END","END","END","END","END","END","END"),libType=c("Batch1","Batch1","Batch2","Batch1","Batch2","Batch2","Batch2","Batch2","Batch2","Batch1","Batch1","Batch1","Batch2","Batch2","Batch2","Batch2","Batch2","Batch2"))
suppressPackageStartupMessages(library("DEXSeq") )
#defining models
formulaFullModel= ~sample+exon+libType:exon+condition:exon
formulaReducedModel= ~sample+exon+libType:exon
dxd=DEXSeqDataSetFromHTSeq(countFile,sampleData=sampleTable,design=~sample+exon+condition:exon,flattenedfile=flattenedFile )
dxd=estimateSizeFactors( dxd )
sizeFactors(dxd)
dxd=estimateDispersions( dxd, formula=formulaFullModel, BPPARAM=BPPARAM )
dxd=testForDEU( dxd,reducedModel= formulaReducedModel,fullModel = formulaFullModel, BPPARAM=BPPARAM)
dxd=estimateExonFoldChanges( dxd,fitExpToVar="condition", BPPARAM=BPPARAM)
dxr2=DEXSeqResults(dxd)
table( dxr2$padj< 0.1)
DEXSeqHTML(dxr2,FDR=0.1,color=c("#FF000080","#0000FF80") )
Other Queries:
1) Is there a straight forward way to draw the PCA plot in DEXSeq, using the exon count table? I can then test if I am able to group my patient conditions with and without using the 'libType' in the model (I assume using 'libType' should adjust for variations due to bacth effect!!)
2) When using EdgeR/DeSeq for differential expression analysis, we restrict the use of genes for which the cpm values for a certain number of replicates is say >3 cpm; Is there a step similar to this for DEXSeq?
I am getting many exons shown to be having differential exon usage with individual exon counts very small (between 0-10)!! :-(
regards,
Nandan
Hi Nandan,
You could also introduce the patient factor in your design, by adding these columns to the sampleTable and use the formulas:
formulaFullModel= ~sample+exon+batch:exon+patient:exon+condition:exon
formulaReducedModel= ~sample+exon+batch:exon+patient:exon
These formulas should take into account the patient to patient variability. If you have additional metadata such as the sex of the individuals I would also introduced them as above. DEXSeq uses the independent filtering strategy from DESeq2, that should already remove some of the lowly expressed exons. But you could always remove some of them beforehands by any preferred threshold, this should also speed up computations!
Alejandro