Dear great R helpers,
I would like to know if it is possible to apply 'svaseq' with 'DEXSeq' design in order to remove unknown batch effects similar to those greatly demonstrated with 'DESeq' (http://www.bioconductor.org/help/workflows/rnaseqGene/).
Due to my poor background in bioinformatics and the difference between DEXSeq and DESeq design, it would be very kind of you if an example could be provided.
With Respects,
Kaj Chokeshaiusaha
Thank you very much, I will definitely have a look at the section you referred.
Dear Alejandro,
I have tried svaseq analysis with pasilla data in DEXSeq and have got an error. The script was as following
library( "DEXSeq" )
library( "sva" )
inDir = system.file("extdata", package="pasilla")
countFiles = list.files(inDir, pattern="fb.txt$", full.names=TRUE)
basename(countFiles)
flattenedFile = list.files(inDir, pattern="gff$", full.names=TRUE)
basename(flattenedFile)
sampleTable = data.frame(row.names = c( "treated1", "treated2", "treated3", "untreated1", "untreated2", "untreated3", "untreated4" ),
condition = c("knockdown", "knockdown", "knockdown", "control", "control", "control", "control" ),
libType = c( "single-end", "paired-end", "paired-end", "single-end", "single-end", "paired-end", "paired-end" ) )
##----Generate DEXSeqDataSet----------##
dxd = DEXSeqDataSetFromHTSeq(countFiles,
sampleData=sampleTable,
design= ~ sample + exon + condition:exon,
flattenedfile=flattenedFile )
dxd <- estimateSizeFactors(dxd)
dxd <- dxd[rowMeans(counts(dxd))>1,]
##------------Try sva----------##
dat <- counts(dxd, normalized=TRUE)
mod1 <- model.matrix(~ sample + exon + condition:exon, colData(dxd))
mod0 <- model.matrix(~ sample + exon, colData(dxd))
svseq <- svaseq(dat, mod1, mod0, n.sv=NULL, B=5)
Error in solve.default(t(mod) %*% mod) :
system is computationally singular: reciprocal condition number = 4.47343e-18
Would you please suggest me what is wrong here?
Best Regards,
Kaj
Hi Kaj,
For svaseq, maybe it is unnecessary to specify the DEXSeq interaction model if you are looking for unknown covariates. I would try to estimate the surrogate variables for exon level counts (extracted using the featureCounts accessor) or gene level counts using "simpler" models:
mod1 <- ~ condition
mod0 <- ~ 1
And then plug in the resulting surrogate variables into the DEXSeqDataSet object.
Thank you very much!! I hope this one looks ok for you.
library( "DEXSeq" )
library( "DESeq2" )
library( "sva" )
inDir = system.file("extdata", package="pasilla")
countFiles = list.files(inDir, pattern="fb.txt$", full.names=TRUE)
basename(countFiles)
flattenedFile = list.files(inDir, pattern="gff$", full.names=TRUE)
basename(flattenedFile)
sampleTable = data.frame(row.names = c( "treated1", "treated2", "treated3", "untreated1", "untreated2", "untreated3", "untreated4" ),
condition = c("knockdown", "knockdown", "knockdown", "control", "control", "control", "control" ),
libType = c( "single-end", "paired-end", "paired-end", "single-end", "single-end", "paired-end", "paired-end" ) )
##----Generate DEXSeqDataSet---##
dxd = DEXSeqDataSetFromHTSeq(countFiles,
sampleData=sampleTable,
design= ~ sample + exon + condition:exon,
flattenedfile=flattenedFile )
##----Select only some genes----##
genesForSubset = read.table(file.path(inDir, "geneIDsinsubset.txt"),
stringsAsFactors=FALSE)[[1]]
dxd = dxd[geneIDs( dxd ) %in% genesForSubset,]
dxd <- estimateSizeFactors(dxd)
dxd <- dxd[rowMeans(counts(dxd))>1,]
##------------sva----------##
dat <- counts(dxd, normalized=TRUE)
mod1 <- model.matrix(~ sample + exon , colData(dxd))
mod0 <- model.matrix(~ 1, colData(dxd))
svseq <- svaseq(dat, mod1, mod0, n.sv=NULL, B=5)
#Number of significant surrogate variables is: 2
##-----Create DEXSeqDataSet with acquired surrogate vectors-----##
dxdsva = dxd
dxdsva$SV1=svseq$sv[,1]
dxdsva$SV2=svseq$sv[,2]
design(dxdsva) <- ~ SV1 + SV2 + sample + exon + condition:exon
##-------Test for differential exon usage--------##
dxdsva <- estimateDispersions( dxdsva )
dxdsva = testForDEU( dxdsva )
dxdsvaR = DEXSeqResults( dxdsva )
dxd <- estimateDispersions( dxd )
dxd = testForDEU(dxd)
dxdR = DEXSeqResults( dxd )
##-------Results--------##
table ( dxdsvaR$padj < 0.1 )
# FALSE TRUE
# 397 20
table ( dxdR$padj < 0.1 )
# FALSE TRUE
# 434 17
Best Regards,
Kaj
Hi Kaj,
In the design of the DEXSeqDataSet object, you need the formula to be like this:
Also, make sure that you pass the reduced model as a parameter to the function testForDEU:
That should do the job!
Thank you very very much!!