DEXseq batch removal
Entering edit mode
ta_awwad ▴ 10
Last seen 16 months ago
Frankfurt am Main

Hi everyone,

first, I am not expert in using DEXseq. I am trying to analyze DEU from two different datasets. I am just worried about the batch effect. I tried to use svaseq function as suggested by others to remove this effect but ended up with an error message:

> sampleAnno
file condition libType type
1 ES.1.txt ESC paired-end 1
2 ES.2.txt ESC paired-end 1
3 LV_rep1.txt LV paired-end 2
4 LV_rep2.txt LV paired-end 2

> dxd = DEXSeqDataSetFromHTSeq(
  countfiles = sampleAnno$file, 
  sampleData = sampleAnno,
  design= ~ sample + exon + condition:exon,
  flattenedfile = gffFile)
> dxd = estimateSizeFactors( dxd )
> dxd <- dxd[rowMeans(counts(dxd))>1,]
> dat <- counts(dxd, normalized=TRUE)
> library(sva)
> mod1 <- model.matrix(~ sample + exon , colData(dxd))
> mod0 <- model.matrix(~ 1, colData(dxd))
> svseq <- svaseq(dat, mod1, mod0,, B=5)
> dxdsva = dxd
> dxdsva$SV1=svseq$sv[,1]
> dxdsva$SV2=svseq$sv[,2]
> design(dxdsva) <- ~ SV1 + SV2 + sample + exon + condition:exon
> dxdsva <- estimateDispersions( dxdsva )

Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in estimateDispersionsGeneEst(x, maxit = maxit, quiet = quiet, modelMatrix = modelMatrix, : the number of samples and the number of model coefficients are equal,
  i.e., there are no replicates to estimate the dispersion.
  use an alternate design formula

sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sva_3.44.0                  genefilter_1.78.0           mgcv_1.8-40                 nlme_3.1-159               
 [5] forcats_0.5.2               stringr_1.4.1               dplyr_1.0.10                purrr_0.3.5                
 [9] readr_2.1.3                 tidyr_1.2.1                 tibble_3.1.8                ggplot2_3.3.6              
[13] tidyverse_1.3.2             DEXSeq_1.42.0               RColorBrewer_1.1-3          AnnotationDbi_1.58.0       
[17] DESeq2_1.36.0               SummarizedExperiment_1.26.1 GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[21] IRanges_2.30.1              S4Vectors_0.34.0            MatrixGenerics_1.8.1        matrixStats_0.62.0         
[25] Biobase_2.56.0              BiocGenerics_0.42.0         BiocParallel_1.30.3        

loaded via a namespace (and not attached):
 [1] fs_1.5.2               bitops_1.0-7           lubridate_1.8.0        bit64_4.0.5            filelock_1.0.2        
 [6] progress_1.2.2         httr_1.4.4             tools_4.2.0            backports_1.4.1        utf8_1.2.2            
[11] R6_2.5.1               DBI_1.1.3              colorspace_2.0-3       withr_2.5.0            tidyselect_1.1.2      
[16] prettyunits_1.1.1      bit_4.0.4              curl_4.3.3             compiler_4.2.0         rvest_1.0.3           
[21] cli_3.4.1              xml2_1.3.3             DelayedArray_0.22.0    scales_1.2.1           rappdirs_0.3.3        
[26] digest_0.6.29          Rsamtools_2.12.0       XVector_0.36.0         pkgconfig_2.0.3        limma_3.52.4          
[31] dbplyr_2.2.1           fastmap_1.1.0          readxl_1.4.1           rlang_1.0.6            rstudioapi_0.14       
[36] RSQLite_2.2.18         generics_0.1.3         jsonlite_1.8.2         hwriter_1.3.2.1        vroom_1.6.0           
[41] googlesheets4_1.0.1    RCurl_1.98-1.9         magrittr_2.0.3         GenomeInfoDbData_1.2.8 Matrix_1.5-1          
[46] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3            lifecycle_1.0.3        edgeR_3.38.4          
[51] stringi_1.7.8          zlibbioc_1.42.0        BiocFileCache_2.4.0    grid_4.2.0             blob_1.2.3            
[56] parallel_4.2.0         crayon_1.5.2           lattice_0.20-45        haven_2.5.1            Biostrings_2.64.1     
[61] splines_4.2.0          annotate_1.74.0        hms_1.1.2              KEGGREST_1.36.3        locfit_1.5-9.6        
[66] pillar_1.8.1           geneplotter_1.74.0     codetools_0.2-18       biomaRt_2.52.0         reprex_2.0.2          
[71] XML_3.99-0.11          glue_1.6.2             modelr_0.1.9           tzdb_0.3.0             png_0.1-7             
[76] vctrs_0.4.2            cellranger_1.1.0       gtable_0.3.1           assertthat_0.2.1       cachem_1.0.6          
[81] xtable_1.8-4           broom_1.0.1            survival_3.4-0         googledrive_2.0.0      gargle_1.2.1          
[86] memoise_2.0.1          statmod_1.4.37         ellipsis_0.3.2

any idea what went wrong?

many many thanks TA

DEXSeq • 844 views
Entering edit mode
Alejandro Reyes ★ 1.9k
Last seen 6 months ago
Novartis Institutes for BioMedical Reseā€¦

Hello. If you want to incorporate surrogate variables into the DEXSeq model, the correct way to do it is by adding these variables as covariates in the design:

fullDesign <- ~sample+exon+ SV1:exon + SV2:exon + condition:exon
reducedDesign <- ~sample+exon+SV1:exon + SV2:exon

For your example, if you know which samples come from which datasets, you could add these as covariates instead of the surrogate variables from SVA. Two limitations of your example is that you only have 4 samples and accounting for batch/dataset will only work if your condition of interest is not confounded by dataset/batch.


Entering edit mode

Thanks much Alejandro, I tried the DEXseq model you mentioned but still the same error message. problem is I don't completely understand the linear model of DEXseq. but what i understand from your comment is to add another column in the sample annotation and include it in the design, am I correct? best TA

Entering edit mode

I even tried:

> sampleAnno
file condition libType type
1 ES.1.txt ESC paired-end 1
2 ES.2.txt ESC paired-end 2
3 LV_rep1.txt LV paired-end 1
4 LV_rep2.txt LV paired-end 2

ullDesign <- ~sample+exon+ type:exon  + condition:exon

reducedDesign <- ~sample+exon+type:exon 

dxdsva <- estimateDispersions( dxdsva,  formula = fullDesign)

dxdsva = testForDEU( dxdsva, 
                  reducedModel = fullDesign, 
                  fullModel = reducedDesign )

but same error:

Error: BiocParallel errors   1 remote errors, element index: 1   0
     unevaluated and other errors   first remote error: Error in
     nbinomLRT(x, reduced = reducedModelMatrix, full = fullModelMatrix):
     less than one degree of freedom, perhaps full and reduced models are
     not in the correct order
Entering edit mode

Hi ta_awwad. Please note that the error messages are already indicating what the problem is: "less than one degree of freedom, perhaps full and reduced models are not in the correct order".

The error message from your original post: "there are no replicates to estimate the dispersion".


Login before adding your answer.

Traffic: 807 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6