DEXseq batch removal
1
0
Entering edit mode
ta_awwad ▴ 10
@ta_awwad-11382
Last seen 6 months ago
Frankfurt am Main

Hi everyone,

first, I am not expert in using DEXseq. I am trying to analyze DEU from two different datasets. I am just worried about the batch effect. I tried to use svaseq function as suggested by others to remove this effect but ended up with an error message:


> sampleAnno
file condition libType type
1 ES.1.txt ESC paired-end 1
2 ES.2.txt ESC paired-end 1
3 LV_rep1.txt LV paired-end 2
4 LV_rep2.txt LV paired-end 2

> dxd = DEXSeqDataSetFromHTSeq(
  countfiles = sampleAnno$file, 
  sampleData = sampleAnno,
  design= ~ sample + exon + condition:exon,
  flattenedfile = gffFile)
> dxd = estimateSizeFactors( dxd )
> dxd <- dxd[rowMeans(counts(dxd))>1,]
> dat <- counts(dxd, normalized=TRUE)
> library(sva)
> mod1 <- model.matrix(~ sample + exon , colData(dxd))
> mod0 <- model.matrix(~ 1, colData(dxd))
> svseq <- svaseq(dat, mod1, mod0, n.sv=2, B=5)
> dxdsva = dxd
> dxdsva$SV1=svseq$sv[,1]
> dxdsva$SV2=svseq$sv[,2]
> design(dxdsva) <- ~ SV1 + SV2 + sample + exon + condition:exon
> dxdsva <- estimateDispersions( dxdsva )

Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in estimateDispersionsGeneEst(x, maxit = maxit, quiet = quiet, modelMatrix = modelMatrix, : the number of samples and the number of model coefficients are equal,
  i.e., there are no replicates to estimate the dispersion.
  use an alternate design formula




sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sva_3.44.0                  genefilter_1.78.0           mgcv_1.8-40                 nlme_3.1-159               
 [5] forcats_0.5.2               stringr_1.4.1               dplyr_1.0.10                purrr_0.3.5                
 [9] readr_2.1.3                 tidyr_1.2.1                 tibble_3.1.8                ggplot2_3.3.6              
[13] tidyverse_1.3.2             DEXSeq_1.42.0               RColorBrewer_1.1-3          AnnotationDbi_1.58.0       
[17] DESeq2_1.36.0               SummarizedExperiment_1.26.1 GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[21] IRanges_2.30.1              S4Vectors_0.34.0            MatrixGenerics_1.8.1        matrixStats_0.62.0         
[25] Biobase_2.56.0              BiocGenerics_0.42.0         BiocParallel_1.30.3        

loaded via a namespace (and not attached):
 [1] fs_1.5.2               bitops_1.0-7           lubridate_1.8.0        bit64_4.0.5            filelock_1.0.2        
 [6] progress_1.2.2         httr_1.4.4             tools_4.2.0            backports_1.4.1        utf8_1.2.2            
[11] R6_2.5.1               DBI_1.1.3              colorspace_2.0-3       withr_2.5.0            tidyselect_1.1.2      
[16] prettyunits_1.1.1      bit_4.0.4              curl_4.3.3             compiler_4.2.0         rvest_1.0.3           
[21] cli_3.4.1              xml2_1.3.3             DelayedArray_0.22.0    scales_1.2.1           rappdirs_0.3.3        
[26] digest_0.6.29          Rsamtools_2.12.0       XVector_0.36.0         pkgconfig_2.0.3        limma_3.52.4          
[31] dbplyr_2.2.1           fastmap_1.1.0          readxl_1.4.1           rlang_1.0.6            rstudioapi_0.14       
[36] RSQLite_2.2.18         generics_0.1.3         jsonlite_1.8.2         hwriter_1.3.2.1        vroom_1.6.0           
[41] googlesheets4_1.0.1    RCurl_1.98-1.9         magrittr_2.0.3         GenomeInfoDbData_1.2.8 Matrix_1.5-1          
[46] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3            lifecycle_1.0.3        edgeR_3.38.4          
[51] stringi_1.7.8          zlibbioc_1.42.0        BiocFileCache_2.4.0    grid_4.2.0             blob_1.2.3            
[56] parallel_4.2.0         crayon_1.5.2           lattice_0.20-45        haven_2.5.1            Biostrings_2.64.1     
[61] splines_4.2.0          annotate_1.74.0        hms_1.1.2              KEGGREST_1.36.3        locfit_1.5-9.6        
[66] pillar_1.8.1           geneplotter_1.74.0     codetools_0.2-18       biomaRt_2.52.0         reprex_2.0.2          
[71] XML_3.99-0.11          glue_1.6.2             modelr_0.1.9           tzdb_0.3.0             png_0.1-7             
[76] vctrs_0.4.2            cellranger_1.1.0       gtable_0.3.1           assertthat_0.2.1       cachem_1.0.6          
[81] xtable_1.8-4           broom_1.0.1            survival_3.4-0         googledrive_2.0.0      gargle_1.2.1          
[86] memoise_2.0.1          statmod_1.4.37         ellipsis_0.3.2

any idea what went wrong?

many many thanks TA

DEXSeq • 1.2k views
ADD COMMENT
1
Entering edit mode
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 5 months ago
Novartis Institutes for BioMedical Reseā€¦

Hello. If you want to incorporate surrogate variables into the DEXSeq model, the correct way to do it is by adding these variables as covariates in the design:

fullDesign <- ~sample+exon+ SV1:exon + SV2:exon + condition:exon
reducedDesign <- ~sample+exon+SV1:exon + SV2:exon

For your example, if you know which samples come from which datasets, you could add these as covariates instead of the surrogate variables from SVA. Two limitations of your example is that you only have 4 samples and accounting for batch/dataset will only work if your condition of interest is not confounded by dataset/batch.

Alejandro

ADD COMMENT
0
Entering edit mode

Thanks much Alejandro, I tried the DEXseq model you mentioned but still the same error message. problem is I don't completely understand the linear model of DEXseq. but what i understand from your comment is to add another column in the sample annotation and include it in the design, am I correct? best TA

ADD REPLY
0
Entering edit mode

I even tried:

> sampleAnno
file condition libType type
1 ES.1.txt ESC paired-end 1
2 ES.2.txt ESC paired-end 2
3 LV_rep1.txt LV paired-end 1
4 LV_rep2.txt LV paired-end 2

ullDesign <- ~sample+exon+ type:exon  + condition:exon

reducedDesign <- ~sample+exon+type:exon 

dxdsva <- estimateDispersions( dxdsva,  formula = fullDesign)

dxdsva = testForDEU( dxdsva, 
                  reducedModel = fullDesign, 
                  fullModel = reducedDesign )

but same error:

Error: BiocParallel errors   1 remote errors, element index: 1   0
     unevaluated and other errors   first remote error: Error in
     nbinomLRT(x, reduced = reducedModelMatrix, full = fullModelMatrix):
     less than one degree of freedom, perhaps full and reduced models are
     not in the correct order
ADD REPLY
0
Entering edit mode

Hi ta_awwad. Please note that the error messages are already indicating what the problem is: "less than one degree of freedom, perhaps full and reduced models are not in the correct order".

The error message from your original post: "there are no replicates to estimate the dispersion".

ADD REPLY

Login before adding your answer.

Traffic: 1052 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6