Question: Batch-effect: batch in design and corrected batch with sva- DESeq2
gravatar for bioinfo
9 months ago by
bioinfo0 wrote:

Hi All,

I did DESeq2 on my set of samples and I got only two DE genes. When I checked I found that my samples were collected at different time points and in PCA a prominent batch effect is shown. So, I included a batch in my design

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                   directory = directory,
                                   design= ~ batch + condition)

where batch is batch [1] 1 2 3 1 2 3 Levels: 1 2 3

I get 15 DE genes after including design. So far all good.

But then I read another post where we can adjust the batch effect using 'SVA' and I tried it also. See below:

       dat  <- counts(dds, normalized = TRUE)
        idx  <- rowMeans(dat) > 1
        dat  <- dat[idx, ]
        mod  <- model.matrix(~ condition, colData(dds))
        mod0 <- model.matrix(~   1, colData(dds)) <-,mod, method = 'leek')

     svseq <- svaseq(dat, mod, mod0, = 3)

    ddssva <- dds
    ddssva$SV1 <- svseq$sv[,1]
    ddssva$SV2 <- svseq$sv[,2]
    ddssva$SV3 <- svseq$sv[,3]
    design(ddssva) <- ~ SV1 + SV2 + SV3 + condition

 ddssva <- DESeq(ddssva) 
 resSV <- results(ddssva) was 4 but while using 4 it throws the below error so I used 3 and it worked fine

 svseq <- svaseq(dat, mod, mod0, = 4)
    Number of significant surrogate variables is:  4 
    Iteration (out of 5 ):Error in density.default(x, adjust = adj) : 'x' contains missing values
    In addition: Warning message:
    In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced

Using this method (sva) I am getting 51 DE genes and only 5 genes are common between this and if using batch in design. My question is that why there is a difference between the results of the two methods and which method should I use form my downstream analysis.

    R version 3.5.1 (2018-07-02)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

    attached base packages:
    [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
     [1] sva_3.30.1                  genefilter_1.64.0           mgcv_1.8-24                 nlme_3.1-137               
     [5] openxlsx_4.1.0              XLConnect_0.2-15            XLConnectJars_0.2-15        xlsx_0.6.1                 
     [9] tibble_2.0.1                biomaRt_2.38.0              gplots_3.0.1                ggplot2_3.1.0              
    [13] DESeq2_1.22.2               SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.5        
    [17] matrixStats_0.54.0          Biobase_2.42.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
    [21] IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0        

    loaded via a namespace (and not attached):
     [1] bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     progress_1.2.0         httr_1.4.0            
     [6] tools_3.5.1            backports_1.1.3        R6_2.3.0               rpart_4.1-13           KernSmooth_2.23-15    
    [11] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1         colorspace_1.4-0       nnet_7.3-12           
    [16] withr_2.1.2            tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2      bit_1.1-14            
    [21] curl_3.3               compiler_3.5.1         htmlTable_1.13.1       labeling_0.3           caTools_1.17.1.1      
    [26] scales_1.0.0           checkmate_1.9.1        stringr_1.3.1          digest_0.6.18          foreign_0.8-70        
    [31] XVector_0.22.0         base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6        limma_3.38.3          
    [36] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0       RSQLite_2.1.1          bindr_0.1.1           
    [41] gtools_3.8.1           zip_1.0.0              acepack_1.4.1          dplyr_0.7.8            RCurl_1.95-4.11       
    [46] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3          Matrix_1.2-14          Rcpp_1.0.0            
    [51] munsell_0.5.0          stringi_1.2.4          yaml_2.2.0             zlibbioc_1.28.0        plyr_1.8.4            
    [56] grid_3.5.1             blob_1.1.1             gdata_2.18.0           crayon_1.3.4           lattice_0.20-35       
    [61] splines_3.5.1          xlsxjars_0.6.1         annotate_1.60.0        hms_0.4.2              locfit_1.5-9.1        
    [66] knitr_1.21             pillar_1.3.1           geneplotter_1.60.0     XML_3.98-1.16          glue_1.3.0            
    [71] latticeExtra_0.6-28    data.table_1.12.0      BiocManager_1.30.4     gtable_0.2.0           purrr_0.2.5           
    [76] assertthat_0.2.0       xfun_0.4               xtable_1.8-3           survival_2.42-3        rJava_0.9-10          
    [81] AnnotationDbi_1.44.0   memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1

Any help would be appreciated! Thanks!

deseq2 svaseq • 256 views
ADD COMMENTlink modified 9 months ago • written 9 months ago by bioinfo0
Answer: Batch-effect: batch in design and corrected batch with sva- DESeq2
gravatar for Michael Love
9 months ago by
Michael Love25k
United States
Michael Love25k wrote:

With known batches, I tend to favor ~batch + condition. I'm not sure why you would get such a small overlap.

ADD COMMENTlink written 9 months ago by Michael Love25k

Thanks, Michael for the answer.

I was not sure why I am getting such different numbers. I would go ahead with ~ batch + condition.

Also when I use = 2 and use only two SVs I only get 4 DE genes.

Is it recommended to use high or low when running svaseq. Any suggestion?

ADD REPLYlink written 9 months ago by bioinfo0

I’ll let the SVA devels handle that one.

ADD REPLYlink written 9 months ago by Michael Love25k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 417 users visited in the last hour