The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: Batch-effect: batch in design and corrected batch with sva- DESeq2
0
gravatar for bioinfo
24 days ago by
bioinfo0
bioinfo0 wrote:

Hi All,

I did DESeq2 on my set of samples and I got only two DE genes. When I checked I found that my samples were collected at different time points and in PCA a prominent batch effect is shown. So, I included a batch in my design

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                   directory = directory,
                                   design= ~ batch + condition)

where batch is batch [1] 1 2 3 1 2 3 Levels: 1 2 3

I get 15 DE genes after including design. So far all good.

But then I read another post where we can adjust the batch effect using 'SVA' and I tried it also. See below:

       dat  <- counts(dds, normalized = TRUE)
        idx  <- rowMeans(dat) > 1
        dat  <- dat[idx, ]
        mod  <- model.matrix(~ condition, colData(dds))
        mod0 <- model.matrix(~   1, colData(dds))

        n.sv <- num.sv(dat,mod, method = 'leek')
        n.sv

     svseq <- svaseq(dat, mod, mod0, n.sv = 3)

    ddssva <- dds
    ddssva$SV1 <- svseq$sv[,1]
    ddssva$SV2 <- svseq$sv[,2]
    ddssva$SV3 <- svseq$sv[,3]
    design(ddssva) <- ~ SV1 + SV2 + SV3 + condition

 ddssva <- DESeq(ddssva) 
 resSV <- results(ddssva)

n.sv was 4 but while using 4 it throws the below error so I used 3 and it worked fine

 svseq <- svaseq(dat, mod, mod0, n.sv = 4)
    Number of significant surrogate variables is:  4 
    Iteration (out of 5 ):Error in density.default(x, adjust = adj) : 'x' contains missing values
    In addition: Warning message:
    In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced

Using this method (sva) I am getting 51 DE genes and only 5 genes are common between this and if using batch in design. My question is that why there is a difference between the results of the two methods and which method should I use form my downstream analysis.

 sessionInfo()
    R version 3.5.1 (2018-07-02)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

    attached base packages:
    [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
     [1] sva_3.30.1                  genefilter_1.64.0           mgcv_1.8-24                 nlme_3.1-137               
     [5] openxlsx_4.1.0              XLConnect_0.2-15            XLConnectJars_0.2-15        xlsx_0.6.1                 
     [9] tibble_2.0.1                biomaRt_2.38.0              gplots_3.0.1                ggplot2_3.1.0              
    [13] DESeq2_1.22.2               SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.5        
    [17] matrixStats_0.54.0          Biobase_2.42.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
    [21] IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0        

    loaded via a namespace (and not attached):
     [1] bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     progress_1.2.0         httr_1.4.0            
     [6] tools_3.5.1            backports_1.1.3        R6_2.3.0               rpart_4.1-13           KernSmooth_2.23-15    
    [11] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1         colorspace_1.4-0       nnet_7.3-12           
    [16] withr_2.1.2            tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2      bit_1.1-14            
    [21] curl_3.3               compiler_3.5.1         htmlTable_1.13.1       labeling_0.3           caTools_1.17.1.1      
    [26] scales_1.0.0           checkmate_1.9.1        stringr_1.3.1          digest_0.6.18          foreign_0.8-70        
    [31] XVector_0.22.0         base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6        limma_3.38.3          
    [36] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0       RSQLite_2.1.1          bindr_0.1.1           
    [41] gtools_3.8.1           zip_1.0.0              acepack_1.4.1          dplyr_0.7.8            RCurl_1.95-4.11       
    [46] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3          Matrix_1.2-14          Rcpp_1.0.0            
    [51] munsell_0.5.0          stringi_1.2.4          yaml_2.2.0             zlibbioc_1.28.0        plyr_1.8.4            
    [56] grid_3.5.1             blob_1.1.1             gdata_2.18.0           crayon_1.3.4           lattice_0.20-35       
    [61] splines_3.5.1          xlsxjars_0.6.1         annotate_1.60.0        hms_0.4.2              locfit_1.5-9.1        
    [66] knitr_1.21             pillar_1.3.1           geneplotter_1.60.0     XML_3.98-1.16          glue_1.3.0            
    [71] latticeExtra_0.6-28    data.table_1.12.0      BiocManager_1.30.4     gtable_0.2.0           purrr_0.2.5           
    [76] assertthat_0.2.0       xfun_0.4               xtable_1.8-3           survival_2.42-3        rJava_0.9-10          
    [81] AnnotationDbi_1.44.0   memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1

Any help would be appreciated! Thanks!

deseq2 svaseq • 128 views
ADD COMMENTlink modified 24 days ago • written 24 days ago by bioinfo0
Answer: Batch-effect: batch in design and corrected batch with sva- DESeq2
2
gravatar for Michael Love
24 days ago by
Michael Love21k
United States
Michael Love21k wrote:

With known batches, I tend to favor ~batch + condition. I'm not sure why you would get such a small overlap.

ADD COMMENTlink written 24 days ago by Michael Love21k

Thanks, Michael for the answer.

I was not sure why I am getting such different numbers. I would go ahead with ~ batch + condition.

Also when I use n.sv = 2 and use only two SVs I only get 4 DE genes.

Is it recommended to use high n.sv or low n.sv when running svaseq. Any suggestion?

ADD REPLYlink written 24 days ago by bioinfo0

I’ll let the SVA devels handle that one.

ADD REPLYlink written 24 days ago by Michael Love21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 200 users visited in the last hour