limma::removeBatchEffect into DESeq2
Entering edit mode
Patricia • 0
Last seen 2.6 years ago

Hi there,

I am writing because I am lost in the last step after use limma::removeBatchEffect and introduce the new matrix to DESeq2. The reason I used limma::removeBatchEffect is because the design is not full rank and I can't fix my batch in the design. The PCAs from before and after batch effect look correct.

Please, see my code below:

> library(DESeq2) 
> library(limma)
> dds <- DESeqDataSetFromMatrix(countData = as.matrix(counts),
>                               colData = colData,
>                               design = ~ trt)
> dds <- estimateSizeFactors(dds)
> sizeFactors(dds) 
> vsd <- vst(dds)
> plotPCA(vsd, "trt", "ID") # PCA before
> design0 <- model.matrix(~ 0  + colData$trt,
> data=data.frame(assay(vsd)))
> assay(vsd) <- limma::removeBatchEffect(assay(vsd),
> batch=as.vector(colData$batch), batch2=as.vector(ID), design =
> design0)
> plotPCA(vsd, "trt", "ID") #PCA after
> **How can I introduce vsd in DESeq() to obtain the DEGs ???**   

> dds <- DESeq(dds)
> resultsNames(dds) 
> res <- results(dds, =c("trt","Condition1","Condition2"))

Those are the PCAs


These are my failed attempts:


> dds2 <- assay(vsd) 
> dds2 <- DESeq(dds2)
> Error in DESeq(dds2) : is(object, "DESeqDataSet") is not TRUE


> assay(dds) <- assay(vsd)
> dds <- DESeq(dds)
> using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
NaNs producedError in locfit(logDisps ~ logMeans, data = d[disps >= minDisp * 10, ,  : 
  fewer than one row in the data

... many thanks in advance!

> sessionInfo()  
> R version 3.6.1 (2019-07-05) Platform:
> x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina
> 10.15.6
> Matrix products: default BLAS:  
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
> locale: [1]
> en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
> attached base packages: [1] parallel  stats4    stats     graphics 
> grDevices utils     datasets  methods   base     
> other attached packages:  [1] DESeq2_1.26.0              
> SummarizedExperiment_1.16.1 DelayedArray_0.12.2        
> BiocParallel_1.20.1          [5] matrixStats_0.56.0         
> Biobase_2.46.0              GenomicRanges_1.38.0       
> GenomeInfoDb_1.22.0          [9] IRanges_2.20.2             
> S4Vectors_0.24.3            BiocGenerics_0.32.0         edgeR_3.28.1  
> [13] limma_3.42.2               
> loaded via a namespace (and not attached):  [1] bit64_0.9-7           
> splines_3.6.1          Formula_1.2-3          assertthat_0.2.1       
> [5] latticeExtra_0.6-29    blob_1.2.1            
> GenomeInfoDbData_1.2.2 pillar_1.4.3            [9] RSQLite_2.2.0      
> backports_1.1.5        lattice_0.20-40        glue_1.4.1            
> [13] digest_0.6.25          RColorBrewer_1.1-2     XVector_0.26.0     
> checkmate_2.0.0        [17] colorspace_1.4-1       htmltools_0.5.0    
> Matrix_1.2-18          XML_3.99-0.3           [21] pkgconfig_2.0.3    
> genefilter_1.68.0      zlibbioc_1.32.0        purrr_0.3.3           
> [25] xtable_1.8-4           scales_1.1.0           jpeg_0.1-8.1       
> htmlTable_1.13.3       [29] tibble_3.0.3           annotate_1.64.0    
> ggplot2_3.3.0          ellipsis_0.3.0         [33] nnet_7.3-13        
> survival_3.1-11        magrittr_1.5           crayon_1.3.4          
> [37] memoise_1.1.0          foreign_0.8-76         tools_3.6.1        
> data.table_1.12.8      [41] lifecycle_0.2.0        stringr_1.4.0      
> munsell_0.5.0          locfit_1.5-9.1         [45] cluster_2.1.0      
> AnnotationDbi_1.48.0   compiler_3.6.1         rlang_0.4.7           
> [49] grid_3.6.1             RCurl_1.98-1.1         rstudioapi_0.11    
> htmlwidgets_1.5.1      [53] bitops_1.0-6           base64enc_0.1-3    
> gtable_0.3.0           DBI_1.1.0              [57] R6_2.4.1           
> gridExtra_2.3          knitr_1.29             dplyr_0.8.5           
> [61] bit_1.1-15.2           Hmisc_4.3-1            stringi_1.4.6      
> Rcpp_1.0.5             [65] geneplotter_1.64.0     vctrs_0.2.4        
> rpart_4.1-15           acepack_1.4.1          [69] png_0.1-7          
> tidyselect_1.0.0       xfun_0.16
deseq2 limma batch effect • 2.4k views
Entering edit mode

DESeq2 expects raw counts, vst is not appropriate since it is already normalized plus variance-stabilized. If you batch design is not full rank then there is little point even including it into either the design or an explicit batch correction. Both rely on the same basic principles and your batch is probably nested with something else. Can you show the colData/design to illustrate how batch is related to the actual experimental groups? In general it is not a good idea to try and force data into a framework when there is an explicit warning after using the standard approaches.

Entering edit mode

Hi ATpoint,

Thanks very much for your reply. I was aware that vts gave me a transformed matrix that is why I wasn't sure how proceed, so thanks for your advises.

About this project batches, there are eighteen and two samples from different sequencing batches. Also, I called "batch" the genetic background of the patients, but this is a covariante more than a batch. Please see below my ColData/Design.

Sample  Treatment           Label   ID
S103    Condition1.Untr Batch_1 Patient_1
S104    Condition1.DMSO Batch_1 Patient_1
S121    Condition2.Untr Batch_1 Patient_2
S122    Condition2.DMSO Batch_1 Patient_2
S127    Condition1.Untr Batch_1 Patient_2
S128    Condition1.DMSO Batch_1 Patient_2
S13         Condition2.Untr Batch_1 Patient_3
S133    Condition2.Untr Batch_1 Patient_4
S134    Condition2.DMSO Batch_1 Patient_4
S139    Condition1.Untr Batch_1 Patient_4
S14         Condition2.DMSO Batch_1 Patient_3
S140    Condition1.DMSO Batch_1 Patient_4
S19         Condition1.Untr Batch_1 Patient_3
S20         Condition1.DMSO Batch_1 Patient_3
S38         Condition2.DMSO Batch_1 Patient_5
S43         Condition1.Untr Batch_1 Patient_5
S44         Condition1.DMSO Batch_1 Patient_5
S67b    Condition1.Untr Batch_2 Patient_6
S68b    Condition1.DMSO Batch_2 Patient_6
S97         Condition2.Untr Batch_1 Patient_1
S98         Condition2.DMSO Batch_1 Patient_1
Entering edit mode
Last seen 42 minutes ago
Republic of Ireland

Their bi-plot is now visible

Patricia, this is one major issue: "DESeq2 expects raw counts, vst is not appropriate since it is already normalized plus variance-stabilized" - you cannot do this via DESeq2, as explained above.

You are also correcting for 2 'batches via batch=as.vector(colData$batch), batch2=as.vector(ID). Why is ID included?

The ideal way for dealing with batch is elaborated in one of my recent answers:

Entering edit mode

Hi Kevin,

Thanks for your prompt reply. I chose limma::removeBatchEffect because it allowed me add two batch effect corrections and I added the genetic background of the patient as one of them (ID). Please, see my ColData above in my response to ATpoint.

Do you have any suggestion to deal with this?

Thanks for the link, really useful.

Entering edit mode

Hi, you cannot proceed with your planned analyses, i.e., neither Option 1 nor Option 2, the reason being that DESeq2 expects raw counts. You neither have to remove any batch effect prior to running DESeq2. Please see my linked thread where I explain this in greater detail ( ).

I would neither be attempting to remove cross- / inter- patient effects via the ID column. If you are worried that this effect is biasing the data, then include ID in the design formula as a blocking factor / covariate. So, for all intents and purposes, you just need:

dds <- DESeqDataSetFromMatrix(countData = as.matrix(counts),
  colData = colData,
  design = ~ batch + ID + trt)

In this way, any test statistics that you derive for trt will be adjusted for batch and ID. Then, after you perform your differential expression analysis, you can use limma::removeBatchEffect() prior to performing PCA, clustering, or anything else downstream of the differential expression analysis. You do not have to then repeat the initial DESeq2 steps, or repeat the differential expression analysis.

The other worry here is that batch2 appears to only have 2 samples, which will mean that attempting to adjust for this batch may prove difficult. You may make a serious consideration to exclude these 2 samples from your study and just use batch1 samples.

So, assuming that you remove the batch2 samples and just want to include ID as a blocking factor, you may simply need:

dds <- DESeqDataSetFromMatrix(countData = as.matrix(counts),
  colData = colData,
  design = ~ ID + trt)

# diff. expression testing

# remove batch effect for downstream analyses
vsd <- vst(dds)
assay(vsd) <- limma::removeBatchEffect(assay(vsd),
  batch = colData(dds)[,'ID'])
Entering edit mode

Hi Kevin, thanks for your feedback!

I already tried the first design attempt with the message of full rank error.

dds <- DESeqDataSetFromMatrix(countData = as.matrix(counts),
colData = colData,
design = ~ batch + ID + trt) Error in checkFullRank(modelMatrix)

So, I am going to give a try to the option two, removing the two samples from batch_2.

Thanks again for all your comments, they are greatly appreciated.

Entering edit mode

Great, yes, simply taking out the batch2 samples may work.


Login before adding your answer.

Traffic: 470 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6