RNAseq : identical sample-sample distances after using limma::removeBatchEffect
1
0
Entering edit mode
athiebaut • 0
@112b3e6e
Last seen 8 days ago
Switzerland

Hello, I've been analysing RNAseq samples to identify differentially expressed genes (DEG) using DESeq2. I have 6 samples, 3 treated/3 control and a batch effect caused by the day the experiments were performed. Briefly, this is the design of the experiment :

Sample   Treatment    BatchEffect
  S1      Control          A
  S2      Control          B
  S3      Control          C
  S4      Treated          A
  S5      Treated          B
  S6      Treated          C

I've accounted for the batch effect in DESeq2 using design = ~ BatchEffect + Treatment and I found several DEG, so no problem here. On top of that, I also wanted to assess the quality of the data and the impact of the batch effect removal using sample to sample distances, amongst other things (as adviced in DESeq2 vignette). The batch effect removal (I used limma::removeBatchEffect) is pretty efficient because it clusters the samples by condition and reduce the distances between them. However, after removing the batch effect, I noticed that the sample to sample distances are the same between conditions. Here's an example of distance heatmaps before and after batch correction :

Distance

For example, the distance between Treated:A and Treated:B is the same than between Control:A and Control:B. This experiment was performed in 10 species and the "identical distances" happened everytime. Is it an intended feature of batch effect removal ? If not, what could explain this behaviour ?

Here's the code I used (after running DESeqDataSetFromMatrix() and DESeq() commands) :

# Applies variance stabilization
blind_vst <- varianceStabilizingTransformation(dds, blind = T, fitType = "local")
non_blind_vst <- varianceStabilizingTransformation(dds, blind = F, fitType = "local")

# Removes batch effect
mat <- assay(non_blind_vst)
model_mat <- model.matrix(formula(paste("~ ",Treatment)), colData(non_blind_vst))
mat <- limma::removeBatchEffect(x = mat, batch = non_blind_vst[[Treatment]], design = model_mat)
assay(non_blind_vst) <- mat

# Sample to sample distance heatmap without batch effect correction
## Extracts distance data for the plot
sample_dist <- dist(t(assay(blind_vst)))
sample_dist_mat <- as.matrix(sample_dist)
## Makes the plot
rownames(sample_dist_mat) <- paste(blind_vst[[condition]],blind_vst[[batch]], sep = ':')
colnames(sample_dist_mat) <- paste(blind_vst[[condition]],blind_vst[[batch]], sep = ':')
pheatmap(mat = sample_dist_mat, color = colorRampPalette(rev(brewer.pal(9,"Blues")))(255),
          clustering_distance_rows = sample_dist, clustering_distance_cols = sample_dist,
          border_color = "grey50", treeheight_row = 75, treeheight_col = 75,
          legend = T, show_rownames = T, show_colnames = T,
          main = bquote(atop("Distance between samples in"~italic(.(assembly)),
                             "Before batch correction")),
          fontsize = 13, fontsize_row = 10, fontsize_col = 10, angle_col = 315,
          display_numbers = T, number_format = "%.1f", number_color = "red", fontsize_number = 10,
          na_col = "grey", silent = F)

# Sample to sample distance heatmap with batch effect correction
## Extracts distance data for the plot
sample_dist_cor <- dist(t(assay(non_blind_vst)))
sample_dist_mat_cor <- as.matrix(sample_dist_cor)
## Makes the plot
rownames(sample_dist_mat_cor) <- paste(non_blind_vst[[condition]],non_blind_vst[[batch]], sep = ':')
colnames(sample_dist_mat_cor) <- paste(non_blind_vst[[condition]],non_blind_vst[[batch]], sep = ':')
pheatmap(mat = sample_dist_mat_cor, color = colorRampPalette(rev(brewer.pal(9,"Blues")))(255),
         clustering_distance_rows = sample_dist_cor, clustering_distance_cols = sample_dist_cor,
         border_color = "grey50", treeheight_row = 75, treeheight_col = 75,
         legend = T, show_rownames = T, show_colnames = T,
         main = bquote(atop("Distance between samples in"~italic(.(assembly)),
                            "After batch correction")),
         fontsize = 13, fontsize_row = 10, fontsize_col = 10, angle_col = 315,
         display_numbers = T, number_format = "%.1f", number_color = "red", fontsize_number = 10,
         na_col = "grey", silent = F)

And here's the result of sessionInfo() :

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=fr_FR.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scales_1.1.1                RColorBrewer_1.1-2          pheatmap_1.0.12             limma_3.44.3               
 [5] ggrepel_0.9.1               ggplot2_3.3.3               IHW_1.16.0                  DESeq2_1.28.1              
 [9] SummarizedExperiment_1.18.2 DelayedArray_0.14.1         matrixStats_0.59.0          Biobase_2.48.0             
[13] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2              S4Vectors_0.26.1           
[17] BiocGenerics_0.34.0         data.table_1.14.0           apeglm_1.10.0              

loaded via a namespace (and not attached):
 [1] bit64_4.0.5            splines_4.0.3          blob_1.2.1             GenomeInfoDbData_1.2.3 slam_0.1-48           
 [6] numDeriv_2016.8-1.1    pillar_1.6.1           RSQLite_2.2.7          lattice_0.20-44        glue_1.4.2            
[11] bbmle_1.0.23.1         XVector_0.28.0         colorspace_2.0-1       Matrix_1.3-4           plyr_1.8.6            
[16] XML_3.99-0.6           pkgconfig_2.0.3        emdbook_1.3.12         genefilter_1.70.0      zlibbioc_1.34.0       
[21] purrr_0.3.4            xtable_1.8-4           mvtnorm_1.1-1          fdrtool_1.2.16         BiocParallel_1.22.0   
[26] tibble_3.1.2           annotate_1.66.0        generics_0.1.0         ellipsis_0.3.2         cachem_1.0.5          
[31] withr_2.4.2            survival_3.2-11        magrittr_2.0.1         crayon_1.4.1           memoise_2.0.0         
[36] fansi_0.5.0            MASS_7.3-54            tools_4.0.3            lifecycle_1.0.0        munsell_0.5.0         
[41] locfit_1.5-9.4         AnnotationDbi_1.50.3   compiler_4.0.3         rlang_0.4.11           grid_4.0.3            
[46] RCurl_1.98-1.3         bitops_1.0-7           gtable_0.3.0           DBI_1.1.1              R6_2.5.0              
[51] dplyr_1.0.6            bdsmatrix_1.3-4        fastmap_1.1.0          bit_4.0.4              utf8_1.2.1            
[56] lpsymphony_1.16.0      Rcpp_1.0.6             vctrs_0.3.8            geneplotter_1.66.0     tidyselect_1.1.1      
[61] coda_0.19-4

Thanks in advance for your help !

DESeq2 RNASeq BatchEffect limma • 101 views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

This is a consequence of the method of batch removal using the design.

ADD COMMENT
0
Entering edit mode

Alright, so nothing to worry about, then, I guess ?

Thank you very much for your answer !

ADD REPLY

Login before adding your answer.

Traffic: 272 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6