Batch correction and Deseq2
Entering edit mode
suphamon.j ▴ 20
Last seen 5 months ago

Hi, I would like to ask that the deta input fort Deseq2 is required a count metric. However, after I look into my count data, I would that there is a batch effect in my count data. As from you see in the image, the image shown the total read count of the data batch 2 (high read count in both red and blue color) and batch 1 (low read count in both red and blue color). Red color represented control sample and blue color represented the treatment.

How should I normalized my read count data before applying in Deseq2 and for WGCNA analysis. Currently I have tried using Combatseq on raw read count before using in Deseq2 analysis. What is the best option in my analysis, (1) Performing batch normalization using Combatseq --> Deseq2 --> VST transformation --> WGCNA (2) DEseq2 --> VST transformation --> WGCNA

Thank you very much for your kind in advance for suggestions and comments. Code should be placed in three backticks as shown below

#define conditions, library methods, and replicates
conditions = c("N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N","N", "N", "N", "N", "N", "N", "N","P","P","P", "P", "P", "P","P", "P", "P","P","P","P","P", "P", "P", "P","P", "P", "P","P","P","P","P", "P", "P", "P","P", "P", "P","P","P","P","P", "P", "P", "P","P", "P", "P","P","P","P","P", "P", "P", "P","P", "P", "P","P")
library_methods = c("D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D2","D2", "D1", "D1","D1", "D1","D1", "D1","D1", "D1","D1", "D1","D1", "D1","D2", "D2","D2", "D2","D2", "D2", "D2","D2", "D2","D2", "D2", "D2","D2", "D2","D2", "D2", "D2","D2", "D2","D2", "D2", "D2","D2", "D2","D2", "D2",  "D1","D1", "D1","D1", "D1", "D1","D1", "D1","D1", "D1", "D1","D1", "D1","D1", "D1", "D1","D1", "D1","D1", "D1", "D1","D1", "D1","D1" )
replicates = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,26,1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)

#perform the batch correction

#first we need to transform the format of our groups and batches from names (e.g. "UHR", "HBR", etc.) to numbers (e.g. 1, 2, etc.)
#in the command below "sapply" is used to apply the "switch" command to each element and convert names to numbers as we define
groups = sapply(as.character(conditions), switch, "N" = 1, "P" = 2, USE.NAMES = F)
batches = sapply(as.character(library_methods), switch, "D1" = 1, "D2" = 2, USE.NAMES = F)
sample_names = names(uncorrected_data)[2:length(names(uncorrected_data))]
#now run ComBat_seq
corrected_data = ComBat_seq(counts = as.matrix(uncorrected_data[,sample_names]), batch = batches, group = groups)

#join the gene and chromosome names onto the now corrected counts from ComBat_seq
#corrected_data = cbind(uncorrected_data[,c("gene", "chr")], corrected_data)
corrected_data = cbind(uncorrected_data[,c("gene")], corrected_data)

# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )
DESeq2 WGCNA Normalization NormalyzerDE BatchEffect • 581 views
Entering edit mode
ATpoint ★ 3.6k
Last seen 10 hours ago

If there is good evidence for batch effects then it makes sense to remove that. ComBat-seq is one option indeed. PCA is a standard method to diagnose BEs, see vignette.

Entering edit mode

Hi, thank you very much for your answer.

As I have seen some articles about normalization, I have a confusion a little bit about the different between quartile normalization and the normalization in Deseq2.

Regarding to my result of total count, we would observe that there are differences in total count in each sample. from the figure in the post above, we can see that the batch 2 shown slightly high in total count value.

In this case, how should I normalize all the read count before differential gene expression analysis and WGCNA?

Entering edit mode

Use the default normalization in DESeq2, nothing special here.


Login before adding your answer.

Traffic: 805 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6