I have a complex RNASeq data with 3 factors, genotype time and treatment type. Top part of the target file looks like,
Sample Geno Time Treatment Sampl1 KK 2hr Control Sampl2 CR 2hr Control Sampl3 KK 2hr Control Sampl4 CR 2hr Control Sampl5 KK 2hr Stress1 Sampl6 CR 2hr Stress1 Sampl7 KK 4hr Stress1 Sampl8 CR 4hr Stress1 Sampl9 KK 4hr Stress2 Sampl10 CR 4hr Stress2 Sampl11 KK 4hr Stress2 Sampl12 CR 4hr Stress2 ....
I analyzed data with edgeR.Grouped factors and estimated DEGs using makeContrasts between groups,
group<-factor(paste(targets$Geno,targets$Time,targets$Treatment,sep=".")) cbind(targets,Group=group) y<-DGEList(counts = raw_counts, group = group) keep <-rowSums(cpm(y)>=2) >=2 y <- y[keep, , keep.lib.sizes=FALSE] y<-calcNormFactors(y) design<-model.matrix(~0+group) colnames(design)<-levels(group) y<-estimateDisp(y,design) fit<-glmQLFit(y,design) Normalized_counts<-cpm(y)
Now I added another batch of data for 6hr and I want to include the batch effect in the analysis. After reading similar posts, I think one possible way would be to add a new 'batch' column to target file and address that in the design matrix. Like,
Sample Geno Time Treatment Batch Sampl1 KK 2hr Control B1 Sampl2 CR 2hr Control B1 . Sampl60 CR 2hr Control B2
And include the batch in the model and proceed as shown above,
group<-factor(paste(targets$Geno,targets$Time,targets$Treatment, targets$Batch,sep=".")) design<-model.matrix(~0+group+Batch)
Is that the correct method to adopt? Please note that I will make a pairwise comparison for DE analysis all those comparisons are within a time point. ie, I don't have plan to compare for ex, 2hr control vs 6hr stress.