I am working with a large panel of human tissue cells collected at different centres. It's a paired design, i.e. each sample has both treated and untreated RNA-seq data.
Here is what my colData looks like:
    > colData
                    purity subjects   treat Center Year  BMI Gender  Age
    sample1_treat       95        1   treat  Miami 2011 <NA>      M <NA>
    sample1_untreat     95        1 untreat  Miami 2011 <NA>      M <NA>
    sample2_treat       90        2   treat Milano 2011 <NA>      F <NA>
    sample2_untreat     90        2 untreat Milano 2011 <NA>      F <NA>
    sample3_treat       75        3   treat Geneva 2011   20      F   50
    sample3_untreat     75        3 untreat Geneva 2011   20      F   50
    ...
Because I believed that the paired design accounts for any batch effects, I used the following design for the DE analysis to find genes that differ between treated and untreated samples:
design(dds) <- formula(~ subjects + treat)
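A minimal base-R sketch of why a paired design already absorbs centre effects, using a hypothetical toy colData (the values below are invented for illustration). Because each subject comes from exactly one centre, the Center columns of the model matrix are linear combinations of the subject columns, so adding Center to `~ subjects + treat` makes the matrix rank-deficient; DESeq2 checks for this and should refuse such a design with a "not full rank" error. The sketch also assumes `subjects` is stored as a factor: if it is numeric, as the 1, 2, 3 coding in colData might suggest, R would model it as a single continuous covariate rather than one effect per subject.

```r
# Hypothetical toy colData: 3 subjects, each sampled once treated and once
# untreated, and each subject belonging to exactly one centre.
toy <- data.frame(
  subjects = factor(rep(1:3, each = 2)),
  treat    = factor(rep(c("treat", "untreat"), times = 3),
                    levels = c("untreat", "treat")),
  Center   = factor(rep(c("Miami", "Milano", "Geneva"), each = 2))
)

# Paired design: one baseline per subject plus a shared treatment effect
mm_paired <- model.matrix(~ subjects + treat, data = toy)

# Same design with Center added; Center is constant within each subject
mm_center <- model.matrix(~ subjects + Center + treat, data = toy)

# A matrix is full rank when its rank equals its number of columns,
# i.e. every coefficient can be estimated separately
rank_paired <- qr(mm_paired)$rank
rank_center <- qr(mm_center)$rank

cat("paired design:", ncol(mm_paired), "columns, rank", rank_paired, "\n")
cat("with Center:  ", ncol(mm_center), "columns, rank", rank_center, "\n")
```

Here the paired matrix has 4 columns and rank 4, while the version with Center has 6 columns but still rank 4: the two Center columns carry no information beyond the subject terms.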
I also intend to do exploratory data analysis. For that, I am using a matrix of log fold-change values generated by:
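    library(DESeq2)  # also attaches SummarizedExperiment, which provides assay()
    # log2(normalized counts + 1) for every sample
    norm <- assay(normTransform(dds))
    # Odd columns are treated samples, even columns the matching untreated ones;
    # this assumes the columns strictly alternate treat/untreat for each subject
    i <- seq.int(1L, ncol(norm), by = 2L)  # ncol(norm) is the total number of samples (72 here)
    norm.fc <- norm[, i] - norm[, i + 1]  # log2 fold change, treated vs untreated
    write.table(norm.fc, file = "normalised_FCMatrix.txt", sep = "\t", quote = FALSE, col.names = NA)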
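The fold-change code above is only correct if every treated column is immediately followed by its own untreated column; one swapped pair silently flips the sign for that subject. A minimal base-R sketch that pairs samples by subject instead of by column position is shown below. The matrix and sample names are hypothetical stand-ins; with real data, `norm` would presumably be `assay(normTransform(dds))` and `cd` would be `as.data.frame(colData(dds))`, assuming its `treat` column holds the values "treat" and "untreat" as in the colData shown.

```r
# Hypothetical toy inputs: 4 genes x 6 samples on a log2 scale.
# Subject 2's samples are deliberately stored in untreat/treat order.
set.seed(1)
norm <- matrix(round(runif(24, 2, 12), 2), nrow = 4,
               dimnames = list(paste0("gene", 1:4),
                               c("s1_treat", "s1_untreat", "s2_untreat",
                                 "s2_treat", "s3_treat", "s3_untreat")))
cd <- data.frame(
  subjects  = factor(c(1, 1, 2, 2, 3, 3)),
  treat     = c("treat", "untreat", "untreat", "treat", "treat", "untreat"),
  row.names = colnames(norm)
)

# Look up each subject's treated and untreated sample by name, so the
# result does not depend on the column order of the matrix
subj <- levels(cd$subjects)
treat_col   <- sapply(subj, function(s) rownames(cd)[cd$subjects == s & cd$treat == "treat"])
untreat_col <- sapply(subj, function(s) rownames(cd)[cd$subjects == s & cd$treat == "untreat"])

# Per-subject log2 fold change: log2(treated) - log2(untreated)
norm.fc <- norm[, treat_col, drop = FALSE] - norm[, untreat_col, drop = FALSE]
colnames(norm.fc) <- paste0("subject", subj)
print(norm.fc)
```

Subject 2 gets the correct treated-minus-untreated difference even though its columns are stored in the opposite order, which the positional `norm[, i] - norm[, i + 1]` approach would get backwards.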
I then use norm.fc for clustering and PCA.
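A minimal base-R sketch of PCA and hierarchical clustering on a genes x subjects fold-change matrix. The matrix here is random toy data standing in for `norm.fc`, and keeping the 20 most variable genes is an arbitrary illustrative choice, loosely analogous to the `ntop = 500` default of DESeq2's `plotPCA()`. Note that `prcomp()` and `dist()` treat rows as observations, so the matrix is transposed to put subjects in rows.

```r
# Hypothetical toy fold-change matrix: 50 genes x 6 subjects
set.seed(42)
norm.fc <- matrix(rnorm(300), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("subject", 1:6)))

# Keep only the most variable genes (20 is an arbitrary choice here)
ntop <- 20
rv <- apply(norm.fc, 1, var)
top <- norm.fc[order(rv, decreasing = TRUE)[seq_len(ntop)], ]

# PCA with subjects as observations (rows), genes as variables (columns)
pca <- prcomp(t(top), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Average-linkage clustering on Euclidean distances between subjects'
# fold-change profiles
hc <- hclust(dist(t(top)), method = "average")

print(pct_var[1:2])           # % variance explained by PC1 and PC2
print(hc$labels[hc$order])    # subjects in dendrogram order
```

Colouring the PC1/PC2 scores (`pca$x[, 1:2]`) by each subject's centre is one way to check whether subjects from the same centre still group together after pairing.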
I would like to know whether my design for the DE analysis is correct. Do I need to include any other information to account for the tissue-collection or sequencing centre, etc.?