Hi all,
I have a question about how to define the model for the design in my edgeR analysis.
My experimental design includes 9 patients, with one sample per condition per patient, i.e. two samples per patient ('C1D1' and 'EOT').
The RNA-seq data were generated in 2 batches ('BATCH1' and 'BATCH2'). Both samples from a given patient were generated in the same batch (i.e. either BATCH1 or BATCH2).
Our main question is to identify genes that change between conditions C1D1 and EOT across the nine patients.
My data.frame looks like this:
meta <- data.frame(
  row.names = colnames(counttable),
  condition = rep(c("C1D1", "EOT"), each = 9),
  libType   = c("Batch1","Batch2","Batch2","Batch2","Batch1","Batch2","Batch2","Batch1","Batch2",
                "Batch1","Batch1","Batch1","Batch2","Batch2","Batch2","Batch2","Batch2","Batch2"),
  patient   = c("AW","DD","DG","EC","EL","GR","LA","NR","RL",
                "AW","NR","EL","DD","DG","EC","GR","LA","RL")
)
- plotMDS showed a prominent batch effect, which was largely (but not completely) resolved using "logCPMc <- removeBatchEffect(logCPM, libType)".
I assume that a design matrix such as edesign <- model.matrix(~libType+condition) should account for the batch effect. Can someone confirm whether I am defining the matrix correctly?
My final design matrix also includes 'patient' to account for patient-to-patient variation:
edesign <- model.matrix(~libType+patient+condition)
Again, I am not sure whether this is the correct way.
edesign
   (Intercept) libTypeBatch2 patientDD patientDG patientEC patientEL patientGR patientLA patientNR patientRL conditionEOT
 1           1             0         0         0         0         0         0         0         0         0            0
 2           1             1         1         0         0         0         0         0         0         0            0
 3           1             1         0         1         0         0         0         0         0         0            0
 4           1             1         0         0         1         0         0         0         0         0            0
 5           1             0         0         0         0         1         0         0         0         0            0
 6           1             1         0         0         0         0         1         0         0         0            0
 7           1             1         0         0         0         0         0         1         0         0            0
 8           1             0         0         0         0         0         0         0         1         0            0
 9           1             1         0         0         0         0         0         0         0         1            0
10           1             0         0         0         0         0         0         0         0         0            1
11           1             0         0         0         0         0         0         0         1         0            1
12           1             0         0         0         0         1         0         0         0         0            1
13           1             1         1         0         0         0         0         0         0         0            1
14           1             1         0         1         0         0         0         0         0         0            1
15           1             1         0         0         1         0         0         0         0         0            1
16           1             1         0         0         0         0         1         0         0         0            1
17           1             1         0         0         0         0         0         1         0         0            1
18           1             1         0         0         0         0         0         0         0         1            1
However, I get an error when estimating the dispersion with this design:
e <- estimateGLMCommonDisp(e, edesign)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
patientRL
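For anyone who wants to reproduce this without the count data, here is a minimal sketch that rebuilds the design from the sample annotation above and checks its rank with base R's qr(). The names batches_per_patient, full_design and paired_design are only illustrative, and row names are omitted because counttable is not available here. It suggests that libType is fully determined by patient (each patient sits in exactly one batch), which may be why one coefficient cannot be estimated. The ~patient+condition design is included only for comparison, and I have not confirmed that it is the right model for this experiment.

```r
# Sample annotation copied from the post (row names omitted on purpose).
meta <- data.frame(
  condition = rep(c("C1D1", "EOT"), each = 9),
  libType   = c("Batch1","Batch2","Batch2","Batch2","Batch1","Batch2","Batch2","Batch1","Batch2",
                "Batch1","Batch1","Batch1","Batch2","Batch2","Batch2","Batch2","Batch2","Batch2"),
  patient   = c("AW","DD","DG","EC","EL","GR","LA","NR","RL",
                "AW","NR","EL","DD","DG","EC","GR","LA","RL")
)

# Count the distinct batches seen for each patient; a value of 1 everywhere
# means libType is nested within patient.
batches_per_patient <- tapply(meta$libType, meta$patient,
                              function(x) length(unique(x)))
print(batches_per_patient)

# Design from the post, and a comparison design without libType (assumption:
# the patient terms would then absorb the batch differences).
full_design   <- model.matrix(~ libType + patient + condition, data = meta)
paired_design <- model.matrix(~ patient + condition, data = meta)

# A design is of full rank when its rank equals its number of columns.
rank_full   <- qr(full_design)$rank
rank_paired <- qr(paired_design)$rank

cat("full design:  ", ncol(full_design), "columns, rank", rank_full, "\n")
cat("paired design:", ncol(paired_design), "columns, rank", rank_paired, "\n")
```

If the rank is smaller than the number of columns, at least one column is a linear combination of the others, which is the condition the glmFit error message reports.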
Can someone shed more light on this error and/or suggest how I could do the analysis differently?
Appreciate any feedback,
regards,
Nandan