Question

Accounting for both batch effect and patient-to-patient variation in edgeR analysis

n.deshpande • 0

@ndeshpande-8759

Australia

Hi all,

I have a question about how to define the design matrix for my edgeR analysis.

My experimental design includes:

9 patients

One sample per condition per patient, i.e. two samples per patient: 'C1D1' and 'EOT'.

RNA-seq data were generated in 2 batches ('BATCH1' and 'BATCH2'). Both samples from any given patient were generated in the same batch (i.e. either BATCH1 or BATCH2).

Our main goal is to identify genes that change between conditions C1D1 and EOT across the nine patients.

My data.frame looks as below:

meta <- data.frame(
  row.names=colnames(counttable),
  condition=c("C1D1","C1D1","C1D1","C1D1","C1D1","C1D1","C1D1","C1D1","C1D1","EOT","EOT","EOT","EOT","EOT","EOT","EOT","EOT","EOT"),
  libType=c("Batch1","Batch2","Batch2","Batch2","Batch1","Batch2","Batch2","Batch1","Batch2","Batch1","Batch1","Batch1","Batch2","Batch2","Batch2","Batch2","Batch2","Batch2"),
  patient=c('AW','DD','DG','EC','EL','GR','LA','NR','RL','AW','NR','EL','DD','DG','EC','GR','LA','RL'))

I could see a prominent batch effect using the plotMDS function. It was largely (but not completely) resolved by using logCPMc <- removeBatchEffect(logCPM, libType).

I assume that using a design matrix such as edesign <- model.matrix(~libType+condition) should take care of the batch effect. Can someone confirm whether I am defining the matrix correctly?

My final design matrix also includes the 'patient' factor to account for patient-to-patient variation:

edesign <- model.matrix(~libType+patient+condition)

Again, I am not sure whether this is the correct way.

edesign

(Intercept) libTypeBatch2 patientDD patientDG patientEC patientEL patientGR patientLA patientNR patientRL conditionEOT
1            1             0         0         0         0         0         0         0         0         0            0
2            1             1         1         0         0         0         0         0         0         0            0
3            1             1         0         1         0         0         0         0         0         0            0
4            1             1         0         0         1         0         0         0         0         0            0
5            1             0         0         0         0         1         0         0         0         0            0
6            1             1         0         0         0         0         1         0         0         0            0
7            1             1         0         0         0         0         0         1         0         0            0
8            1             0         0         0         0         0         0         0         1         0            0
9            1             1         0         0         0         0         0         0         0         1            0
10           1             0         0         0         0         0         0         0         0         0            1
11           1             0         0         0         0         0         0         0         1         0            1
12           1             0         0         0         0         1         0         0         0         0            1
13           1             1         1         0         0         0         0         0         0         0            1
14           1             1         0         1         0         0         0         0         0         0            1
15           1             1         0         0         1         0         0         0         0         0            1
16           1             1         0         0         0         0         1         0         0         0            1
17           1             1         0         0         0         0         0         1         0         0            1
18           1             1         0         0         0         0         0         0         0         1            1

However, I get an error:

e <- estimateGLMCommonDisp(e, edesign)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
patientRL

Can someone shed more light on this error and/or suggest how I can do the analysis differently?

Appreciate any feedback,

regards,

Nandan

score 0 · Answer 1 · 2015-09-10

There's no point putting in the libType factor, as the batch effect is fully absorbed into the patient-specific blocking factors. Consider an example gene where all Batch2 samples have a 2-fold increase in expression. You don't need a specific coefficient to account for this batch effect, as the 2-fold increase will be absorbed by the patient coefficients for all patients in the second batch. In summary, use:

design <- model.matrix(~patient+condition)

This should avoid the error in estimateGLMCommonDisp. Your previous matrix wasn't of full rank because each patient belongs to exactly one batch, so the libType column is a linear combination of the patient columns and is therefore redundant.
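
If it helps to see the redundancy directly, here is a minimal sketch using only base R. It assumes the sample annotation exactly as posted in the question. The names batches_per_patient, design_full and design_reduced are mine, chosen for illustration, and qr() is used here only as a convenient rank check, not as what edgeR does internally.

```r
# Sample annotation copied from the question (18 samples, 9 patients).
condition <- factor(rep(c("C1D1", "EOT"), each = 9))
libType <- factor(c("Batch1", "Batch2", "Batch2", "Batch2", "Batch1",
                    "Batch2", "Batch2", "Batch1", "Batch2",
                    "Batch1", "Batch1", "Batch1", "Batch2", "Batch2",
                    "Batch2", "Batch2", "Batch2", "Batch2"))
patient <- factor(c("AW", "DD", "DG", "EC", "EL", "GR", "LA", "NR", "RL",
                    "AW", "NR", "EL", "DD", "DG", "EC", "GR", "LA", "RL"))

# Number of distinct batches each patient appears in.
# If every entry is 1, batch is nested within patient.
batches_per_patient <- rowSums(table(patient, libType) > 0)
print(batches_per_patient)

# Design from the question: batch, patient and condition.
design_full <- model.matrix(~ libType + patient + condition)
rank_full <- qr(design_full)$rank

# Design from the answer: patient and condition only.
design_reduced <- model.matrix(~ patient + condition)
rank_reduced <- qr(design_reduced)$rank

# A design is of full rank when its rank equals its number of columns.
cat("libType + patient + condition: columns =", ncol(design_full),
    "rank =", rank_full, "\n")
cat("patient + condition:           columns =", ncol(design_reduced),
    "rank =", rank_reduced, "\n")
```

The first design has one more column than its rank, which is the situation the glmFit error reports. The second design has rank equal to its column count, so it can be passed to estimateGLMCommonDisp in place of edesign.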
