Questions in limma about multifactorial design and possible batch effect correction
svlachavas ▴ 830
@svlachavas-7225

 

Dear Bioconductor Community,

I have tried to fit a linear model to the expression set below for the variable "condition", which has 5 levels: one control and 4 biological substances. The aim is to test for changes in gene expression in a specific cell line between the "control" and each of the 4 substances. My code is below:

pData(normalized2)

                           condition        replicate
dataset_603.dat      Control1             1
dataset_604.dat      Control1             2
dataset_605.dat      Biological1         1
dataset_606.dat      Biological1         2
dataset_607.dat      Biological2         1
dataset_608.dat      Biological2         2
dataset_609.dat      Biological3         1
dataset_610.dat      Biological3         2
dataset_611.dat      Biological4         1
dataset_612.dat      Biological4         2

f <- factor(normalized2$condition, levels=c("Control1","Biological1","Biological2", "Biological3", "Biological4"))

design <- model.matrix(~0+f)

fit <- lmFit(normalized2, design)

contrast.matrix <- makeContrasts(fBiological1-fControl1, fBiological2-fControl1, fBiological3-fControl1, fBiological4-fControl1, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

Unfortunately, as the pData object shows, my dataset consists of two batches: the control and each substance have two biological replicates, and each batch comprises 5 CEL files (marked by replicate 1 and 2 respectively; i.e. the samples with replicate=2 were also preprocessed together at a different time). My main concern is therefore whether, and how, I can use the batch that each CEL file belongs to (normalized2$replicate) in limma, for example by including it as a random effect. Or is my approach completely wrong, and should I perform some batch effect correction instead? If so, in which direction should I go and which package should I use? I have never performed batch effect correction before.

Thank you in advance

 

 

 

@james-w-macdonald-5106

You just add the batch to your design as a nuisance variable.

n2 <- pData(normalized2)
n2$Condition <- relevel(factor(n2$Condition), "Control1")
design <- model.matrix(~condition + replicate, n2)
fit <- lmFit(normalized2, design)
fit2 <- eBayes(fit)

Now the coefficients 2-5 compare each Biological vs Control after controlling for the batch effect.
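As a rough illustration of how coefficients 2-5 could be pulled out of such a fit, here is a minimal sketch. It assumes limma is installed and uses a simulated matrix of pure noise (`expr`, hypothetical) in place of `normalized2`; the sample annotation is rebuilt from the pData table in the question, and the batch is coded as a factor, which with two batches gives the same fit as the numeric column.

```r
library(limma)

set.seed(1)
conds <- c("Control1", paste0("Biological", 1:4))

# Sample annotation rebuilt from the pData table; Control1 is the first level,
# so it acts as the reference for the condition coefficients.
n2 <- data.frame(
  condition = factor(rep(conds, each = 2), levels = conds),
  replicate = factor(rep(1:2, times = 5))   # batch, coded as a factor
)

# Hypothetical stand-in for the expression data: 200 probes x 10 arrays of noise.
expr <- matrix(rnorm(200 * 10, mean = 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200),
                               paste0("dataset_", 603:612, ".dat")))

design <- model.matrix(~ condition + replicate, data = n2)
fit2 <- eBayes(lmFit(expr, design))

# One table per treatment-vs-control coefficient (columns 2-5 of the design).
tables <- lapply(2:5, function(k) topTable(fit2, coef = k, number = Inf))
names(tables) <- colnames(design)[2:5]

# Testing coefficients 2-5 together gives an F-test for any treatment effect,
# with the batch coefficient (column 6) left out of the test.
overall <- topTable(fit2, coef = 2:5, number = Inf)
head(tables[["conditionBiological1"]])
```

With real data, `expr` would be replaced by the ExpressionSet itself, as in the code above.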


 


Dear Mr MacDonald,

In the code above, did you mean

n2$condition <- relevel(factor(n2$condition), "Control1")

because the code as written gives me an error:

Error in relevel.factor(factor(n2$Condition), "Control1") : 
  'ref' must be an existing level
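
The error is consistent with a capitalisation mismatch: the pData column is named `condition`, and column names in R are case-sensitive, so `n2$Condition` is NULL and the resulting factor has no levels. A minimal base-R sketch, with the sample annotation rebuilt from the pData table in the question (no limma needed):

```r
# Sample annotation rebuilt from the pData table in the question.
n2 <- data.frame(
  condition = rep(c("Control1", paste0("Biological", 1:4)), each = 2),
  replicate = rep(1:2, times = 5)
)

# There is no column called "Condition", so this is NULL.
is.null(n2$Condition)

# Reproduce the failure and keep its message instead of stopping.
msg <- tryCatch(relevel(factor(n2$Condition), "Control1"),
                error = function(e) conditionMessage(e))
print(msg)

# Lower-case column name: Control1 becomes the reference level.
n2$condition <- relevel(factor(n2$condition), "Control1")
design <- model.matrix(~ condition + replicate, data = n2)

# Column 1 is the intercept, columns 2-5 the four substances, column 6 the batch.
print(colnames(design))
```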


Dear Mr MacDonald, one more naive but, for me, important question:

Since I performed non-specific intensity filtering before testing for statistical significance, should I also set the argument trend=TRUE in eBayes? Or is that unrelated to the possible batch effect? [I tried it, and the t-statistics and p-values change, but not to a great degree.]
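
For context, the two settings act on different parts of the analysis: the batch enters through the design matrix, while `trend=TRUE` only changes how eBayes shrinks the probe-wise variances, by letting the prior variance depend on average log-expression. The comparison described above could be sketched as follows, assuming limma is installed and using simulated noise (`expr`, hypothetical) rather than the real data, so it says nothing about which setting suits this dataset.

```r
library(limma)

set.seed(2)
conds <- c("Control1", paste0("Biological", 1:4))
n2 <- data.frame(
  condition = factor(rep(conds, each = 2), levels = conds),
  replicate = factor(rep(1:2, times = 5))
)

# Hypothetical expression matrix: 500 probes x 10 arrays with varying mean intensity.
expr <- matrix(rnorm(500 * 10), nrow = 500) + runif(500, min = 5, max = 12)

design <- model.matrix(~ condition + replicate, data = n2)
fit <- lmFit(expr, design)

# Same linear model (batch included), two variance-shrinkage settings.
fit_const <- eBayes(fit)                # one prior variance for all probes
fit_trend <- eBayes(fit, trend = TRUE)  # prior variance may vary with average intensity

# Compare moderated t-statistics for the first treatment-vs-control coefficient.
t_cor <- cor(fit_const$t[, 2], fit_trend$t[, 2])
print(t_cor)
```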

 

 

 

 
