Dear BioC people,
I am analyzing a wet lab experiment on a cell line with 5 biological replicates. Each biological replicate has 3 or 4 technical replicates. Here is my targets$BiolRep entry
targets$BiolRep [1] 1 2 2 1 2 1 4 4 1 4 3 3 3 5 5 5
Some of the biological replicates are sensitive and some are resistant to a drug. The purpose of the experiment is to find the differentially expressed genes between the two conditions. My targets$Response entry is
targets$Response [1] "sensitive" "sensitive" "sensitive" "sensitive" "sensitive" "sensitive" "sensitive" "sensitive" "sensitive" [10] "sensitive" "resistant" "resistant" "resistant" "resistant" "resistant" "resistant"
To account for technical and biological replication, I use limma's duplicateCorrelation function. Here is part of my script, where norm.exp is my normalized expression matrix.
ct<-factor(targets$Response) design <- model.matrix(~ct) colnames(design) <- c(levels(ct)) arrayw <- arrayWeights(norm.exp, design) w <- asMatrixWeights(arrayw, dim(norm.exp)) dupcor <- duplicateCorrelation(norm.exp, design, block=targets$BiolRep, weights=w)
I am wondering why the dupcor$consensus.correlation is as low as 0.797633. When I calculate the Pearson correlation between the technical replicates of each biological replicate, this is above 0.99.
Thank you very much for your help!
Eleni
Thank you, Aaron, for the nice illustration of the differences between the methods! Really insightful!
At the end, do you maybe mean aveArrays() and not avereps()? My aim is to average the technical replicates of the samples.
I also have another case with just a pair of samples: One before and one after treatment. Each one has 3 technical replicates. Do you suggest to use limma with averrays(), or a paired t-test would be better?
Thank you very much!
Yes,
avearrays
is the appropriate function here. Your other question should go in a separate post; but if you only have one sample for each condition, you can't really do much, regardless of how many technical replicates you have.Thank you very much!
Dear Aaron,
I tried avearrays() for a problem similar to the above. I then want to do differential gene detection using limma. The problem is that some of the replicates are in different batches. I thus want to add the batch in my design matrix. Is this possible when the averaged replicates are across batches? Maybe, in this case, avearrays is not the best way to go and duplicateCorrelation() is the way?
Thank you so much.
First, is there actually a batch effect? Check out a MDS plot, and if there's no real effect, then you don't need to model the batch factor and you can average the arrays as described. Otherwise, if different replicates are in different batches, and there's a different number of replicates from each sample in each batch, then you'll probably have to use
duplicateCorrelation
.Hello, Aaron and apology for the late reply. Thank you for your response. Yes, there is a batch effect which is evident with MDS. I will then apply duplicateCorrelation in my second problem.
Best,
Eleni