Hi Ryan and the limma team
I am working on 450K methylation data. We have two groups A and B. Each group has two time points (pre and post intervention). At present I would just like to analyse within group A.
My data looks like this:
Subject ID | Group | Time point (pre/post) | lymphocyte % | monocyte % | granulocyte % |
1 | A | Pre | lymp_x1 | mo_x1 | gr_x1 |
1 | A | Post | lymp_x2 | mo_x2 | gr_x2 |
2 | A | Pre | lymp_x3 | mo_x3 | gr_x3 |
2 | A | Post | lymp_x4 | mo_x4 | gr_x4 |
3 | A | Pre | lymp_x5 | mo_x5 | gr_x5 |
3 | A | Post | lymp_x6 | mo_x6 | gr_x6 |
4 | B | Pre | lymp_x7 | mo_x7 | gr_x7 |
4 | B | Post | lymp_x8 | mo_x8 | gr_x8 |
5 | B | Pre | lymp_x9 | mo_x9 | gr_x9 |
5 | B | Post | lymp_x10 | mo_x10 | gr_x10 |
6 | B | Pre | lymp_x11 | mo_x11 | gr_x11 |
6 | B | Post | lymp_x12 | mo_x12 | gr_x12 |
All the “a-f” and “x1-x12” are variables.
My data set is a paired. As most recommended I use the sibship design matrix for the study. I am following page 42 section 9.4.1 in the limma manual.
However I need to adjust for the blood count information which are my continuous covariates (ly , mo and gr).
I have read a few links in the Bioconductor forum (link: Removing continuous covariate effects in limma analysis).
I would like to adjust my data for these covariates and have attempted my code as follows:
First script is normal sibship without adjustment for cell count (fig 2 part 1)
Second script is adjustment for cell count ( fig2 part 2)
Script 1
#########################################################################################
## 1. Normal SibShip WITHOUT covariate adjustment
#########################################################################################
library("lumi")
library("limma")
Beta <- read.table("Beta.csv",head=TRUE,sep=",",check.names=FALSE)
Beta <- Beta [-1]## remove column 1 with probe ID
Mval <- beta2m(Beta) ## convert beta to M values
Target <- read.table("Target.2.csv",head=TRUE,sep=",",check.names=FALSE)
SibShip <- factor(Target$SibShip)
Treat <- factor(Target$Treatment, levels=c("C","T"))
design <- model.matrix(~SibShip+Treatment)
## with M values
fit.M <- lmFit(Mval, design)
fit.M <- eBayes(fit.M)
topTable(fit.M, coef="TreatT")
Script 2
#########################################################################################
## 2. Adjustment for blood count -ly mo and gr
#########################################################################################
library("lumi")
library("limma")
Beta <- read.table("Beta.csv",head=TRUE,sep=",",check.names=FALSE)
Beta <- Beta [-1]## remove column 1 with probe ID
Mval <- beta2m(Beta) ## convert beta to M values
Target <- read.table("Target.2.csv",head=TRUE,sep=",",check.names=FALSE)
SibShip <- factor(Target$SibShip)
Treat <- factor(Target$Treatment, levels=c("C","T"))
design <- model.matrix(~SibShip+Treat)
ly <- factor (Target$ly)
mo <- factor (Target$mo)
gr <- factor (Target$gr)
design.1 <- model.matrix(~SibShip+Treat+ly+mo,Target)
fit.cov <- lmFit(Mval,design.1)
fit.cov <- eBayes(fit.cov)
topTable(fit.cov, coef="TreatT")
topTable(fit.cov, adjust="BH")
My input files -- My target file looks like this:
fig 2
FileName | SibShip | Treatment | Sample_Group | cov1 | cov2 | cov3 |
File 1 | 26 | C | Pre | x1 | y1 | z1 |
File 2 | 26 | T | Post | x2 | y2 | z2 |
File 3 | 28 | C | Pre | x3 | y3 | z3 |
File 4 | 28 | T | Post | x4 | y4 | z4 |
File 5 | 32 | C | Pre | x5 | y5 | z5 |
File 6 | 32 | T | Post | x6 | y6 | z6 |
File 7 | 34 | C | Pre | x7 | y7 | z7 |
File 8 | 34 | T | Post | x8 | y8 | z8 |
File 9 | 36 | C | Pre | x9 | y9 | z9 |
File 10 | 36 | T | Post | x10 | y10 | z10 |
And my beta file looks like this:
fig 3
File 1 | File 2 | File 3 | File 4 | File 5 | File 6 | File 7 | File 8 | |
cg13869341 | 0.81435 | 0.86286 | 0.90742 | 0.880754 | 0.877508 | 0.841507 | 0.714104 | 0.877911 |
cg15560884 | 0.772191 | 0.787651 | 0.761191 | 0.799986 | 0.747465 | 0.752523 | 0.776277 | 0.801894 |
cg01014490 | 0.045361 | 0.077699 | 0.058017 | 0.033827 | 0.038075 | 0.055925 | 0.031437 | 0.057016 |
cg17505339 | 0.913544 | 0.929492 | 0.941861 | 0.917624 | 0.901202 | 0.909436 | 0.938309 | 0.918753 |
cg16736630 | 0.545144 | 0.517417 | 0.462861 | 0.438669 | 0.44927 | 0.555143 | 0.648387 | 0.531261 |
cg03128332 | 0.254491 | 0.229242 | 0.32559 | 0.211406 | 0.303473 | 0.282476 | 0.159184 | 0.236487 |
cg18147296 | 0.892314 | 0.891493 | 0.882442 | 0.8969 | 0.884726 | 0.860286 | 0.876255 | 0.924667 |
cg05597748 | 0.775692 | 0.786959 | 0.815256 | 0.814986 | 0.809486 | 0.816575 | 0.805664 | 0.845997 |
- I need to compare pre vs post so do I need to make a constrast matrix or the 1st script ok.
cont.matrix <- makeContrasts(prevspost=post-pre,levels=design)
- Some warnings were generated in the 2nd script ( figure 1 below)
- Is my 2nd script correct or do I need to use the removeBatchEffect()function.
- My objective is to adjust for blood count and proceed to do a differential analysis between pre and post for group A.
- Then do the same for group B.
- Finally use the limma manual page 48 section 9.7 Multi- level experiments.
Figure 5 .Warning on running script 2
http://tinypic.com?ref=2qst1c8" target="_blank">http://i64.tinypic.com/2qst1c8.jpg" border="0" alt="Image and video hosting by TinyPic">
Thank you for your attention and hope to hear from you.
Regards
smeeta
No, you don't need to make a contrast matrix, because you can drop the coefficient directly.
As for your responses to the general comments:
3. Only if the cell proportions are the same before and after treatment.
4. No, the use of M-values has benefits elsewhere, e.g., normally distributed for linear modelling, a more stable mean-variance relationship. I would stick with that rather than using beta values.
Finally, if you want to know the effect of including the cell proportions, I would try doing the limma analysis with and without them in the model, and seeing how many DE genes you get. If the proportion terms are helpful at removing confounding effects, they should increase the number of DE genes by reducing the unmodelled variance. Otherwise, if the number of DE genes doesn't increase (or decreases because the cell proportions absorb the DE), you shouldn't include them.
If you only have two time points, I would treat it as a paired sample design with early/late times as factors, rather than using a time course design with real-valued time as a covariate.