Analyzing paired pre/post data using limma and adjusting for continuous covariates
3
1
Entering edit mode
@smeeta-shrestha-6393
Last seen 8.8 years ago

Hi Ryan and the limma team

I am working on 450K methylation data. We have two groups A and B. Each group has two time points (pre and post intervention). At present I would just like to analyse within group A.

My data looks like this:

Subject ID Group Time point (pre/post) lymphocyte % monocyte % granulocyte %
 
1 A Pre lymp_x1 mo_x1 gr_x1
1 A Post lymp_x2 mo_x2 gr_x2
2 A Pre lymp_x3 mo_x3 gr_x3
2 A Post lymp_x4 mo_x4 gr_x4
3 A Pre lymp_x5 mo_x5 gr_x5
3 A Post lymp_x6 mo_x6 gr_x6
4 B Pre lymp_x7 mo_x7 gr_x7
4 B Post lymp_x8 mo_x8 gr_x8
5 B Pre lymp_x9 mo_x9 gr_x9
5 B Post lymp_x10 mo_x10 gr_x10
6 B Pre lymp_x11 mo_x11 gr_x11
6 B Post lymp_x12 mo_x12 gr_x12

All the “a-f” and “x1-x12” are variables.

My data set is a paired. As most recommended I use the sibship design matrix for the study. I am following page 42 section 9.4.1 in the limma manual.

However I need to adjust for the blood count information which are my continuous covariates (ly , mo and gr).

I have read a few links in the Bioconductor forum (link: Removing continuous covariate effects in limma analysis).

I would like to adjust my data for these covariates and have attempted my code as follows:

First script is normal sibship without adjustment for cell count (fig 2 part 1)

Second script is adjustment for cell count ( fig2 part 2)

 

Script 1

#########################################################################################
## 1. Normal SibShip WITHOUT covariate adjustment
#########################################################################################
library("lumi")
library("limma")
Beta <- read.table("Beta.csv",head=TRUE,sep=",",check.names=FALSE)
Beta <- Beta [-1]## remove column 1 with probe ID
Mval <- beta2m(Beta)  ## convert beta to M values
Target <- read.table("Target.2.csv",head=TRUE,sep=",",check.names=FALSE)
SibShip <- factor(Target$SibShip)
Treat <- factor(Target$Treatment, levels=c("C","T"))
design <- model.matrix(~SibShip+Treatment)
## with M values
fit.M <- lmFit(Mval, design)
fit.M <- eBayes(fit.M)
topTable(fit.M, coef="TreatT")


Script 2

#########################################################################################
## 2. Adjustment for blood count -ly mo and gr
#########################################################################################
library("lumi")
library("limma")
Beta <- read.table("Beta.csv",head=TRUE,sep=",",check.names=FALSE)
Beta <- Beta [-1]## remove column 1 with probe ID
Mval <- beta2m(Beta)  ## convert beta to M values
Target <- read.table("Target.2.csv",head=TRUE,sep=",",check.names=FALSE)
SibShip <- factor(Target$SibShip)
Treat <- factor(Target$Treatment, levels=c("C","T"))
design <- model.matrix(~SibShip+Treat)
ly <- factor (Target$ly)
mo <- factor (Target$mo)
gr <- factor (Target$gr) 
design.1 <- model.matrix(~SibShip+Treat+ly+mo,Target)
fit.cov <- lmFit(Mval,design.1)
fit.cov  <- eBayes(fit.cov)
topTable(fit.cov, coef="TreatT")
topTable(fit.cov, adjust="BH")


My input files  -- My target file looks like this:

fig 2

FileName SibShip Treatment Sample_Group cov1 cov2 cov3
File 1 26 C Pre x1 y1 z1
File 2 26 T Post x2 y2 z2
File 3 28 C Pre x3 y3 z3
File 4 28 T Post x4 y4 z4
File 5 32 C Pre x5 y5 z5
File 6 32 T Post x6 y6 z6
File 7 34 C Pre x7 y7 z7
File 8 34 T Post x8 y8 z8
File 9 36 C Pre x9 y9 z9
File 10 36 T Post x10 y10 z10

And my beta file looks like this:

fig 3

  File 1 File 2 File 3 File 4 File 5 File 6 File 7 File 8
cg13869341 0.81435 0.86286 0.90742 0.880754 0.877508 0.841507 0.714104 0.877911
cg15560884 0.772191 0.787651 0.761191 0.799986 0.747465 0.752523 0.776277 0.801894
cg01014490 0.045361 0.077699 0.058017 0.033827 0.038075 0.055925 0.031437 0.057016
cg17505339 0.913544 0.929492 0.941861 0.917624 0.901202 0.909436 0.938309 0.918753
cg16736630 0.545144 0.517417 0.462861 0.438669 0.44927 0.555143 0.648387 0.531261
cg03128332 0.254491 0.229242 0.32559 0.211406 0.303473 0.282476 0.159184 0.236487
cg18147296 0.892314 0.891493 0.882442 0.8969 0.884726 0.860286 0.876255 0.924667
cg05597748 0.775692 0.786959 0.815256 0.814986 0.809486 0.816575 0.805664 0.845997

 

  1. I need to compare pre vs post so do I need to make a constrast matrix or the 1st script ok.

cont.matrix <- makeContrasts(prevspost=post-pre,levels=design)

  1. Some warnings were generated in the 2nd script ( figure 1 below)
  2. Is my 2nd script correct or do I need to use the removeBatchEffect()function.
  3. My objective is to adjust for blood count and proceed to do a differential analysis between pre and post for group A.
  4. Then do the same for group B.
  5. Finally use the limma manual page 48 section 9.7 Multi- level experiments.

Figure 5 .Warning on running script 2

http://tinypic.com?ref=2qst1c8" target="_blank">http://i64.tinypic.com/2qst1c8.jpg" border="0" alt="Image and video hosting by TinyPic">

 

Thank you for your attention and hope to hear from you.

Regards

smeeta

 

 

 

 

 

 

 

 

limma pre/post continous covariate • 2.8k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay

For your first question, you haven't put Sample_Group in your design matrix, so the cont.matrix you've constructed doesn't make any sense. In fact, you can't put Sample_Group in your design matrix because it's identical to Treatment, so one would be redundant with the other. Rather, you should be comparing the levels C and T in Treatment, by dropping TreatT as you've done in your first script.

For your second question, the code in your supplied image shows design.1 is constructed as:

design.1 <- model.matrix(~SibShip+Treat+ly+mo+gr,Target)

... but the gr is missing in the code you've posted above. In the image, the warning is because the sum of the cell proportions is equal to 1 for each sample. This results in linear dependencies because the addition of all the SibShip blocking factors will also be equal to 1. There will be an infinite number of solutions by swapping values between coefficients corresponding to SibShip and the cell proportions. Hence, one of the proportion terms is dropped to ensure that everything is estimable.

Your remaining points aren't really questions, so I'll just give some general comments. The first is that you better make sure that the proportions are not correlated with the treatment, i.e., you don't get consistently more or less of one cell type after treatment across all samples. If this occurs, the terms corresponding to the cell proportions may absorb the effect of interest. Conversely, if the cell proportions are the same before/after treatment, then they aren't really necessary as they will be redundant with SibShip.

In addition, your model assumes that you have a linear response between the cell proportion and the observations. This may not be entirely accurate when your observations are M-values that are logit-transformed. For example, if your lymphocytes were heavily methylated (and all other cell types were non-methylated), and you had twice as many of them in one sample compared to another (and all other things being equal), you might expect a doubling of the methylation level, rather than a doubling of the M-values. Normally, this wouldn't be a problem because I'd suggest you use splines that can accommodate arbitrary trends with respect to the covariates; however, this isn't practical here because you've got too many covariates and not enough samples. I don't really see an alternative model that can overcome this, so assuming a linear response is probably better than nothing at all if the cell proportion really has an effect.

ADD COMMENT
0
Entering edit mode
@smeeta-shrestha-6393
Last seen 8.8 years ago

Hi Aaron,

Thank you so much for the reply.

I would like to discuss the points which you have brought out.

 

  1. Well for the first point brought up. I agree that “Treatment” is same as Sample_Group. So my question there was do I just proceed as per the script 1 or do I need to add a contrast. My objective is just to find differentially methylated CpGs in pre (c) compared to post (T) for Group A. Do I need to make a constrast matrix ?

cont.matrix <- makeContrasts(CvsT=T-C,levels=design) or will it be redundant to the script 1.

  1. Yes there was an error from my end. My syntax needed to include the “gr” as well.

 

The general comments

 

  1. The first is that you better make sure that the proportions are not correlated with the treatment, i.e., you don't get consistently more or less of one cell type after treatment across all samples.  I have checked the data, Some individuals do show change of proportion with treatment, albeit not significant. I think we should go for point 5- same cell count before/after treatment.
  2. If this occurs, the terms corresponding to the cell proportions may absorb the effect of interest. Yes, agreed but I need to negate/adjust this change so that I get the change in methylation only from the treatment effect and not contributed by the cell count effect.
  3. Conversely, if the cell proportions are the same before/after treatment, then they aren't really necessary as they will be redundant with SibShip.. So do I understand you correctly that if I use the sibship design, I am already adjusting for the other factors (cell count)
  4. In addition, your model assumes that you have a linear response between the cell proportion and the observations. This may not be entirely accurate when your observations are M-values that are logit-transformed. For example, if your lymphocytes were heavily methylated (and all other cell types were non-methylated), and you had twice as many of them in one sample compared to another (and all other things being equal), you might expect a doubling of the methylation level, rather than a doubling of the M-values. Yes I do agree with the explanation, so would it be better if I used the beta values directly and not the M values ?

Finally

  1. Is there anything wrong with the syntax for script 1 and 2 one for without cell count adjustment (script 1) and one for the cell count adjustment (script2).
  2. I am new to R and it is taking me time to understand the logic beside the different syntax used in limma model design. With the available information online and the limma manual I attempted to address a simple question ie how can I obtain the list of differentially methylated regions between Pre and post in a single group A with and without adjusting for continues cell count covariates. I hope the method is correct.
  3. Or should I do a linear regression to check if there is any significant  linear relationship between each covariate (ly,mo and gr) separately with my pre/post methylation data set and IF and ONLY IF I get a significant association of the cell count with the methylation data then I do the adjustment.
  4. Also if I understand your input at point #5. The sibship design takes all these factors into consideration when generating the model.

 

Last doubt:

 

Is sibship model applicable for both of these expt .

 

Expt Case 1. Same individual but different tissue- tumor and Normal.

 

Expt Case 2. Same individual data at time point 1 and data at time point 2 (1 yr interval).

 

Mine is the second case, so I was wondering is the sibship from 9.4.1 Paired Samples pg 42 the correct model design or should I be using the section 9.6 Time Course Experiments pg 46

 

Thank you so much for your attention and hope to hear soon.

 

Smeeta

 

 

 

 

 

 

 

ADD COMMENT
0
Entering edit mode

No, you don't need to make a contrast matrix, because you can drop the coefficient directly.

As for your responses to the general comments:

3. Only if the cell proportions are the same before and after treatment.

4. No, the use of M-values has benefits elsewhere, e.g., normally distributed for linear modelling, a more stable mean-variance relationship. I would stick with that rather than using beta values.

Finally, if you want to know the effect of including the cell proportions, I would try doing the limma analysis with and without them in the model, and seeing how many DE genes you get. If the proportion terms are helpful at removing confounding effects, they should increase the number of DE genes by reducing the unmodelled variance. Otherwise, if the number of DE genes doesn't increase (or decreases because the cell proportions absorb the DE), you shouldn't include them.

If you only have two time points, I would treat it as a paired sample design with early/late times as factors, rather than using a time course design with real-valued time as a covariate.

ADD REPLY
0
Entering edit mode
@smeeta-shrestha-6393
Last seen 8.8 years ago

Hi Aaron,

Thank you for your response. 

I will look into this.

regards

smeeta

ADD COMMENT

Login before adding your answer.

Traffic: 569 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6