Question

regression analysis for epic array methylation data using limma

0

Entering edit mode

Jitendra ▴ 10

@nabiyogesh-11718

Last seen 6 months ago

United Kingdom

I am trying to do a regression analysis to find an association between eGFR and methylation data, but I am not sure how to make a model for eGFR and epigenetic data while using Age, Gender, and immune cell as covariate ( CD8T ,CD4T, NK, Bcell and Mono) and how I can filter significant association?

> library(limma)
#phenotype data
> targets<-read.table("Sample.Info.regression.txt", header=TRUE, stringsAsFactors=FALSE)

> targets$logeGFR=log(targets$eGFR)
> head(targets)
     V1 Sentrix_ID Sentrix_Position    Batch  Category Gender
1 DC541   2.03e+11           R06C01 Batch 15       Control   Male
2 DC485   2.03e+11           R04C01  Batch 8       Control   Male
3 DC490   2.03e+11           R08C01  Batch 8       Control   Male
4 DC131   2.03e+11           R08C01  Batch 7      Control Female
5 DC574   2.03e+11           R03C01 Batch 16      Control Female
6 DC411   2.03e+11           R02C01 Batch 18       Control   Male
      eGFR Age   RRT       CD8T       CD4T         NK       Bcell       Mono
1 141.3943  29 FALSE 0.09559439 0.22490564 0.00000000 0.087392632 0.03744009
2 133.6376  42 FALSE 0.04212238 0.12661890 0.04028556 0.024276752 0.09722832
3 133.1413  39 FALSE 0.03568210 0.15196063 0.03766905 0.031379030 0.07066799
4 131.9288  58 FALSE 0.10451144 0.04749711 0.03571969 0.004003534 0.06539260
5 130.6548  24 FALSE 0.07358490 0.17676019 0.07706284 0.023125340 0.08663073
6 127.5505  31 FALSE 0.01487569 0.11073860 0.01740834 0.065593039 0.05207325
       Gran  logeGFR
1 0.4991278 4.951553
2 0.6401493 4.895132
3 0.6507270 4.891411
4 0.7140364 4.882262
5 0.4966720 4.872559
6 0.7113963 4.848512
#methylation data
> betaval<-read.table("regression.new.count.txt", header=TRUE, stringsAsFactors=FALSE)
> head(betaval,10)[,1:5]
                    DC541      DC485       DC490     DC131
1  cg26928153 0.869022470 0.87365220 0.911936463 0.6590429
2  cg16269199 0.625793206 0.78426522 0.805430832 0.5539671
3  cg13869341 0.862986903 0.88750760 0.789204281 0.7218103
4  cg24669183 0.751976011 0.86801138 0.889076957 0.6721988
5  cg26679879 0.379529937 0.34471486 0.383734472 0.4459295
6  cg22519184 0.394488319 0.38111739 0.381072524 0.4119580
7  cg15560884 0.613115675 0.68987661 0.713836687 0.5976042
8  cg01014490 0.009082822 0.01586443 0.006567706 0.1453065
9  cg10692041 0.908627024 0.92327421 0.902507690 0.8106531
10 cg02339369 0.915374608 0.87147961 0.902443744 0.7101839

Many thanks, ```

limma Epigenetics STATISTICS R • 1.6k views

ADD COMMENT • link 2.6 years ago Jitendra ▴ 10

score 0 · Answer 1 · 2022-05-23

0

Entering edit mode

Basti ▴ 780

@7d45153c

Last seen 12 days ago

France

Which package do you intend to use ? There are many tutorials in packages vignette that show you how to perform differential methylation analysis. This one is very complete : https://bioconductor.org/packages/release/workflows/vignettes/methylationArrayAnalysis/inst/doc/methylationArrayAnalysis.html

ADD COMMENT • link 2.6 years ago Basti ▴ 780

0

Entering edit mode

Thanks!

I have tried this; I am not sure after that how to find logeGFR associated probe? and also, how to filter significant probe?

var<-model.matrix(~logeGFR + as.factor(Gender) +CD8T +CD4T +NK + Bcell +Mono ,data=targets) 
fit<-lmFit(bataval,var) 
fit2<-eBayes(fit,trend=TRUE robust=TRUE)

Many thanks,

ADD REPLY • link 2.6 years ago Jitendra ▴ 10

0

Entering edit mode

Your DMPs can be accessed with topTable function : DMPs <- topTable(fit2, coef=2), then you can filter the DMPs by adjusted P-val

ADD REPLY • link 2.6 years ago Basti ▴ 780

0

Entering edit mode

Thanks you so much Basti, could you please let me know how do we decide what coefficient number should we used? Why we are using coef=2 here?

Many thanks,

ADD REPLY • link 2.6 years ago Jitendra ▴ 10

0

Entering edit mode

The columns of your design matrix var corresponds to the coefficients that are fitted by limma. If you check your design matrix, you will see that the first column is the intercept term and the second should refer to logeGFR, your variable of interest

ADD REPLY • link 2.6 years ago Basti ▴ 780

0

Entering edit mode

Dear Basti, thank you so much!

Could you please also help me how to arrange samples in the same order in rows (targests2) and columns in (betaval2) as I am getting this error?

targets2 and betaval2 looks like this;

 head(betaval2, 10) [,1:5]
                    DC541      DC485       DC490     DC131
1  cg26928153 0.869022470 0.87365220 0.911936463 0.6590429
2  cg16269199 0.625793206 0.78426522 0.805430832 0.5539671
3  cg13869341 0.862986903 0.88750760 0.789204281 0.7218103
4  cg24669183 0.751976011 0.86801138 0.889076957 0.6721988
5  cg26679879 0.379529937 0.34471486 0.383734472 0.4459295
6  cg22519184 0.394488319 0.38111739 0.381072524 0.4119580
7  cg15560884 0.613115675 0.68987661 0.713836687 0.5976042
8  cg01014490 0.009082822 0.01586443 0.006567706 0.1453065
9  cg10692041 0.908627024 0.92327421 0.902507690 0.8106531
10 cg02339369 0.915374608 0.87147961 0.902443744 0.7101839

> head(targets2)
      Sentrix_ID Sentrix_Position    Batch Recruitment Category Gender     eGFR
DC541   2.03e+11           R06C01 Batch 15     Belfast  Control   Male 141.3943
DC485   2.03e+11           R04C01  Batch 8     Belfast  Control   Male 133.6376
DC490   2.03e+11           R08C01  Batch 8     Belfast  Control   Male 133.1413
DC131   2.03e+11           R08C01  Batch 7     Belfast  Control Female 131.9288
DC574   2.03e+11           R03C01 Batch 16     Belfast  Control Female 130.6548
DC411   2.03e+11           R02C01 Batch 18     Belfast  Control   Male 127.5505


> mval<-log2(betaval2/(1-betaval2))
> var<-model.matrix(~logeGFR + as.factor(Gender) + Age +CD8T +CD4T +NK + Bcell +Mono ,data=targets2)
 > fit<-lmFit(mval,var)
Error in lmFit(mval, var) :
  row dimension of design doesn't match column dimension of data object

ADD REPLY • link 2.6 years ago Jitendra ▴ 10