I am trying to run a regression analysis to find associations between eGFR and methylation data, but I am not sure how to build a model for eGFR and epigenetic data that uses Age, Gender, and immune cell proportions (CD8T, CD4T, NK, Bcell, and Mono) as covariates, or how to filter for significant associations.
> library(limma)
#phenotype data
> targets<-read.table("Sample.Info.regression.txt", header=TRUE, stringsAsFactors=FALSE)
> targets$logeGFR=log(targets$eGFR)
> head(targets)
V1 Sentrix_ID Sentrix_Position Batch Category Gender
1 DC541 2.03e+11 R06C01 Batch 15 Control Male
2 DC485 2.03e+11 R04C01 Batch 8 Control Male
3 DC490 2.03e+11 R08C01 Batch 8 Control Male
4 DC131 2.03e+11 R08C01 Batch 7 Control Female
5 DC574 2.03e+11 R03C01 Batch 16 Control Female
6 DC411 2.03e+11 R02C01 Batch 18 Control Male
eGFR Age RRT CD8T CD4T NK Bcell Mono
1 141.3943 29 FALSE 0.09559439 0.22490564 0.00000000 0.087392632 0.03744009
2 133.6376 42 FALSE 0.04212238 0.12661890 0.04028556 0.024276752 0.09722832
3 133.1413 39 FALSE 0.03568210 0.15196063 0.03766905 0.031379030 0.07066799
4 131.9288 58 FALSE 0.10451144 0.04749711 0.03571969 0.004003534 0.06539260
5 130.6548 24 FALSE 0.07358490 0.17676019 0.07706284 0.023125340 0.08663073
6 127.5505 31 FALSE 0.01487569 0.11073860 0.01740834 0.065593039 0.05207325
Gran logeGFR
1 0.4991278 4.951553
2 0.6401493 4.895132
3 0.6507270 4.891411
4 0.7140364 4.882262
5 0.4966720 4.872559
6 0.7113963 4.848512
#methylation data
> betaval<-read.table("regression.new.count.txt", header=TRUE, stringsAsFactors=FALSE)
> head(betaval,10)[,1:5]
DC541 DC485 DC490 DC131
1 cg26928153 0.869022470 0.87365220 0.911936463 0.6590429
2 cg16269199 0.625793206 0.78426522 0.805430832 0.5539671
3 cg13869341 0.862986903 0.88750760 0.789204281 0.7218103
4 cg24669183 0.751976011 0.86801138 0.889076957 0.6721988
5 cg26679879 0.379529937 0.34471486 0.383734472 0.4459295
6 cg22519184 0.394488319 0.38111739 0.381072524 0.4119580
7 cg15560884 0.613115675 0.68987661 0.713836687 0.5976042
8 cg01014490 0.009082822 0.01586443 0.006567706 0.1453065
9 cg10692041 0.908627024 0.92327421 0.902507690 0.8106531
10 cg02339369 0.915374608 0.87147961 0.902443744 0.7101839
Many thanks,
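One way such a model could be set up in limma is sketched below. This is a hedged illustration, not a definitive analysis. It runs on simulated toy data that only borrows the column names from the question. It assumes the probe IDs have been moved to the row names of the beta matrix, that sample IDs live in the V1 column, and that converting betas to M-values before fitting is acceptable for your data. The object names design, mval and fit2 are chosen here for illustration.

```r
library(limma)

set.seed(1)
n <- 20  # toy number of samples

# Toy phenotype table using the column names from the question
targets <- data.frame(
  V1     = paste0("DC", 1:n),
  eGFR   = runif(n, 60, 140),
  Age    = sample(20:70, n, replace = TRUE),
  Gender = rep(c("Male", "Female"), length.out = n),
  CD8T   = runif(n, 0, 0.1),
  CD4T   = runif(n, 0, 0.2),
  NK     = runif(n, 0, 0.1),
  Bcell  = runif(n, 0, 0.1),
  Mono   = runif(n, 0, 0.1),
  stringsAsFactors = FALSE
)
targets$logeGFR <- log(targets$eGFR)

# Toy beta matrix: probes in rows, samples in columns (deliberately shuffled)
betaval <- matrix(runif(50 * n, 0.05, 0.95), nrow = 50,
                  dimnames = list(paste0("probe", 1:50), sample(targets$V1)))

# Put the columns of the beta matrix in the same order as the rows of targets
betaval <- betaval[, match(targets$V1, colnames(betaval))]
stopifnot(identical(colnames(betaval), targets$V1))

# Logit-transform betas to M-values (assumes no beta is exactly 0 or 1)
mval <- log2(betaval / (1 - betaval))

# Variable of interest first, covariates after it
design <- model.matrix(~ logeGFR + Age + Gender + CD8T + CD4T + NK + Bcell + Mono,
                       data = targets)

fit  <- lmFit(mval, design)  # one linear model per probe
fit2 <- eBayes(fit)          # moderated t-statistics
```

The alignment step matters because lmFit pairs column i of the data matrix with row i of the design matrix by position, not by sample name.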
Thanks!
I have tried this, but I am not sure how to find the probes associated with logeGFR after that, or how to filter for significant probes.
Many thanks,
Your DMPs can be accessed with the topTable function:
DMPs <- topTable(fit2, coef=2)
Then you can filter the DMPs by adjusted P-value.
Thank you so much Basti. Could you please let me know how we decide which coefficient number to use? Why are we using coef=2 here?
Many thanks,
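To make the coefficient choice and the filtering step concrete, here is a minimal sketch on simulated data. The objects pheno, mval, all_res and sig are hypothetical names, and the 0.05 cutoff is only a common convention, not a recommendation for this dataset. The sketch refers to the coefficient by its column name rather than by position, which avoids having to count columns.

```r
library(limma)

set.seed(2)
n <- 20

# Toy phenotype data and toy M-values (100 probes x 20 samples)
pheno <- data.frame(logeGFR = rnorm(n, 4.5, 0.3),
                    Age     = sample(20:70, n, replace = TRUE))
mval  <- matrix(rnorm(100 * n), nrow = 100,
                dimnames = list(paste0("probe", 1:100), NULL))

# Make one probe depend on logeGFR so there is something to detect
mval[1, ] <- mval[1, ] + 5 * (pheno$logeGFR - mean(pheno$logeGFR))

design <- model.matrix(~ logeGFR + Age, data = pheno)
fit2   <- eBayes(lmFit(mval, design))

# Each design column is one fitted coefficient; logeGFR is column 2 here
colnames(design)  # "(Intercept)" "logeGFR" "Age"

# number = Inf returns every probe (the default is only the top 10)
all_res <- topTable(fit2, coef = "logeGFR", number = Inf, adjust.method = "BH")

# Keep probes passing the adjusted P-value cutoff
sig <- all_res[all_res$adj.P.Val < 0.05, ]
```

If the formula changes, for example by adding covariates before logeGFR, the position of the coefficient changes too, so selecting it by name is less error-prone than coef=2.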
The columns of your design matrix (var) correspond to the coefficients that are fitted by limma. If you check your design matrix, you will see that the first column is the intercept term and the second should refer to logeGFR, your variable of interest.
Dear Basti, thank you so much!
Could you please also help me arrange the samples in the same order in the rows of targets2 and the columns of betaval2? I am getting this error.
targets2 and betaval2 look like this: