Question

How to check linear model

0

Entering edit mode

Jitendra ▴ 10

@nabiyogesh-11718

Last seen 18 months ago

United Kingdom

Hi all,

I am using below model for BMI and CpG array methylation association using limma? Could you please suggest how I can confirm whether this model is good or bad and is there any way to check for batch effect in this model?

#model matrix

var<-model.matrix(~BMI + as.factor(Gender) + Age +CD8T +CD4T +NK + Bcell +Mono ,data=targets2)

fit<-lmFit(mval,var)


fit2<-eBayes(fit,trend=TRUE, robust=TRUE)

probe<-topTable(fit2,adjust="BH",coef=2,num=Inf)

#sig.probe<-probe[which(probe$adj.P.Val<=0.05),]

write.table(sig.probe,file="Result.BMI.associated.probe.txt",sep="\t",quote=TRUE)

modalmatrix limma R linear • 3.1k views

ADD COMMENT • link 2.9 years ago Jitendra ▴ 10

score 0 · Answer 1 · 2023-02-22

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 17 hours ago

United States

If you are concerned about a batch effect, you use MDS plots to check for that. Also, you should consider using RUV to control for cellular heterogeneity (I assume these are blood samples), rather than the estimated fractions of such cells.

ADD COMMENT • link 2.9 years ago James W. MacDonald 68k

0

Entering edit mode

Dear James, Thank you so much, I got sample sample annotation and methylation Mvlaue like this not sure how to plot MDS? by using this command;

plotMDS(mvals.filt, col = batch, labels = outcome, cex = 0.7,cex.axis = 1, cex.lab = 1.6, main = "MDS batch")

> #phenotype file
> targets<-read.table("Sample.Info.csv", header=TRUE, stringsAsFactor> head(targets)s=NULL, sep=",")

> # first column as row names
> targets2<-targets[,-1]
> rownames(targets2)<-targets[,1]
> 
> head(targets2)
      Sentrix_ID Sentrix_Position    Batch     Category Gender     BMI
DC541   2.03e+11           R06C01 Batch 15      Control   Male 141.3943
DC485   2.03e+11           R04C01  Batch 8      Control   Male 133.6376
DC490   2.03e+11           R08C01  Batch 8       Control   Male 133.1413
DC131   2.03e+11           R08C01  Batch 7       Control Female 131.9288
DC574   2.03e+11           R03C01 Batch 16       Control Female 130.6548
DC411   2.03e+11           R02C01 Batch 18      Control   Male 127.5505
      Age   RRT       CD8T       CD4T         NK       Bcell       Mono
DC541  29 FALSE 0.09559439 0.22490564 0.00000000 0.087392632 0.03744009
DC485  42 FALSE 0.04212238 0.12661890 0.04028556 0.024276752 0.09722832
DC490  39 FALSE 0.03568210 0.15196063 0.03766905 0.031379030 0.07066799
DC131  58 FALSE 0.10451144 0.04749711 0.03571969 0.004003534 0.06539260
DC574  24 FALSE 0.07358490 0.17676019 0.07706284 0.023125340 0.08663073
DC411  31 FALSE 0.01487569 0.11073860 0.01740834 0.065593039 0.05207325
           Gran
DC541 0.4991278
DC485 0.6401493
DC490 0.6507270
DC131 0.7140364
DC574 0.4966720
DC411 0.7113963
> #read mehtylation data
> 
> betaval<-read.table("regression.new.count.txt", header=TRUE, stringsAsFactors=FALSE)


> # first column as row names
> 
> betaval2<-betaval[,-1]
> rownames(betaval2)<-betaval[,1]
> 
> #convert beta value into m value
> mval<-log2(betaval2/(1-betaval2))
> 
> 
> dim(mval)
[1] 768078    136
> 
> head(mval, 10) [,1:5]
                DC541      DC485      DC490      DC131      DC574
cg26928153  2.7300741  2.7896585  3.3723166  0.9507823  2.9886625
cg16269199  0.7418502  1.8620828  2.0494776  0.3126503  1.2317167
cg13869341  2.6550249  2.9799319  1.9045532  1.3755509  2.3476551
cg24669183  1.6002070  2.7173004  3.0027492  1.0360671  2.4162655
cg26679879 -0.7091479 -0.9267193 -0.6834437 -0.3132538 -1.4234119

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

I was assuming that you were using the minfi package, in which case it would be mdsPlot. But I imagine any method that is meant to do an MDS plot should work.

ADD REPLY • link 2.9 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks James, I am using limma. not sure how to merge sample info with count data.

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

You don't merge them. You should have as many rows for your sample data as you have columns for your methylation data (It's not counts!), and ideally they are in the same order. Then you can color the MDS plotting symbols using a column of your sample data.

ADD REPLY • link 2.9 years ago James W. MacDonald 68k

0

Entering edit mode

Ok I got it now, I need to first order samples in same order in both files; sample info and mehtylation data and then can use mds plot command.

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

Dear James, I am trying to plot mval on MDS plot but it shows R memory issue;

> d <- dist(mval) # euclidean distances between the rows
Error: cannot allocate vector of size 2197.7 Gb

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

You don't need to use all of the CpGs to do an MDS. For example, mdsPlot in minfi, which you could use rather than doing it by hand, by default only uses 1000 CpG sites.

> args(mdsPlot)
function (dat, numPositions = 1000, sampNames = NULL, sampGroups = NULL, 
    xlim, ylim, pch = 1, pal = brewer.pal(8, "Dark2"), legendPos = "bottomleft", 
    legendNCol, main = NULL) 
NULL

ADD REPLY • link 2.9 years ago James W. MacDonald 68k

0

Entering edit mode

Dear James, I am not using minfi. Is there any other way to find most 1000 variable CpG sites and use those for mds plot? Thanks for all help and time.

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

You could look at the code in minfi that does it.

ADD REPLY • link 2.9 years ago James W. MacDonald 68k

0

Entering edit mode

Dear James thanks, minfi used rowVars to select top 1000 rows, not sure how should I modify it for my data.

 o <- order(rowVars(b), decreasing = TRUE)[seq_len(numPositions)]
    d <- dist(t(b[o, ]))
    fit <- cmdscale(d)

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

Will this below work?

o <- order(rowVars(b), decreasing = TRUE)[seq_len(1000)]

But then why they used this below code; in my MDS tutorial they don't use this code (https://www.statmethods.net/advstats/mds.html)

d <- dist(t(b[o, ]))

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

You have a dataset that likely has hundreds of thousands of CpG measurements, and you don't have the memory to run cmdscale on that much data. Which is why you are filtering to a subset. You shouldn't expect a random example on the internet to need to filter, because conventionally one doesn't have that much data per subject.

ADD REPLY • link 2.9 years ago James W. MacDonald 68k

0

Entering edit mode

thanks James for your kind time and help; I am not sure now how to give col command to color point data based on Batch sample info column.

> all(rownames(targets2) %in% colnames(mval))
[1] TRUE
> mval <- mval[, rownames(targets2)]
> class(mval)
[1] "data.frame"
> head(mval, 10) [,1:5]
                DC541      DC485      DC490      DC131      DC574
cg26928153  2.7300741  2.7896585  3.3723166  0.9507823  2.9886625
cg16269199  0.7418502  1.8620828  2.0494776  0.3126503  1.2317167
cg13869341  2.6550249  2.9799319  1.9045532  1.3755509  2.3476551
cg24669183  1.6002070  2.7173004  3.0027492  1.0360671  2.4162655
cg26679879 -0.7091479 -0.9267193 -0.6834437 -0.3132538 -1.4234119
cg22519184 -0.6181722 -0.6994303 -0.6997048 -0.5134222 -1.1743039
cg15560884  0.6642570  1.1534960  1.3187553  0.5705751  0.8379092
cg01014490 -6.7694800 -5.9549894 -7.2408883 -2.5563076 -5.2000328
cg10692041  3.3138488  3.5889757  3.2105789  2.0980532  3.0335786
cg02339369  3.4351998  2.7614697  3.2095307  1.2930551  3.8525829
> mval1 <-as.matrix(mval)
> class(mval1)
[1] "matrix" "array" 
> o <- order(rowVars(mval1), decreasing = TRUE)[seq_len(1000)]
> d <- dist(t(mval1[o, ]))
> fit <- cmdscale(d,eig=TRUE, k=2) # k is the number of dim
> fit # view results
> # plot solution 
> x <- fit$points[,1]
> y <- fit$points[,2]
> plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", main="Nonmetric MDS", type= "n")
> abline(v = 0, h = 0, lty = 3)
> text(x, y, labels = targets2$Batch, cex=.7)
> dim.off()

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

Thank you so much James for all help and time. I will try to use ggplot to improve figure quality. Could you also suggest a good resources to read about linear regression like in this code how I can be sure that coef 2 belongs to BMI

  #model matrix

    var<-model.matrix(~BMI + as.factor(Gender) + Age +CD8T +CD4T +NK + Bcell +Mono ,data=targets2)

    fit<-lmFit(mval,var)


    fit2<-eBayes(fit,trend=TRUE, robust=TRUE)

    probe<-topTable(fit2,adjust="BH",coef=2,num=Inf)

ADD REPLY • link 2.9 years ago Jitendra ▴ 10

0

Entering edit mode

You are now asking questions that you could just as easily answer yourself using Google searches, or by reading the limma User's Guide. If you expect to get anywhere using R, you will need to learn how to answer simple questions by using Google and reading the relevant vignettes.

ADD REPLY • link 2.9 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you so much James, I understand it now from here; https://stackoverflow.com/questions/48564316/meaning-of-coef-in-limma, but could you please now help how to check linear model by plotting residuals to see if this model is good or bad?

ADD REPLY • link 2.9 years ago Jitendra ▴ 10