minfi, limma, adjusting for cell counts - integrating the resoults of estimateCellCounts into the linear regression model
2
0
Entering edit mode
@565e9996
Last seen 13 months ago
Spain

Hi guys,

Once again I have failed in my battle for DNA methylation analysis from Illumina 450k array, using minfi package. I am a biologist with virtually no experience with R, I am learning by doing. I am trying

I used

estimateCellCounts()

to estimate the cell counts. The cellCounts$counts table gave me the proportion of different blood cells in each sample. I have absolutely no idea how to integrate them now into the linear regression model and I cannot find any tutorial online. What I figured out is that since counts table have Samples as rowname, I can add the counts for each sample into initial targets

targets2 <- merge(x = targets, y = cellCounts$counts, by.x = "Sample_File", by.y = "row.names")

from this I tried to make a model matrix by:

groups <- pData(RGset)$state
Statefactor <- factor(groups)
CD8T <- targets2$CD8T
CD4T <- argets2$CD4T
NK <- targets2$NK
Bcell <- targets2$Bcell
Mono <- targets2$Mono
Gran <- targets2$Gran

design <- model.matrix(~Statefactor + CD8T + CD4T + NK + Bcell + Mono + Gran, data=targets2)

it didn't look bad, so I continued with the regression analysis

#mVals are M values, transformed Beta-values, they are ok

fit <- lmFit(mVals, design)

#I moderated the pValues with eBayes
fiteBayes <- eBayes(fit)

#and get the results, corrected by BH for multiple testing, using 0.05 pval as cutoff
LimmaAdjustedbyCellCount <- topTable(fiteBayes, coef=2, adjust="BH", p.value = 0.05)

Unfortunatelly, I only get 10 significant CpGs, even if I remove the p.val treshold.

            logFC    AveExpr         t      P.Value    adj.P.Val        B
cg07786675 -3.533006  2.3678923 -39.69637 1.817235e-56 7.234031e-51 117.8857
cg19526353 -2.952074  1.8577121 -37.96320 6.508638e-55 1.295476e-49 114.4291
cg00079785 -3.746606  2.3743128 -37.02375 4.813950e-54 6.387774e-49 112.4908
cg07779699 -2.333026  1.8917446 -36.78263 8.103917e-54 8.064998e-49 111.9857
cg02064284 -2.578723 -1.7295516 -35.21867 2.560792e-52 2.038795e-47 108.6305
cg14478475 -3.172427  1.7281555 -34.98529 4.336602e-52 2.877184e-47 108.1178
cg13290149 -2.173626  0.9011973 -34.74827 7.427725e-52 4.224030e-47 107.5937
cg13374701 -5.490708  3.9527985 -34.23497 2.408642e-51 1.198537e-46 106.4473
cg08511651 -2.958672 -1.6143264 -34.10913 3.221398e-51 1.424856e-46 106.1638
cg15078284 -3.225653 -1.8802243 -33.80050 6.598595e-51 2.417515e-46 105.4644

sessionInfo()

Clearly, there is something wrong, as in the end after filtering for 10% difference I should get 7k probes. I did get those 7k probes when using the bumphunter approach (90% overlap with the original paper) so everything else, outside of the Limma analysis is ok.

I get the same problem when not using intercept

design <- model.matrix(~0 + Statefactor + CD8T + CD4T + NK + Bcell + Mono + Gran, data=targets2)

There is very high possibility, the entire approach for the model matrix is totally wrong. Could someone please help me?

Great, I cannot even post this question. Please kill me

Enter the body of text here

Code should be placed in three backticks as shown below


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )
minfi limma • 1.4k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

You for sure have way more than 10 probes. The default for topTable is n = 10, so you want to do n = Inf if you want all the probes with an FDR<0.05.

ADD COMMENT
0
Entering edit mode

Gordon Smyth and James,

thank you very much for your reply! I double checked the description of the topTable function and I couldn't find an information that the default for n is 10. Being used to narrative description, I am generally having problems reading the documentation of the packages - I guess I need to trai myself into that more. I have noticed though that in the both examples of the usage, number is set to 10. Does it mean that it is customary to list the arguments in the help page with their default option?

I feel so silly that I waste so much time for something as simple as function default option... Many thanks again for solving the mistery for me! :)

ADD REPLY
0
Entering edit mode

It's right at the top in the 'Usage' section, which lists all the defaults.

Usage:

     topTable(fit, coef = NULL, number = 10, genelist = fit$genes,
              adjust.method = "BH", sort.by = "B", resort.by = NULL,
              p.value = 1, fc = NULL, lfc = NULL, confint = FALSE)
ADD REPLY
0
Entering edit mode

wow! I din't know about it. This is a game changing info. Thank you!

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

I am learning by doing

The first lesson to learn in R is how to read the help page for each function. The help pages follow a consistent format and give brief but complete information on all the options that the function offers. In this case, you could have typed help(topTable) and then you would see that the default behaviour of topTable is to show just the top 10 DE genes (which it does to avoid sending tens of thousands of rows of results to the screen, which wouldn't be very helpful!).

Even quicker is to type args(topTable), which tells you all the arguments of the function and their default settings.

ADD COMMENT

Login before adding your answer.

Traffic: 680 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6