Hi guys,
Once again I have failed in my battle for DNA methylation analysis from Illumina 450k array, using minfi package. I am a biologist with virtually no experience with R, I am learning by doing. I am trying
I used
estimateCellCounts()
to estimate the cell counts. The cellCounts$counts table gave me the proportion of different blood cells in each sample. I have absolutely no idea how to integrate them now into the linear regression model and I cannot find any tutorial online. What I figured out is that since counts table have Samples as rowname, I can add the counts for each sample into initial targets
targets2 <- merge(x = targets, y = cellCounts$counts, by.x = "Sample_File", by.y = "row.names")
from this I tried to make a model matrix by:
groups <- pData(RGset)$state
Statefactor <- factor(groups)
CD8T <- targets2$CD8T
CD4T <- argets2$CD4T
NK <- targets2$NK
Bcell <- targets2$Bcell
Mono <- targets2$Mono
Gran <- targets2$Gran
design <- model.matrix(~Statefactor + CD8T + CD4T + NK + Bcell + Mono + Gran, data=targets2)
it didn't look bad, so I continued with the regression analysis
#mVals are M values, transformed Beta-values, they are ok
fit <- lmFit(mVals, design)
#I moderated the pValues with eBayes
fiteBayes <- eBayes(fit)
#and get the results, corrected by BH for multiple testing, using 0.05 pval as cutoff
LimmaAdjustedbyCellCount <- topTable(fiteBayes, coef=2, adjust="BH", p.value = 0.05)
Unfortunatelly, I only get 10 significant CpGs, even if I remove the p.val treshold.
logFC AveExpr t P.Value adj.P.Val B
cg07786675 -3.533006 2.3678923 -39.69637 1.817235e-56 7.234031e-51 117.8857
cg19526353 -2.952074 1.8577121 -37.96320 6.508638e-55 1.295476e-49 114.4291
cg00079785 -3.746606 2.3743128 -37.02375 4.813950e-54 6.387774e-49 112.4908
cg07779699 -2.333026 1.8917446 -36.78263 8.103917e-54 8.064998e-49 111.9857
cg02064284 -2.578723 -1.7295516 -35.21867 2.560792e-52 2.038795e-47 108.6305
cg14478475 -3.172427 1.7281555 -34.98529 4.336602e-52 2.877184e-47 108.1178
cg13290149 -2.173626 0.9011973 -34.74827 7.427725e-52 4.224030e-47 107.5937
cg13374701 -5.490708 3.9527985 -34.23497 2.408642e-51 1.198537e-46 106.4473
cg08511651 -2.958672 -1.6143264 -34.10913 3.221398e-51 1.424856e-46 106.1638
cg15078284 -3.225653 -1.8802243 -33.80050 6.598595e-51 2.417515e-46 105.4644
sessionInfo()
Clearly, there is something wrong, as in the end after filtering for 10% difference I should get 7k probes. I did get those 7k probes when using the bumphunter approach (90% overlap with the original paper) so everything else, outside of the Limma analysis is ok.
I get the same problem when not using intercept
design <- model.matrix(~0 + Statefactor + CD8T + CD4T + NK + Bcell + Mono + Gran, data=targets2)
There is very high possibility, the entire approach for the model matrix is totally wrong. Could someone please help me?
Great, I cannot even post this question. Please kill me
Enter the body of text here
Code should be placed in three backticks as shown below
# include your problematic code here with any corresponding output
# please also include the results of running the following in an R session
sessionInfo( )
Gordon Smyth and James,
thank you very much for your reply! I double checked the description of the topTable function and I couldn't find an information that the default for n is 10. Being used to narrative description, I am generally having problems reading the documentation of the packages - I guess I need to trai myself into that more. I have noticed though that in the both examples of the usage, number is set to 10. Does it mean that it is customary to list the arguments in the help page with their default option?
I feel so silly that I waste so much time for something as simple as function default option... Many thanks again for solving the mistery for me! :)
It's right at the top in the 'Usage' section, which lists all the defaults.
wow! I din't know about it. This is a game changing info. Thank you!