Question: Problem with sva() probably stemming from model.matrix()

0

Guest User •

**12k**wrote:
Hi,
I am trying to do surrogate variable analysis. I strongly suspect that
I have not set the model matrices up correctly with model.matrix(),
but I can't see what I am doing wrong.
My code looks like this :
#!/usr/bin/Rscript
library(Biobase)
library(sva)
library(limma)
library(bladderbatch)
library(lumi)
# ICSfc - gender, ethnicity and smoking as covariates in SVA
#################################################
setwd('/Volumes/cuChangeDS2413/Data/Methylation/scratch')
exprsFile <- "testMQs.csv"
pDataFile <- "Alcohol27k_new_n309rev2.csv"
outFile <- "Alcohol27k_n309icsfc_genderethnicity_bcdata13.csv"
pdfFile <- "batchCorrect_icsfc_genderethnicity13.pdf"
options(warn=1)
options(showNcalls=400)
# Read the data.
exprs <- as.matrix (read.table(file=exprsFile, header=TRUE, sep=",",
na.strings="NA", dec=".", strip.white=TRUE, row.names=1, as.is=TRUE))
pData <- read.table(pDataFile, header=TRUE, sep=",", na.strings="NA",
dec=".", strip.white=TRUE, row.names=1, as.is=TRUE)
mod <- model.matrix(~as.integer(icsfc), data=pData )
print(quote=F,"MODEL mod :")
head(mod)
Nsv = num.sv(exprs, mod,method="leek")
# This is the null model. We are including the adjustment variables
(covariates), namely ethnicity, gender and smoking
print("MODEL mod0 :")
mod0 = model.matrix(~as.factor(ethnic) + ~as.factor(gender) +
~as.integer(tlfb_avgc), data=pData )
head(mod0)
svobj = sva(exprs, mod, mod0, method="irw", n.sv=Nsv)
------- And the output I get is :
[1] MODEL mod :
(Intercept) as.integer(icsfc)
M87101936 1 31
M87101983 1 30
M87102549 1 20
M87102857 1 33
M87103240 1 16
M87103605 1 33
[1] "MODEL mod0 :"
(Intercept) as.factor(ethnic)2 as.factor(ethnic)3
as.factor(ethnic)4
M87101936 1 0 0
1
M87101983 1 0 0
1
M87102549 1 0 0
0
M87102857 1 0 0
0
M87103240 1 0 0
0
M87103605 1 0 0
0
as.factor(ethnic)5 as.factor(ethnic)6 as.factor(gender)2
M87101936 0 0 0
M87101983 0 0 1
M87102549 0 0 0
M87102857 0 1 0
M87103240 0 0 0
M87103605 0 1 0
as.integer(tlfb_avgc)
M87101936 0
M87101983 0
M87102549 7
M87102857 0
M87103240 8
M87103605 3
Number of significant surrogate variables is: 36
Iteration (out of 5 ):Warning in pf(q, df1, df2, lower.tail, log.p) :
NaNs produced
Error in density.default(x, adjust = adj) : 'x' contains missing
values
Calls: sva ... irwsva.build -> edge.lfdr -> density -> density.default
Execution halted
-- As you can see, it crashes when it gets to the call to sva(). I'm
thinking I have not set up the models correctly, can anyone see what
is wrong? I apologize as I'm quite new to this part of R. If anyone
could suggest a good place to read about model.matrix(), that might
help, too.
Thanks,
Niles Oien, University of Colorado at Boulder
-- output of sessionInfo():
[1] MODEL mod :
(Intercept) as.integer(icsfc)
M87101936 1 31
M87101983 1 30
M87102549 1 20
M87102857 1 33
M87103240 1 16
M87103605 1 33
[1] "MODEL mod0 :"
(Intercept) as.factor(ethnic)2 as.factor(ethnic)3
as.factor(ethnic)4
M87101936 1 0 0
1
M87101983 1 0 0
1
M87102549 1 0 0
0
M87102857 1 0 0
0
M87103240 1 0 0
0
M87103605 1 0 0
0
as.factor(ethnic)5 as.factor(ethnic)6 as.factor(gender)2
M87101936 0 0 0
M87101983 0 0 1
M87102549 0 0 0
M87102857 0 1 0
M87103240 0 0 0
M87103605 0 1 0
as.integer(tlfb_avgc)
M87101936 0
M87101983 0
M87102549 7
M87102857 0
M87103240 8
M87103605 3
Number of significant surrogate variables is: 36
Iteration (out of 5 ):Warning in pf(q, df1, df2, lower.tail, log.p) :
NaNs produced
Error in density.default(x, adjust = adj) : 'x' contains missing
values
Calls: sva ... irwsva.build -> edge.lfdr -> density -> density.default
Execution halted
--
Sent via the guest posting facility at bioconductor.org.