I am using the rbsurv package to select survival-associated probes. My data are pre-processed microarray data with ~7,000 probes and 192 tumor samples. The data were split in half for modeling and validation (training and test sets). I also included a risk predictor, tumor stage, in the model to help with fitting. But I am confused about how to choose suitable values for the max.n.genes and n.iter arguments. My understanding is that n.seq is for multi-model fitting and n.fold is for sampling and validation. Here is the code I tried, with the error messages:
# code 1
fit4 <- rbsurv(time=t.train, status=s.train, x=x.train, method="efron",
z=z.train, alpha=0.05, gene.ID=rownames(x.train),
max.n.genes=30, n.iter=100, n.fold=3, n.seq=6, seed = 1234)
Please wait...[1] "Too few genes or samples"
Error in rep(i, nrow(out$model)) : invalid 'times' argument
# code 2
fit4 <- rbsurv(time=t.train, status=s.train, x=x.train, method="efron",
z=z.train, alpha=0.05, gene.ID=rownames(x.train),
max.n.genes=20, n.iter=50, n.fold=3, n.seq=6, seed = 1234)
Please wait...Error in if ((ncol(x) < 5) | (nrow(x) < 10)) { :
argument is of length zero
# code 3
fit4 <- rbsurv(time=t.train, status=s.train, x=x.train, method="efron",
z=z.train, alpha=0.05, gene.ID=rownames(x.train),
max.n.genes=60, n.iter=50, n.fold=3, n.seq=6, seed = 1234)
Please wait... Done.
# this one ran without any errors
To be brief:
In code 1, max.n.genes=30, n.iter=100, error.
In code 2, max.n.genes=20, n.iter=50, error.
In code 3, max.n.genes=60, n.iter=50, no error.
Code 3 ran without any error, but it uses max.n.genes=60, and I want a gene signature model with about 5 to 15 genes.
I also don't really understand what n.iter does or how it affects the modeling process.
Why does the error happen whenever a smaller max.n.genes is used? How should I determine an optimal value for max.n.genes and n.iter?
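In case it helps with diagnosis, here is a minimal sketch of what I suspect (but have not confirmed) may be behind the "argument is of length zero" message. It uses placeholder random data, not my real data, with 96 training samples assumed from the half split of 192; x.demo and too.small are names I made up for illustration. It only shows base R behavior: the size test quoted in the error message returns a zero-length value when a single-row subset is dropped to a plain vector. Whether rbsurv actually does this internally is an assumption on my part.

```r
# Placeholder data with the same shape as my training set (NOT my real data).
set.seed(1234)
n.probes  <- 7000
n.samples <- 96   # assumed: half of the 192 samples
x.demo <- matrix(rnorm(n.probes * n.samples), nrow = n.probes,
                 dimnames = list(paste0("probe", seq_len(n.probes)), NULL))

# Hypothetical helper: the same size test that appears in the second error message.
too.small <- function(x) (ncol(x) < 5) | (nrow(x) < 10)

# The full matrix has valid dimensions, so the test gives a single logical value.
too.small(x.demo)

# Selecting a single row without drop = FALSE returns a plain vector.
one.probe <- x.demo[1, ]

# ncol() and nrow() of a plain vector are NULL ...
is.null(ncol(one.probe))

# ... so the test has length zero, which if() rejects with
# "argument is of length zero".
length(too.small(one.probe))

# Keeping the matrix shape avoids the zero-length condition.
kept <- x.demo[1, , drop = FALSE]
dim(kept)
```

If something like this is what happens, it would suggest that with the smaller max.n.genes the selection ends up with one gene (or none) at some step, but I cannot tell from the messages alone.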
