You should always filter out genes that have consistently very low counts, and the guidelines for voom are the same as for edgeR. You could for example use:
keep <- filterByExpr(dge, design)
dge <- dge[keep,,keep.lib.size=FALSE]
dge <- calcNormFactors(dge)
There is no change to this even if you wish to find genes that have very low expression in one group, although you could try reducing the filter thresholds a little if you want to live dangerously.
You want to find genes that are down-regulated in individual tumors relative to normal. Let's assume that the factor "Tumor" takes values "Normal", "Tumor1", "Tumor2" etc. All the normal samples should have the same name but each distinct tumor should have a different name.
Tumor <- relevel(Tumor, ref="Normal")
design <- model.matrix(~Tumor)
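To see which coefficients this design produces, here is a minimal sketch in base R. The sample labels are made up for illustration (three normals and two distinct tumors); they are not from your data.

```r
# Hypothetical sample labels: three normal samples and two distinct tumors
Tumor <- factor(c("Tumor1", "Normal", "Tumor2", "Normal", "Normal"))

# Make "Normal" the reference level so every other coefficient is tumor vs normal
Tumor <- relevel(Tumor, ref = "Normal")
design <- model.matrix(~Tumor)

# Column names are the factor name pasted onto the level name
colnames(design)
# [1] "(Intercept)" "TumorTumor1" "TumorTumor2"
```

The column names are what you pass as `coef` to topTable() later on, so it is worth checking them with colnames(design) on your own data.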
Now we can just do a regular voom analysis:
v <- voom(dge,design)
fit <- lmFit(v,design)
fit <- eBayes(fit, robust=TRUE)
The usual limma tests will tell you which genes are down in which tumors. For example:
topTable(fit, coef="TumorTumor1")
will show you which genes are down-regulated in Tumor1 relative to normal.
You don't need to compute z-scores explicitly. A z-score of -2 corresponds to a two-sided p-value of 0.0455:
> 2*pnorm(-2)
[1] 0.04550026
To find genes with z-score < -2 in individual tumors, you can simply use:
Low <- (fit$p.value[,-1] < 2*pnorm(-2)) & (fit$coefficients[,-1] < 0)
This will give you a matrix of genes by tumors, with an entry of TRUE if the z-score is < -2 and FALSE otherwise.
Note the use of "[,-1]" in the previous code line, which simply gets rid of the intercept term.
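If you do want the numbers themselves, one way is to turn each two-sided p-value back into a normal quantile and attach the sign of the coefficient. The sketch below is only an illustration in base R: the matrices `p` and `b` are small made-up stand-ins for fit$p.value[,-1] and fit$coefficients[,-1], and the gene names are hypothetical.

```r
# Hypothetical stand-ins for fit$p.value[,-1] and fit$coefficients[,-1]
p <- matrix(c(2 * pnorm(-2.5), 0.5, 0.001, 2 * pnorm(-3)), nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("TumorTumor1", "TumorTumor2")))
b <- matrix(c(-1.2, 0.3, 2.0, -2.5), nrow = 2, dimnames = dimnames(p))

# Signed z-score equivalent: the normal deviate with the same two-sided p-value,
# negative when the tumor is below normal and positive when it is above
z <- sign(b) * qnorm(p / 2, lower.tail = FALSE)

# Thresholding z reproduces the logical matrix built from p-values and signs
Low <- z < -2
```

Here `z` is a genes by tumors matrix, so each tumor gets its own z-score for each gene. limma also has a zscoreT() function that converts t-statistics to z-score equivalents, which may be more convenient if you prefer to start from fit$t.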
I'm not quite clear which analysis you want to do with voom. Have you already computed the z-scores yourself, or are you asking how to use voom to compute the z-scores?
Thanks, Gordon.
I wish to identify the tumor samples with low gene expression (compared to normal samples), and then try to see whether the corresponding patients have survival rates that are different from the survival rates of other patients.
My question then is: how to best compute the z-scores?
Are you after one z-score for each gene, or a z-score for each individual tumor for each gene?
Sorry for the confusion. I wish to get the z-score for each individual tumor for each gene.