Question: Use cpm() in edgeR and voom() in limma to find genes that have low expression in individual tumors
1
15 months ago by
Frocha10
Frocha10 wrote:

I have RNA-Seq count data for two groups of samples (normal vs tumor). For a specific gene, I wish to find the tumor samples that have low gene expression (compared to the samples in the normal group). The "low" expression means the (possibly transformed) log(cpm) has a z-score < -2, where the z-score is calculated using mean and standard deviation of the normal samples.

In order to do this, should I first filter the genes with low counts (as I usually do for differentially expressed gene analysis), using the results of cpm() function in edgeR package? If so, how? After the gene filtering and TMM normalization, should I use voom() in limma package to transform log(cpm) to values that are normally distributed?

limma edger gene filtering • 1.2k views
modified 15 months ago by Gordon Smyth37k • written 15 months ago by Frocha10

I'm not quite clear which analysis you are wanting to do with voom. Have you already computed the z-scores yourself, or are you asking how to use voom to compute the z-scores?

Thanks, Gordon.

I wish to identify the tumor samples with low gene expression (compared to normal samples), and then try to see whether the corresponding patients have survival rates that are different from the survival rates of other patients.

My question then is: how to best compute the z-scores?

Are you after one z-score for each gene, or a z-score for each individual tumor for each gene?

Sorry for confusing. I wish to get the z-score for each individual tumor for each gene.

Answer: filter genes using cpm() in edgeR and make data normal using voom() in limma
1
15 months ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

You should always filter out genes that have consistently very low counts, and the guidelines for voom are the same as for edgeR. You could for example use:

keep <- filterByExpr(dge, design)
dge <- dge[keep,,keep.lib.size=FALSE]
dge <- calcNormFactors(dge)


There is no change to this even if you wish to find genes that have very low expression in one group, although you could try reducing the filter thresholds a little if you want to live dangerously.

You want to find genes that are down-regulated in individual tumors relative to normal. Let's assume that the factor "Tumor" takes values "Normal", "Tumor1", "Tumor2" etc. All the normal samples should have the same name but each distinct tumor should have a different name.

Tumor <- relevel(Tumor, ref="Normal")
design <- model.matrix(~Tumor)


Now we can just do a regular voom analysis:

v <- voom(dge,design)
fit <- lmFit(v,design)
fit <- eBayes(fit, robust=TRUE)

The usual limma tests will tell you which genes are down in which tumors. For example:

topTable(fit, coef="Tumor:Tumor1")

will show you which genes are down-regulated in Tumor1 relative to normal.

You don't need to compute z-scores explicitly. A z-score < -2 corresponds to a p-value of 0.0455:

> 2*pnorm(-2)
[1] 0.04550026

To find genes with zscore < -2 in individual tumors, you can simply use:

Low <- (fit$p.value[,-1] < 2*pnorm(-2)) & (fit$coef[,-1] < 0)

This will give you a matrix of genes by tumors, which an entry of TRUE if z-score < -2 and FALSE if z-score > -2.

Note the use of "[,-1]" in the previous code line, which simply gets rid of the intercept term.

Thanks!

One question: should we put the design<-model.matrix(~Tumor) line at the beginning?

In fact, the goal is not to do differentially expressed gene analysis. The goal is to calculate z-scores for the tumor sample so that we can divide the tumor samples into two groups: those with z-score <-2 and those  with z-score > -2.

OK, I understand what you want now and have revised my answer accordingly.

Thank you very much!

I am sorry that I have a new question: there is no p.value information in fit after I run the last line. Is it because that each tumor sample is treated as a single group? If so, how to identify the tumor samples with z-score < -2?

eBayes always produces p-values, so what you say cannot be correct.

You may have made a mistake earlier in the code, but you provide no information so I am not in a position to debug it for you. The code that I provided does work and it does produce p-values.

The fact that every tumor is a single group is not a problem. You do however need more than one Normal sample.

You are right. The problem is that the "statmod" package was not installed on my computer, thus the line

fit <- eBayes(fit, robust=TRUE)

was not executed (I did not notice that). After I installed the "statmod" package, the problem is solved.

Thank you very much!