Question

Differential analysis with RSEM normalized counts using limma without voom

0

Entering edit mode

Biologist ▴ 110

@biologist-9801

Last seen 4.1 years ago

I have obtained TCGA gene expression RNAseq (polyA+ IlluminaHiSeq) Level 3 data. It has log2(x+1) transformed RSEM normalized counts.

For differential analysis I'm using "limma" package without voom. I came to know that these RSEM counts need to be normalized again with normalizeQuantiles.

So, I did that like following:

head(Rnadata)[1:4]

S1 S2 S3 S4
ARHGE 11.1818 11.2430 11.1612 12.0167
HIF3A 5.2482 4.0013 2.9374 4.7857
RNF17 4.1956 0.0000 0.0000 0.0000
RNF10 11.5047 12.0791 12.5931 11.4616
RNF11 9.5995 9.8248 9.9459 10.8368
RNF13 9.6257 10.5608 10.5179 10.1428

y <- normalizeQuantiles(Rnadata)

table(samples$type)

GP1n3 GP2
169 197

type2 <- factor(samples$type)

keep <- rowSums(y > log2(11)) >= 14
table(keep)

library("statmod")
y2 <- y[keep,]
design <- model.matrix(~type2)
fit <- lmFit(y2,design)
fit <- eBayes(fit)
tab <- topTable(fit,coef=2,adjust="BH")

This gave an output like following:

tab:

logFC AveExpr t P.Value adj.P.Val B
LOC152225 0.5177649 1.502743 3.782393 0.0001813158 0.9688791 -0.9982513
COL23A1 -0.6022330 3.743789 -3.679663 0.0002685533 0.9688791 -1.2085234
AGBL4 0.5032687 1.163802 3.532754 0.0004637616 0.9688791 -1.5001656
MPP6 0.3445659 7.334771 3.320850 0.0009875751 0.9688791 -1.9018043
PARVB -0.4983630 8.963399 -3.304377 0.0010456732 0.9688791 -1.9320752
GPX3 -0.5144697 13.830226 -3.292797 0.0010884016 0.9688791 -1.9532738
DKK3 -0.5604613 8.686580 -3.291027 0.0010950728 0.9688791 -1.9565075
C10orf116 -0.8018208 8.218658 -3.289999 0.0010989644 0.9688791 -1.9583847
ATOH8 -0.7274459 7.480381 -3.277855 0.0011459271 0.9688791 -1.9805237
MTF2 0.2071583 7.985538 3.265361 0.0011961812 0.9688791 -2.0032219

What I don't understand is why adj.P.Val is same for all the genes? Did I do any mistake? Can anyone help me in this? Thank you

limma differential gene expression tcga rsem • 2.6k views

ADD COMMENT • link updated 6.8 years ago by James W. MacDonald 65k • written 6.8 years ago by Biologist ▴ 110

score 0 · Answer 1 · 2017-07-13

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 day ago

United States

The adj.P.Val ISN'T the same for all genes. It's just the same for the 10 genes you show there. This simply means that you don't have any evidence for differential expression. Also, you can figure out why there are only 10 rows by reading the help page for topTable.

ADD COMMENT • link 6.8 years ago James W. MacDonald 65k

0

Entering edit mode

Yes I know that n=Inf gives all the genes. I wanted to give GP2-GP1n3 in contrast matrix. And How can I create contrast.matrix for the above data?

With the above data when I see the results it gave the following.

results <- decideTests(fit)
summary(results)
(Intercept) type2GP2
-1 0 1
0 0 16905
1 16906 0

This means GP2-GP1n3 has 1 downregulated gene and 16905 doesn't show any regulation, 0 upregulation. Am I right?

ADD REPLY • link 6.8 years ago Biologist ▴ 110

1

Entering edit mode

The limma User's Guide covers all this in exquisite detail. Rather than asking questions that you could easily answer yourself, you should instead avail yourself of the available documentation.

ADD REPLY • link 6.8 years ago James W. MacDonald 65k