Question

decideTests error with edgeR package

0

Entering edit mode

Biologist ▴ 110

@biologist-9801

Last seen 4.1 years ago

Hello Everyone,

I have counts data with 366 samples as columns and more than 50k ensembl ids as rownames. The data is stored in a dataframe "U".
Data looks like following:

S1 S2 S3
ENSG00000000003.13 5745 8742 7257
ENSG00000000005.5 4 1 0
ENSG00000000419.11 1394 2211 615
ENSG00000000457.12 344 1166 692
ENSG00000000460.15 405 480 258
ENSG00000000938.11 144 132 110

table(df$Status)

F2 F3 F4 NE
125 220 20 1

library(edgeR)

group <- factor(paste0(df$Status))
y <- DGEList(U, group=group)

keep <- rowSums(cpm(y) > 0.5) >= 2

table(keep)
summary(keep)

Mode FALSE TRUE NA's

logical 35503 24985 0

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method = "TMM")

design2 <- model.matrix(~ 0 + group)
colnames(design2) <- c("G2","G3","G4","NE")
design2

y <- estimateDisp(y, design2, robust=TRUE)

y$common.dispersion

[1] 1.030704

fit <- glmQLFit(y, design2, robust=TRUE)
head(fit$coefficients)

G2 G3 G4 NE
ENSG00000000003.13 -9.614857 -9.514155 -9.219875 -9.388728
ENSG00000000005.5 -17.008674 -16.908788 -16.166986 -16.630297
ENSG00000000419.11 -10.940426 -10.914407 -10.898501 -10.804819
ENSG00000000457.12 -11.582309 -11.493746 -11.423292 -12.203859
ENSG00000000460.15 -12.362215 -12.205290 -11.947569 -12.040664
ENSG00000000938.11 -12.514476 -12.640490 -13.160899 -13.074225

contrast.matrix <- makeContrasts(G2-G3, levels=design2)
contrast.matrix

Contrasts
Levels G2 - G3
EG2 1
EG3 -1
EG4 0
NE 0

qlf <- glmQLFTest(fit, contrast=contrast.matrix)
topTags(qlf)

Coefficient: 1*G2 -1*G3
logFC logCPM F PValue FDR
ENSG00000124232.9 7.351170 0.86857907 278.9784 6.083121e-47 1.519868e-42
ENSG00000102195.8 7.077038 -0.11326609 236.7369 1.532441e-41 1.914402e-37
ENSG00000165066.12 6.034396 -1.42387186 224.2053 7.269869e-40 6.054590e-36
ENSG00000138161.11 3.629804 0.03880203 210.2262 5.937722e-38 3.242534e-34
ENSG00000164434.10 5.533205 -0.92141498 209.9531 6.488962e-38 3.242534e-34
ENSG00000256463.7 5.278677 -2.52333685 201.0082 1.152294e-36 4.798346e-33
ENSG00000233887.1 4.054302 -4.49118571 200.4813 3.120264e-36 1.113711e-32
ENSG00000196431.3 4.767828 -1.79054718 192.6636 1.759004e-35 5.173259e-32
ENSG00000280109.1 5.872673 0.25250281 192.4883 1.863491e-35 5.173259e-32
ENSG00000180053.7 3.349894 -4.85962856 193.4720 2.948655e-35 7.367214e-32

summary(decideTests(qlf))

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
dims [product 99940] do not match the length of object [20]

Can anyone tell me what is this error. Did I go wrong anywhere?

Thank you

edger r differential gene expression • 1.6k views

ADD COMMENT • link updated 6.7 years ago by Aaron Lun ★ 28k • written 6.7 years ago by Biologist ▴ 110

0

Entering edit mode

Code looks fine to me. Check if decideTestsDGE works, in which case it may be an S3 dispatch error. If that also fails, make a minimum working example that other people can run to reproduce the error.

ADD REPLY • link 6.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

Thanks for the reply. I used like following:

summary(decideTestsDGE(qlf))
[,1]
-1 949
0 22517
1 1519

It means 949 downregulated genes and 1519 upregulated genes b/w G2-G3 comparison. Am I right? And in topTags(qlf) Which are differentially expressed genes? On what basis should I consider them as differentially expressed?

And how to save that "qlf" into file?

ADD REPLY • link 6.7 years ago Biologist ▴ 110

score 0 · Answer 1 · 2017-08-22

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 8 hours ago

The city by the bay

In response to your questions in the comment:

Yes.
See ?topTags, with the n and p.value arguments.
Apply a threshold to the FDR. For example, 0.05.
Look at write.table (for text) or saveRDS (for a serialized R object).

ADD COMMENT • link 6.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

I did it in this way.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.2))
topTags(tr)

Coefficient: 1*EG2 -1*EG3
logFC unshrunk.logFC logCPM PValue FDR
ENSG00000124232.9 7.351170 7.426909 0.86857907 4.918596e-46 1.228911e-41
ENSG00000102195.8 7.077038 7.206943 -0.11326609 1.112308e-40 1.389551e-36
ENSG00000165066.12 6.034396 6.195477 -1.42387186 9.868600e-39 8.218899e-35
ENSG00000164434.10 5.533205 5.609230 -0.92141498 1.102372e-36 6.885693e-33
ENSG00000138161.11 3.629804 3.640628 0.03880203 2.224748e-35 1.036203e-31
ENSG00000256463.7 5.278677 5.513779 -2.52333685 2.488379e-35 1.036203e-31
ENSG00000280109.1 5.872673 5.914140 0.25250281 1.854051e-34 5.885115e-31
ENSG00000233887.1 4.054302 5.198814 -4.49118571 1.884368e-34 5.885115e-31
ENSG00000196431.3 4.767828 4.859437 -1.79054718 5.811789e-34 1.613417e-30
ENSG00000180053.7 3.349894 5.164781 -4.85962856 2.589060e-33 6.468768e-30

summary(decideTestsDGE(tr))

[,1]
-1 537
0 23545
1 903

So from this 1440 (537 downregulated and 903 upregulated) genes are detected as Differentially expressed genes with fold-change significantly above 1.2 at an FDR cut-off of 5%. Is it right?

I have another question. Why "decideTests" function didn't work?

Thank you

ADD REPLY • link 6.7 years ago Biologist ▴ 110

1

Entering edit mode

Yes.
You're probably not using the latest version of edgeR (3.18.*), in which S3 methods were added to allow decideTests to operate on various edgeR objects.

ADD REPLY • link 6.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you very much Aaron. I guess all my doubts are cleared now!

Just one last check.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.2))
tab2 <- topTags(tr,n=Inf)

keep <- tab2$table$FDR <= 0.05

write.csv(tab2$table[keep,], file="DE.csv")

In the .csv file in total I have 1440 genes. From this all the positive logFC are upregulated genes and negative logFC are downregulated genes.

Am I right?

Thankyou

ADD REPLY • link 6.7 years ago Biologist ▴ 110

1

Entering edit mode

Yes, that is correct.

ADD REPLY • link 6.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron,

I have a doubt. With the same data I used limma, Glimma and edgeR for Differential expression (http://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html) But with the same comparison I found only one differentially expressed gene. In this process data will be transformed into Voom. Is it because of that? From few days back I'm trying in this way but couldn't find more than 1 differentially expressed gene.

But now I used edgeR which is mentioned in tutorial from feature counts. I used above code and got more than 1000 DE genes.

Can you please tell why there lot of difference?

ADD REPLY • link 6.7 years ago Biologist ▴ 110

0

Entering edit mode

Ask a new question with appropriate code.

ADD REPLY • link 6.7 years ago Aaron Lun ★ 28k