decideTests error with edgeR package
1
0
Entering edit mode
Biologist ▴ 120
@biologist-9801
Last seen 4.8 years ago

Hello Everyone,

I have counts data with 366 samples as columns and more than 50k ensembl ids as rownames. The data is stored in a dataframe "U".
Data looks like following:

                                                   S1                S2                S3
ENSG00000000003.13            5745            8742            7257
ENSG00000000005.5                4                   1                  0
ENSG00000000419.11            1394            2211             615
ENSG00000000457.12             344            1166              692
ENSG00000000460.15             405             480              258
ENSG00000000938.11             144             132               110

table(df$Status)

  F2   F3   F4  NE 
125  220  20    1 

library(edgeR)

group <- factor(paste0(df$Status))
y <- DGEList(U, group=group)

keep <- rowSums(cpm(y) > 0.5) >= 2

table(keep)
summary(keep)

Mode   FALSE    TRUE    NA's 

logical   35503   24985       0 


y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method = "TMM")

design2 <- model.matrix(~ 0 + group)
colnames(design2) <- c("G2","G3","G4","NE")
design2

y <- estimateDisp(y, design2, robust=TRUE)

y$common.dispersion

[1] 1.030704

fit <- glmQLFit(y, design2, robust=TRUE)
head(fit$coefficients)

                                                 G2         G3             G4             NE
ENSG00000000003.13  -9.614857  -9.514155  -9.219875  -9.388728
ENSG00000000005.5  -17.008674 -16.908788 -16.166986 -16.630297
ENSG00000000419.11 -10.940426 -10.914407 -10.898501 -10.804819
ENSG00000000457.12 -11.582309 -11.493746 -11.423292 -12.203859
ENSG00000000460.15 -12.362215 -12.205290 -11.947569 -12.040664
ENSG00000000938.11 -12.514476 -12.640490 -13.160899 -13.074225

contrast.matrix <- makeContrasts(G2-G3, levels=design2)
contrast.matrix

Contrasts
Levels  G2 - G3
   EG2         1
   EG3        -1
   EG4         0
   NE           0

qlf <- glmQLFTest(fit, contrast=contrast.matrix)
topTags(qlf)

Coefficient:  1*G2 -1*G3 
                                         logFC       logCPM          F              PValue           FDR
ENSG00000124232.9  7.351170  0.86857907 278.9784 6.083121e-47 1.519868e-42
ENSG00000102195.8  7.077038 -0.11326609 236.7369 1.532441e-41 1.914402e-37
ENSG00000165066.12 6.034396 -1.42387186 224.2053 7.269869e-40 6.054590e-36
ENSG00000138161.11 3.629804  0.03880203 210.2262 5.937722e-38 3.242534e-34
ENSG00000164434.10 5.533205 -0.92141498 209.9531 6.488962e-38 3.242534e-34
ENSG00000256463.7  5.278677 -2.52333685 201.0082 1.152294e-36 4.798346e-33
ENSG00000233887.1  4.054302 -4.49118571 200.4813 3.120264e-36 1.113711e-32
ENSG00000196431.3  4.767828 -1.79054718 192.6636 1.759004e-35 5.173259e-32
ENSG00000280109.1  5.872673  0.25250281 192.4883 1.863491e-35 5.173259e-32
ENSG00000180053.7  3.349894 -4.85962856 193.4720 2.948655e-35 7.367214e-32

summary(decideTests(qlf))

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  dims [product 99940] do not match the length of object [20]

Can anyone tell me what is this error. Did I go wrong anywhere? 

Thank you

edger r differential gene expression • 1.9k views
ADD COMMENT
0
Entering edit mode

Code looks fine to me. Check if decideTestsDGE works, in which case it may be an S3 dispatch error. If that also fails, make a minimum working example that other people can run to reproduce the error.

ADD REPLY
0
Entering edit mode

Hi Aaron,

Thanks for the reply. I used like following:

summary(decideTestsDGE(qlf))
       [,1] 
-1   949
0  22517
1   1519

It means 949 downregulated genes and 1519 upregulated genes b/w G2-G3 comparison. Am I right? And in topTags(qlf) Which are differentially expressed genes? On what basis should I consider them as differentially expressed?

And how to save that "qlf" into file?

ADD REPLY
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 hours ago
The city by the bay

In response to your questions in the comment:

  1. Yes.
  2. See ?topTags, with the n and p.value arguments.
  3. Apply a threshold to the FDR. For example, 0.05.
  4. Look at write.table (for text) or saveRDS (for a serialized R object).
ADD COMMENT
0
Entering edit mode

I did it in this way.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.2))
topTags(tr)

Coefficient:  1*EG2 -1*EG3 
                                        logFC    unshrunk.logFC     logCPM       PValue          FDR
ENSG00000124232.9  7.351170       7.426909  0.86857907 4.918596e-46 1.228911e-41
ENSG00000102195.8  7.077038       7.206943 -0.11326609 1.112308e-40 1.389551e-36
ENSG00000165066.12 6.034396       6.195477 -1.42387186 9.868600e-39 8.218899e-35
ENSG00000164434.10 5.533205       5.609230 -0.92141498 1.102372e-36 6.885693e-33
ENSG00000138161.11 3.629804       3.640628  0.03880203 2.224748e-35 1.036203e-31
ENSG00000256463.7  5.278677       5.513779 -2.52333685 2.488379e-35 1.036203e-31
ENSG00000280109.1  5.872673       5.914140  0.25250281 1.854051e-34 5.885115e-31
ENSG00000233887.1  4.054302       5.198814 -4.49118571 1.884368e-34 5.885115e-31
ENSG00000196431.3  4.767828       4.859437 -1.79054718 5.811789e-34 1.613417e-30
ENSG00000180053.7  3.349894       5.164781 -4.85962856 2.589060e-33 6.468768e-30

summary(decideTestsDGE(tr))

       [,1] 
-1   537
0  23545
1    903

So from this 1440 (537 downregulated and 903 upregulated) genes are detected as Differentially expressed genes with fold-change significantly above 1.2 at an FDR cut-off of 5%. Is it right? 

I have another question. Why "decideTests" function didn't work?

Thank you

ADD REPLY
1
Entering edit mode
  1. Yes.
  2. You're probably not using the latest version of edgeR (3.18.*), in which S3 methods were added to allow decideTests to operate on various edgeR objects.
ADD REPLY
0
Entering edit mode

Thank you very much Aaron. I guess all my doubts are cleared now!

Just one last check.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.2))
tab2 <- topTags(tr,n=Inf)

keep <- tab2$table$FDR <= 0.05

write.csv(tab2$table[keep,], file="DE.csv")

In the .csv file in total I have 1440 genes. From this all the positive logFC are upregulated genes and negative logFC are downregulated genes. 

Am I right?

Thankyou

ADD REPLY
1
Entering edit mode

Yes, that is correct.

ADD REPLY
0
Entering edit mode

Dear Aaron,

I have a doubt. With the same data I used limma, Glimma and edgeR for Differential expression (http://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html) But with the same comparison I found only one differentially expressed gene. In this process data will be transformed into Voom. Is it because of that? From few days back I'm trying in this way but couldn't find more than 1 differentially expressed gene. 

But now I used edgeR which is mentioned in tutorial from feature counts. I used above code and got more than 1000 DE genes. 

Can you please tell why there lot of difference? 

ADD REPLY
0
Entering edit mode

Ask a new question with appropriate code.

ADD REPLY

Login before adding your answer.

Traffic: 749 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6