Search
Question: decideTests error with edgeR package
0
gravatar for ghk
3 months ago by
ghk0
ghk0 wrote:

Hello Everyone,

I have counts data with 366 samples as columns and more than 50k ensembl ids as rownames. The data is stored in a dataframe "U".
Data looks like following:

                                                   S1                S2                S3
ENSG00000000003.13            5745            8742            7257
ENSG00000000005.5                4                   1                  0
ENSG00000000419.11            1394            2211             615
ENSG00000000457.12             344            1166              692
ENSG00000000460.15             405             480              258
ENSG00000000938.11             144             132               110

table(df$Status)

  F2   F3   F4  NE 
125  220  20    1 

library(edgeR)

group <- factor(paste0(df$Status))
y <- DGEList(U, group=group)

keep <- rowSums(cpm(y) > 0.5) >= 2

table(keep)
summary(keep)

Mode   FALSE    TRUE    NA's 

logical   35503   24985       0 


y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method = "TMM")

design2 <- model.matrix(~ 0 + group)
colnames(design2) <- c("G2","G3","G4","NE")
design2

y <- estimateDisp(y, design2, robust=TRUE)

y$common.dispersion

[1] 1.030704

fit <- glmQLFit(y, design2, robust=TRUE)
head(fit$coefficients)

                                                 G2         G3             G4             NE
ENSG00000000003.13  -9.614857  -9.514155  -9.219875  -9.388728
ENSG00000000005.5  -17.008674 -16.908788 -16.166986 -16.630297
ENSG00000000419.11 -10.940426 -10.914407 -10.898501 -10.804819
ENSG00000000457.12 -11.582309 -11.493746 -11.423292 -12.203859
ENSG00000000460.15 -12.362215 -12.205290 -11.947569 -12.040664
ENSG00000000938.11 -12.514476 -12.640490 -13.160899 -13.074225

contrast.matrix <- makeContrasts(G2-G3, levels=design2)
contrast.matrix

Contrasts
Levels  G2 - G3
   EG2         1
   EG3        -1
   EG4         0
   NE           0

qlf <- glmQLFTest(fit, contrast=contrast.matrix)
topTags(qlf)

Coefficient:  1*G2 -1*G3 
                                         logFC       logCPM          F              PValue           FDR
ENSG00000124232.9  7.351170  0.86857907 278.9784 6.083121e-47 1.519868e-42
ENSG00000102195.8  7.077038 -0.11326609 236.7369 1.532441e-41 1.914402e-37
ENSG00000165066.12 6.034396 -1.42387186 224.2053 7.269869e-40 6.054590e-36
ENSG00000138161.11 3.629804  0.03880203 210.2262 5.937722e-38 3.242534e-34
ENSG00000164434.10 5.533205 -0.92141498 209.9531 6.488962e-38 3.242534e-34
ENSG00000256463.7  5.278677 -2.52333685 201.0082 1.152294e-36 4.798346e-33
ENSG00000233887.1  4.054302 -4.49118571 200.4813 3.120264e-36 1.113711e-32
ENSG00000196431.3  4.767828 -1.79054718 192.6636 1.759004e-35 5.173259e-32
ENSG00000280109.1  5.872673  0.25250281 192.4883 1.863491e-35 5.173259e-32
ENSG00000180053.7  3.349894 -4.85962856 193.4720 2.948655e-35 7.367214e-32

summary(decideTests(qlf))

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  dims [product 99940] do not match the length of object [20]

Can anyone tell me what is this error. Did I go wrong anywhere? 

Thank you

ADD COMMENTlink modified 3 months ago by Aaron Lun17k • written 3 months ago by ghk0

Code looks fine to me. Check if decideTestsDGE works, in which case it may be an S3 dispatch error. If that also fails, make a minimum working example that other people can run to reproduce the error.

ADD REPLYlink written 3 months ago by Aaron Lun17k

Hi Aaron,

Thanks for the reply. I used like following:

summary(decideTestsDGE(qlf))
       [,1] 
-1   949
0  22517
1   1519

It means 949 downregulated genes and 1519 upregulated genes b/w G2-G3 comparison. Am I right? And in topTags(qlf) Which are differentially expressed genes? On what basis should I consider them as differentially expressed?

And how to save that "qlf" into file?

ADD REPLYlink modified 3 months ago • written 3 months ago by ghk0
0
gravatar for Aaron Lun
3 months ago by
Aaron Lun17k
Cambridge, United Kingdom
Aaron Lun17k wrote:

In response to your questions in the comment:

  1. Yes.
  2. See ?topTags, with the n and p.value arguments.
  3. Apply a threshold to the FDR. For example, 0.05.
  4. Look at write.table (for text) or saveRDS (for a serialized R object).
ADD COMMENTlink modified 3 months ago • written 3 months ago by Aaron Lun17k

I did it in this way.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.2))
topTags(tr)

Coefficient:  1*EG2 -1*EG3 
                                        logFC    unshrunk.logFC     logCPM       PValue          FDR
ENSG00000124232.9  7.351170       7.426909  0.86857907 4.918596e-46 1.228911e-41
ENSG00000102195.8  7.077038       7.206943 -0.11326609 1.112308e-40 1.389551e-36
ENSG00000165066.12 6.034396       6.195477 -1.42387186 9.868600e-39 8.218899e-35
ENSG00000164434.10 5.533205       5.609230 -0.92141498 1.102372e-36 6.885693e-33
ENSG00000138161.11 3.629804       3.640628  0.03880203 2.224748e-35 1.036203e-31
ENSG00000256463.7  5.278677       5.513779 -2.52333685 2.488379e-35 1.036203e-31
ENSG00000280109.1  5.872673       5.914140  0.25250281 1.854051e-34 5.885115e-31
ENSG00000233887.1  4.054302       5.198814 -4.49118571 1.884368e-34 5.885115e-31
ENSG00000196431.3  4.767828       4.859437 -1.79054718 5.811789e-34 1.613417e-30
ENSG00000180053.7  3.349894       5.164781 -4.85962856 2.589060e-33 6.468768e-30

summary(decideTestsDGE(tr))

       [,1] 
-1   537
0  23545
1    903

So from this 1440 (537 downregulated and 903 upregulated) genes are detected as Differentially expressed genes with fold-change significantly above 1.2 at an FDR cut-off of 5%. Is it right? 

I have another question. Why "decideTests" function didn't work?

Thank you

ADD REPLYlink written 3 months ago by ghk0
1
  1. Yes.
  2. You're probably not using the latest version of edgeR (3.18.*), in which S3 methods were added to allow decideTests to operate on various edgeR objects.
ADD REPLYlink written 3 months ago by Aaron Lun17k

Thank you very much Aaron. I guess all my doubts are cleared now!

Just one last check.

tr <- glmTreat(fit, contrast=contrast.matrix, lfc=log2(1.2))
tab2 <- topTags(tr,n=Inf)

keep <- tab2$table$FDR <= 0.05

write.csv(tab2$table[keep,], file="DE.csv")

In the .csv file in total I have 1440 genes. From this all the positive logFC are upregulated genes and negative logFC are downregulated genes. 

Am I right?

Thankyou

ADD REPLYlink written 3 months ago by ghk0
1

Yes, that is correct.

ADD REPLYlink written 3 months ago by Aaron Lun17k

Dear Aaron,

I have a doubt. With the same data I used limma, Glimma and edgeR for Differential expression (http://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html) But with the same comparison I found only one differentially expressed gene. In this process data will be transformed into Voom. Is it because of that? From few days back I'm trying in this way but couldn't find more than 1 differentially expressed gene. 

But now I used edgeR which is mentioned in tutorial from feature counts. I used above code and got more than 1000 DE genes. 

Can you please tell why there lot of difference? 

ADD REPLYlink written 3 months ago by ghk0

Ask a new question with appropriate code.

ADD REPLYlink written 3 months ago by Aaron Lun17k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 239 users visited in the last hour