EdgeR questions about exactTest and plotMDS function
2
0
Entering edit mode
@amos-kirilovsky-5407
Last seen 7.6 years ago
Dear Bioconductor list, I'm try to get differentially expressed genes from two groups of samples using EdgeR. In the first group I have only 1 sample and in the second there are 3 biological replicates. (I will get more samples per group in the future but for now I have to make an analysis pipeline with only that 4 samples) The sample came from a metatranscriptomic experience. For the comparison I selected only the genes that match to a species of interest. I got error messages doing differential expression test and MDS : 1) *Differential expression*: > et <- exactTest(d) Error in `[.data.frame`(mu1, g) : undefined columns selected or (only if the name of my first group is starting with the letter A !) > et <- exactTest(d) Error in 0:s1[g] : argument NA / NaN I found out that my error message is due to the presence of only one sample in my first group when using exactTest() function. I used glmLRT() function and I obtained many significantly differentially expressed genes. Could you confirm that exactTest() don't work with group of only one sample and that I used correctly glmLRT() ? My code is at the end of the message. 2) *MDS* > plotMDS(d) Error in is.finite(x$counts) : default method unavailable for 'list' type (translated from French) For the second problem I made some tests with the first Case study ("SAGE profiles of normal and tumour tissue") in edgeR user Guide. My code was very similar and this time it worked fine. I couldn't find out why. Do you have any idea? I copy the output of my DGEList object after my message. I don't know if this is helpfull but the function plotBCV() seems ok with my data. During my tests with the first Case study I get different results when doing plotMDS(d) and plotMDS(d$counts). The axis length and the sample position were different. Is it expected? why such a difference? I guess it has to do with the gene.selection option. I hope I was clear in my explanations. Many thanks in advance. Amos Kirilovsky -- Dr Amos Kirilovsky CEA Institut de génomique 2 rue Gaston Cremieux CP 5706 91057 Evry cedex - France Tel: (33) 01 60 87 25 35 #################################################################### > *d* An object of class "DGEList" $samples files group lib.size norm.factors s1 s1 E 271842 0.9414223 s2 s2 b 1310877 1.0216270 s3 s3 b 669731 1.0476370 s4 s4 b 1184754 0.9924584 $counts s1 s2 s3 s4 804851 0 45 19 49 587117 0 27 12 30 427810 0 16 6 10 367020 0 8 16 20 980437 0 2 8 14 14236 more rows ... $common.dispersion [1] 0.4296784 $pseudo.counts s1 s2 s3 s4 804851 0 24.4565653 19.756230 30.425129 587117 0 14.6478020 12.480244 18.638575 427810 0 8.7504091 6.242240 6.135612 367020 0 4.1452733 16.616435 12.407833 980437 0 0.8736513 8.306954 8.780709 14236 more rows ... $logCPM [1] 4.803775 4.158376 3.185720 3.723672 2.957607 14236 more elements ... $pseudo.lib.size [1] 729208.6 $prior.n [1] 10 $tagwise.dispersion [1] 0.3771734 0.3789361 0.3849697 0.4188742 0.4511627 14236 more elements #################################################################### #################################################################### *#script* library(edgeR) #read file filinName = "metaTranscriptome" filin = read.table(filinName, sep ="\t", header = TRUE) x <- list() sets <- c("s1","s2","s3","s4") x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE) x$samples$group <- factor(c("A","B","B","B")) # read count matrix x$counts = filin[,2:5] #Attention s'il y a des NA ça va bugger rownames(x$counts) <- filin[,1] colnames(x$counts) <- sets # tot read number by sample x$samples$lib.size <- colSums(x$counts) # DGEList x$samples$norm.factors <- 1 row.names(x$samples) <- colnames(x$counts) d <- new("DGEList",x) #creation de DGElist à partir de x # normalisation factor d <- calcNormFactors(d) # TMM summary(d$counts) ################ Not working ################### plotMDS(d) ################################################ # estimate common dispersion d <- estimateCommonDisp(d, verbose=TRUE) # estimate Tagwise Disp d <- estimateTagwiseDisp(d, trend="none") # plots the tagwise dispersions against log2-CPM # plotBCV(d, cex=0.4) ################ Not working ################### # Differential expression # et <- exactTest(d) # topTags(et) ################################################ # matrix design design <- model.matrix(~0+group, data=d$samples) colnames(design) <- levels(d$samples$group) design # Differential expression fit <- glmFit(d,design) lrt <- glmLRT(fit, contrast=c(1,-1)) topTags(lrt) #################################################################### #################################################################### sessionInfo() R version 2.15.3 (2013-03-01) Platform: x86_64-pc-linux-gnu (64-bit) other attached packages: [1] edgeR_3.0.8 limma_3.14.4 [[alternative HTML version deleted]]
edgeR edgeR • 4.0k views
ADD COMMENT
0
Entering edit mode
Mark Robinson ▴ 880
@mark-robinson-4908
Last seen 6.2 years ago
Hi Amos, > 1) *Differential expression*: >> et <- exactTest(d) > Error in `[.data.frame`(mu1, g) : undefined columns selected > > or (only if the name of my first group is starting with the letter A !) >> et <- exactTest(d) > Error in 0:s1[g] : argument NA / NaN > > I found out that my error message is due to the presence of only one sample > in my first group when using exactTest() function. I used glmLRT() function > and I obtained many significantly differentially expressed genes. > Could you confirm that exactTest() don't work with group of only one sample > and that I used correctly glmLRT() ? My code is at the end of the message. I can't reproduce this error on a toy problem (either on edgeR 3.0.8 or on the latest 3.2.3): library(edgeR) y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4) d <- DGEList(counts=y, group=rep(1:2,c(1,3))) d <- estimateCommonDisp(d, verbose=TRUE) d <- estimateTagwiseDisp(d, trend="none") et <- exactTest(d) I'm afraid I can't tell from your code what the issue is, but it is a bit unusual to construct the 'DGEList' object manually that way you do. Why not use the DGEList() constructor? > During my tests with the first Case study I get different results when > doing plotMDS(d) and plotMDS(d$counts). > The axis length and the sample position were different. > Is it expected? why such a difference? I guess it has to do with the > gene.selection option. This is expected. plotMDS(x) is different when x is a DGEList object from when x is a matrix. See ?plotMDS.DGEList for the former and ?plotMDS for the latter. You probably want a count-based MDS plot, i.e. plotMDS(x), whether that be in the method='logFC' or method='bcv' flavour. Read the docs for more details. Best, Mark ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://tiny.cc/mrobin On 27.05.2013, at 17:36, Amos Kirilovsky <amos.kirilovsky at="" gmail.com=""> wrote: > Dear Bioconductor list, > > I'm try to get differentially expressed genes from two groups of samples > using EdgeR. > In the first group I have only 1 sample and in the second there are 3 > biological replicates. (I will get more samples per group in the future but > for now I have to make an analysis pipeline with only that 4 samples) > The sample came from a metatranscriptomic experience. > For the comparison I selected only the genes that match to a species of > interest. > > I got error messages doing differential expression test and MDS : > > 1) *Differential expression*: >> et <- exactTest(d) > Error in `[.data.frame`(mu1, g) : undefined columns selected > > or (only if the name of my first group is starting with the letter A !) >> et <- exactTest(d) > Error in 0:s1[g] : argument NA / NaN > > I found out that my error message is due to the presence of only one sample > in my first group when using exactTest() function. I used glmLRT() function > and I obtained many significantly differentially expressed genes. > Could you confirm that exactTest() don't work with group of only one sample > and that I used correctly glmLRT() ? My code is at the end of the message. > > > 2) *MDS* >> plotMDS(d) > Error in is.finite(x$counts) : > default method unavailable for 'list' type (translated from French) > > For the second problem I made some tests with the first Case study ("SAGE > profiles of normal and tumour tissue") in edgeR user Guide. > My code was very similar and this time it worked fine. I couldn't find out > why. Do you have any idea? > I copy the output of my DGEList object after my message. I don't know if > this is helpfull but the function plotBCV() seems ok with my data. > > During my tests with the first Case study I get different results when > doing plotMDS(d) and plotMDS(d$counts). > The axis length and the sample position were different. > Is it expected? why such a difference? I guess it has to do with the > gene.selection option. > > > I hope I was clear in my explanations. > Many thanks in advance. > > Amos Kirilovsky > -- > Dr Amos Kirilovsky > CEA > Institut de g?nomique > 2 rue Gaston Cremieux > CP 5706 91057 Evry cedex - France > Tel: (33) 01 60 87 25 35 > > #################################################################### >> *d* > An object of class "DGEList" > $samples > files group lib.size norm.factors > s1 s1 E 271842 0.9414223 > s2 s2 b 1310877 1.0216270 > s3 s3 b 669731 1.0476370 > s4 s4 b 1184754 0.9924584 > > $counts > s1 s2 s3 s4 > 804851 0 45 19 49 > 587117 0 27 12 30 > 427810 0 16 6 10 > 367020 0 8 16 20 > 980437 0 2 8 14 > 14236 more rows ... > > $common.dispersion > [1] 0.4296784 > > $pseudo.counts > s1 s2 s3 s4 > 804851 0 24.4565653 19.756230 30.425129 > 587117 0 14.6478020 12.480244 18.638575 > 427810 0 8.7504091 6.242240 6.135612 > 367020 0 4.1452733 16.616435 12.407833 > 980437 0 0.8736513 8.306954 8.780709 > 14236 more rows ... > > $logCPM > [1] 4.803775 4.158376 3.185720 3.723672 2.957607 > 14236 more elements ... > > $pseudo.lib.size > [1] 729208.6 > > $prior.n > [1] 10 > > $tagwise.dispersion > [1] 0.3771734 0.3789361 0.3849697 0.4188742 0.4511627 > 14236 more elements > > > #################################################################### > #################################################################### > > *#script* > library(edgeR) > > #read file > filinName = "metaTranscriptome" > filin = read.table(filinName, sep ="\t", header = TRUE) > > x <- list() > sets <- c("s1","s2","s3","s4") > x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE) > x$samples$group <- factor(c("A","B","B","B")) > > # read count matrix > x$counts = filin[,2:5] #Attention s'il y a des NA ?a va bugger > rownames(x$counts) <- filin[,1] > colnames(x$counts) <- sets > > # tot read number by sample > x$samples$lib.size <- colSums(x$counts) > > # DGEList > x$samples$norm.factors <- 1 > row.names(x$samples) <- colnames(x$counts) > d <- new("DGEList",x) #creation de DGElist ? partir de x > > # normalisation factor > d <- calcNormFactors(d) # TMM > > summary(d$counts) > > ################ Not working ################### > plotMDS(d) > ################################################ > > # estimate common dispersion > d <- estimateCommonDisp(d, verbose=TRUE) > > # estimate Tagwise Disp > d <- estimateTagwiseDisp(d, trend="none") > > # plots the tagwise dispersions against log2-CPM > # plotBCV(d, cex=0.4) > > ################ Not working ################### > # Differential expression > # et <- exactTest(d) > # topTags(et) > ################################################ > > # matrix design > design <- model.matrix(~0+group, data=d$samples) > colnames(design) <- levels(d$samples$group) > design > > # Differential expression > fit <- glmFit(d,design) > lrt <- glmLRT(fit, contrast=c(1,-1)) > topTags(lrt) > > #################################################################### > #################################################################### > > sessionInfo() > > R version 2.15.3 (2013-03-01) > Platform: x86_64-pc-linux-gnu (64-bit) > > other attached packages: > [1] edgeR_3.0.8 limma_3.14.4 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 16 hours ago
WEHI, Melbourne, Australia

Dear Amos,

This is a quick appendix to Mark's comments.  The plotMDS() error is occuring because your d$counts is a data.frame instead of a matrix.  I will add a check for this in the plotMDS code.

Had you used

d <- DGEList(counts=filin[,2:5])

(see Mark's comments) or

d$counts <- as.matrix(filin[,2:5])

then the error wouldn't have occured.

Also, please upgrade to the current version.

Best wishes
Gordon

ADD COMMENT
0
Entering edit mode
Dear Gordon and Mark, Thank you very much for your answers. You were right. All my problems came from the way I construct my DGElist object. I used DGEList() constructor and It worked perfectly. I first constructed a DGEList object manually because looking how to test the first Case study I found a mail from Davis McCarthy ( https://stat.ethz.ch/pipermail/bioc-sig- sequencing/2011-August/002180.html) describing a way to manipulate and create a DGEList object. It worked fine so I used it for my own data. I now see some mistakes I did. Best wishes, Amos -- Dr Amos Kirilovsky CEA Institut de génomique 2 rue Gaston Cremieux CP 5706 91057 Evry cedex - France Tel: (33) 01 60 87 25 35 [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 353 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6