Extracting genes within venn diagram

0

Entering edit mode

Celine Carret ▴ 220

@celine-carret-1477

Last seen 11.4 years ago

Hi all, I have a venn diagram obtained from Limma using decideTests() default and I would like to extract the gene lists specific for each contrasts. Here is what I did: >matrix <- read.csv("cells-HumanDB.csv") ##columns 1 and 2 are gene names and uniprot labels, numerical values start on column 3 >design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6,6,6))) >colnames(design) <- c("GFPneg12h", "GFPpos12h","GFPneg2h","GFPpos2h","DEXpos2h","NI") >contrast.matrix <- makeContrasts((GFPpos12h-GFPneg12h)-NI,(GFPpos2h- GFPneg2h)-NI,(DEXpos2h- GFPneg2h)-NI,((GFPpos2h-GFPneg2h)-NI)-((DEXpos2h- GFPneg2h)-NI),((GFPpos1 2h-GFPneg12h)-NI)-((GFPpos2h-GFPneg2h)-NI), levels=design) >fit <- lmFit(matrix[,3:19], design,weights=arrayw) >fit2 <- contrasts.fit(fit, contrast.matrix) >fit2 <- eBayes(fit2) >results <- decideTests(fit2) >vennDiagram(results[,2:3],cex=0.7) and I get as numbers: 43 (682) 58 and outside circles 712 Looking at the mail list archive I found a line of code from Gordon to extract gene list within a venn diagram such as: >alls <- apply(results[,2:3],1,all) There were 50 or more warnings (use warnings() to see the first 50) > fit2$genes[alls,] And I get 682 genes corresponding to the intersection >alls <- apply(!results[,2:3],1,all) > fit2$genes[alls,] Gave me 712 genes that are outside the venn diagram But how can I get the genes specific for either contrast? (so in this case 43 genes on one hand and 58 genes on another) I also found from Jim this code using affycoretools functions but it doesn't give me the right numbers and I can't understand why: > alls <- affycoretools:::makeIndices(results[,2:3]) > alls.genes <- lapply(alls, function(x) row.names(results[,2:3])[x]) > alls.genes [[1]] gives 56 genes [[2]] gives 71 genes [[3]] gives 669 genes Shouldn't I get [[1]] 43; [[2]] 58 and [[3]] 682? I will be grateful for any help or advice on this matter. Best wishes Celine Here is my R session info > sessionInfo() R version 2.8.0 (2008-10-20) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] affycoretools_1.14.0 annaffy_1.14.0 KEGG.db_2.2.5 gcrma_2.14.1 matchprobes_1.14.0 [6] biomaRt_1.16.0 GOstats_2.8.0 Category_2.8.0 genefilter_1.22.0 survival_2.34-1 [11] RBGL_1.18.0 annotate_1.20.0 xtable_1.5-4 GO.db_2.2.5 RSQLite_0.7-1 [16] DBI_0.2-4 AnnotationDbi_1.4.0 graph_1.20.0 affy_1.20.0 limma_2.16.2 [21] Biobase_2.2.0 loaded via a namespace (and not attached): [1] affyio_1.10.1 cluster_1.11.11 GSEABase_1.4.0 preprocessCore_1.4.0 RCurl_0.91-0 [6] XML_1.94-0.1 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

limma affycoretools limma affycoretools • 5.1k views

ADD COMMENT • link 17.2 years ago Celine Carret ▴ 220

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 days ago

United States

Hi Celine, Celine Carret wrote: > Hi all, > > I have a venn diagram obtained from Limma using decideTests() default > and I would like to extract the gene lists specific for each contrasts. > > Here is what I did: > >> matrix <- read.csv("cells-HumanDB.csv") ##columns 1 and 2 are gene > names and uniprot labels, numerical values start on column 3 > >> design <- model.matrix(~ > -1+factor(c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6,6,6))) >> colnames(design) <- c("GFPneg12h", > "GFPpos12h","GFPneg2h","GFPpos2h","DEXpos2h","NI") >> contrast.matrix <- > makeContrasts((GFPpos12h-GFPneg12h)-NI,(GFPpos2h- GFPneg2h)-NI,(DEXpos2h- > GFPneg2h)-NI,((GFPpos2h-GFPneg2h)-NI)-((DEXpos2h- GFPneg2h)-NI),((GFPpos1 > 2h-GFPneg12h)-NI)-((GFPpos2h-GFPneg2h)-NI), levels=design) >> fit <- lmFit(matrix[,3:19], design,weights=arrayw) >> fit2 <- contrasts.fit(fit, contrast.matrix) >> fit2 <- eBayes(fit2) >> results <- decideTests(fit2) >> vennDiagram(results[,2:3],cex=0.7) > > and I get as numbers: 43 (682) 58 > and outside circles 712 > > Looking at the mail list archive I found a line of code from Gordon to > extract gene list within a venn diagram such as: > >> alls <- apply(results[,2:3],1,all) > There were 50 or more warnings (use warnings() to see the first 50) >> fit2$genes[alls,] > And I get 682 genes corresponding to the intersection > >> alls <- apply(!results[,2:3],1,all) >> fit2$genes[alls,] > Gave me 712 genes that are outside the venn diagram > > But how can I get the genes specific for either contrast? (so in this > case 43 genes on one hand and 58 genes on another) > > I also found from Jim this code using affycoretools functions but it > doesn't give me the right numbers and I can't understand why: > >> alls <- affycoretools:::makeIndices(results[,2:3]) Most likely what you want here is alls <- affycoretools:::makeIndices(results[,2:3], "both") as the default for vennCounts() is "both" (and you have implicitly called vennCounts() in your call to vennDiagram()). Best, Jim >> alls.genes <- lapply(alls, function(x) row.names(results[,2:3])[x]) >> alls.genes > [[1]] gives 56 genes > [[2]] gives 71 genes > [[3]] gives 669 genes > Shouldn't I get [[1]] 43; [[2]] 58 and [[3]] 682? > > I will be grateful for any help or advice on this matter. > Best wishes > Celine > > Here is my R session info >> sessionInfo() > R version 2.8.0 (2008-10-20) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United > Kingdom.1252;LC_MONETARY=English_United > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] splines tools stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] affycoretools_1.14.0 annaffy_1.14.0 KEGG.db_2.2.5 > gcrma_2.14.1 matchprobes_1.14.0 > [6] biomaRt_1.16.0 GOstats_2.8.0 Category_2.8.0 > genefilter_1.22.0 survival_2.34-1 > [11] RBGL_1.18.0 annotate_1.20.0 xtable_1.5-4 > GO.db_2.2.5 RSQLite_0.7-1 > [16] DBI_0.2-4 AnnotationDbi_1.4.0 graph_1.20.0 > affy_1.20.0 limma_2.16.2 > [21] Biobase_2.2.0 > > loaded via a namespace (and not attached): > [1] affyio_1.10.1 cluster_1.11.11 GSEABase_1.4.0 > preprocessCore_1.4.0 RCurl_0.91-0 > [6] XML_1.94-0.1 > > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662

ADD COMMENT • link 17.2 years ago James W. MacDonald 68k

0

Entering edit mode

Hi Celine, There's also the old-fashion way of subsetting by hand. The object of class "TestResults" created by decideTests() is simply a matrix of -1, 0, and 1, so you can use == to find the genes you are interested in: results <- decideTests(fit2) fit2$genes[results[,2] != 0 & results[,3] == 0 , ] # This should give you the 43 genes fit2$genes[results[,2] == 0 & results[,3] != 0 ,] # This should give you the 58 genes Subsetting this way will give you exactly what you want, although it can get a bit complicated once you start getting into 3 contrasts or more. I'll remember to use the affycoretools:::makeIndices in the future! >Most likely what you want here is > >alls <- affycoretools:::makeIndices(results[,2:3], "both") > >as the default for vennCounts() is "both" (and you have implicitly >called vennCounts() in your call to vennDiagram()). What Jim is referring to here is that the default of "both" counts a gene in both lists even if it is significantly up in one list but significantly down in the other list. Jim thought it makes more sense to only count a gene in both lists if it is changing in the same direction, and so wrote functions in affycoretools that have "same" as the default rather than "both". This explains the difference in the numbers. Personally, the genes that change in opposite directions might even be more interesting, and so I use a combination of "both" and "same" to see if there are any. Cheers, Jenny >Best, > >Jim > > >>>alls.genes <- lapply(alls, function(x) row.names(results[,2:3])[x]) >>>alls.genes >>[[1]] gives 56 genes >>[[2]] gives 71 genes >>[[3]] gives 669 genes >>Shouldn't I get [[1]] 43; [[2]] 58 and [[3]] 682? >>I will be grateful for any help or advice on this matter. >>Best wishes >>Celine >>Here is my R session info >>>sessionInfo() >>R version 2.8.0 (2008-10-20) i386-pc-mingw32 >>locale: >>LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United >>Kingdom.1252;LC_MONETARY=English_United >>Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 >>attached base packages: >>[1] splines tools stats graphics grDevices utils datasets >>methods base >>other attached packages: >> [1] affycoretools_1.14.0 annaffy_1.14.0 KEGG.db_2.2.5 >>gcrma_2.14.1 matchprobes_1.14.0 >> [6] biomaRt_1.16.0 GOstats_2.8.0 Category_2.8.0 >>genefilter_1.22.0 survival_2.34-1 >>[11] RBGL_1.18.0 annotate_1.20.0 xtable_1.5-4 >>GO.db_2.2.5 RSQLite_0.7-1 >>[16] DBI_0.2-4 AnnotationDbi_1.4.0 graph_1.20.0 >>affy_1.20.0 limma_2.16.2 >>[21] Biobase_2.2.0 >>loaded via a namespace (and not attached): >>[1] affyio_1.10.1 cluster_1.11.11 GSEABase_1.4.0 >>preprocessCore_1.4.0 RCurl_0.91-0 >>[6] XML_1.94-0.1 > >-- >James W. MacDonald, M.S. >Biostatistician >Hildebrandt Lab >8220D MSRB III >1150 W. Medical Center Drive >Ann Arbor MI 48109-0646 >734-936-8662 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at illinois.edu

ADD REPLY • link 17.2 years ago Jenny Drnevich ★ 2.0k

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 days ago

United States

Hi Celine, Celine Carret wrote: > Hi all, > > I have a venn diagram obtained from Limma using decideTests() default > and I would like to extract the gene lists specific for each contrasts. > > Here is what I did: > >> matrix <- read.csv("cells-HumanDB.csv") ##columns 1 and 2 are gene > names and uniprot labels, numerical values start on column 3 > >> design <- model.matrix(~ > -1+factor(c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6,6,6))) >> colnames(design) <- c("GFPneg12h", > "GFPpos12h","GFPneg2h","GFPpos2h","DEXpos2h","NI") >> contrast.matrix <- > makeContrasts((GFPpos12h-GFPneg12h)-NI,(GFPpos2h- GFPneg2h)-NI,(DEXpos2h- > GFPneg2h)-NI,((GFPpos2h-GFPneg2h)-NI)-((DEXpos2h- GFPneg2h)-NI),((GFPpos1 > 2h-GFPneg12h)-NI)-((GFPpos2h-GFPneg2h)-NI), levels=design) >> fit <- lmFit(matrix[,3:19], design,weights=arrayw) >> fit2 <- contrasts.fit(fit, contrast.matrix) >> fit2 <- eBayes(fit2) >> results <- decideTests(fit2) >> vennDiagram(results[,2:3],cex=0.7) > > and I get as numbers: 43 (682) 58 > and outside circles 712 > > Looking at the mail list archive I found a line of code from Gordon to > extract gene list within a venn diagram such as: > >> alls <- apply(results[,2:3],1,all) > There were 50 or more warnings (use warnings() to see the first 50) >> fit2$genes[alls,] > And I get 682 genes corresponding to the intersection > >> alls <- apply(!results[,2:3],1,all) >> fit2$genes[alls,] > Gave me 712 genes that are outside the venn diagram > > But how can I get the genes specific for either contrast? (so in this > case 43 genes on one hand and 58 genes on another) > > I also found from Jim this code using affycoretools functions but it > doesn't give me the right numbers and I can't understand why: > >> alls <- affycoretools:::makeIndices(results[,2:3]) Most likely what you want here is alls <- affycoretools:::makeIndices(results[,2:3], "both") as the default for vennCounts() is "both" (and you have implicitly called vennCounts() in your call to vennDiagram()). Best, Jim >> alls.genes <- lapply(alls, function(x) row.names(results[,2:3])[x]) >> alls.genes > [[1]] gives 56 genes > [[2]] gives 71 genes > [[3]] gives 669 genes > Shouldn't I get [[1]] 43; [[2]] 58 and [[3]] 682? > > I will be grateful for any help or advice on this matter. > Best wishes > Celine > > Here is my R session info >> sessionInfo() > R version 2.8.0 (2008-10-20) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United > Kingdom.1252;LC_MONETARY=English_United > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] splines tools stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] affycoretools_1.14.0 annaffy_1.14.0 KEGG.db_2.2.5 > gcrma_2.14.1 matchprobes_1.14.0 > [6] biomaRt_1.16.0 GOstats_2.8.0 Category_2.8.0 > genefilter_1.22.0 survival_2.34-1 > [11] RBGL_1.18.0 annotate_1.20.0 xtable_1.5-4 > GO.db_2.2.5 RSQLite_0.7-1 > [16] DBI_0.2-4 AnnotationDbi_1.4.0 graph_1.20.0 > affy_1.20.0 limma_2.16.2 > [21] Biobase_2.2.0 > > loaded via a namespace (and not attached): > [1] affyio_1.10.1 cluster_1.11.11 GSEABase_1.4.0 > preprocessCore_1.4.0 RCurl_0.91-0 > [6] XML_1.94-0.1 > > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662

ADD COMMENT • link 17.2 years ago James W. MacDonald 68k

0

Entering edit mode

Celine Carret ▴ 220

@celine-carret-1477

Last seen 11.4 years ago

Thank you all for your advices and explanations It's clear now. Best wishes celine -----Original Message----- From: Jenny Drnevich [mailto:drnevich@illinois.edu] Sent: 06 November 2008 16:01 To: James W. MacDonald; Celine Carret Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Extracting genes within venn diagram Hi Celine, There's also the old-fashion way of subsetting by hand. The object of class "TestResults" created by decideTests() is simply a matrix of -1, 0, and 1, so you can use == to find the genes you are interested in: results <- decideTests(fit2) fit2$genes[results[,2] != 0 & results[,3] == 0 , ] # This should give you the 43 genes fit2$genes[results[,2] == 0 & results[,3] != 0 ,] # This should give you the 58 genes Subsetting this way will give you exactly what you want, although it can get a bit complicated once you start getting into 3 contrasts or more. I'll remember to use the affycoretools:::makeIndices in the future! >Most likely what you want here is > >alls <- affycoretools:::makeIndices(results[,2:3], "both") > >as the default for vennCounts() is "both" (and you have implicitly >called vennCounts() in your call to vennDiagram()). What Jim is referring to here is that the default of "both" counts a gene in both lists even if it is significantly up in one list but significantly down in the other list. Jim thought it makes more sense to only count a gene in both lists if it is changing in the same direction, and so wrote functions in affycoretools that have "same" as the default rather than "both". This explains the difference in the numbers. Personally, the genes that change in opposite directions might even be more interesting, and so I use a combination of "both" and "same" to see if there are any. Cheers, Jenny >Best, > >Jim > > >>>alls.genes <- lapply(alls, function(x) row.names(results[,2:3])[x]) >>>alls.genes >>[[1]] gives 56 genes >>[[2]] gives 71 genes >>[[3]] gives 669 genes >>Shouldn't I get [[1]] 43; [[2]] 58 and [[3]] 682? >>I will be grateful for any help or advice on this matter. >>Best wishes >>Celine >>Here is my R session info >>>sessionInfo() >>R version 2.8.0 (2008-10-20) i386-pc-mingw32 >>locale: >>LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United >>Kingdom.1252;LC_MONETARY=English_United >>Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 >>attached base packages: >>[1] splines tools stats graphics grDevices utils datasets >>methods base >>other attached packages: >> [1] affycoretools_1.14.0 annaffy_1.14.0 KEGG.db_2.2.5 >>gcrma_2.14.1 matchprobes_1.14.0 >> [6] biomaRt_1.16.0 GOstats_2.8.0 Category_2.8.0 >>genefilter_1.22.0 survival_2.34-1 >>[11] RBGL_1.18.0 annotate_1.20.0 xtable_1.5-4 >>GO.db_2.2.5 RSQLite_0.7-1 >>[16] DBI_0.2-4 AnnotationDbi_1.4.0 graph_1.20.0 >>affy_1.20.0 limma_2.16.2 >>[21] Biobase_2.2.0 >>loaded via a namespace (and not attached): >>[1] affyio_1.10.1 cluster_1.11.11 GSEABase_1.4.0 >>preprocessCore_1.4.0 RCurl_0.91-0 >>[6] XML_1.94-0.1 > >-- >James W. MacDonald, M.S. >Biostatistician >Hildebrandt Lab >8220D MSRB III >1150 W. Medical Center Drive >Ann Arbor MI 48109-0646 >734-936-8662 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at illinois.edu No virus found in this incoming message. Checked by AVG - http://www.avg.com 05/11/2008 17:36 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

ADD COMMENT • link 17.2 years ago Celine Carret ▴ 220

Login before adding your answer.