Question: What are the best packages to compare multiple DE gene lists?
I have full genome/exome lists of DE genes resulting from microarray (MA) and/or RNA-Seq analyses using multiple methods (likely showing different genes even for the same samples, due to technology biases). I would like to rank these lists to create a general list in which DE targets found repeatedly are pushed up and unique hits are ranked lower.

What method/package should I start with?

Thanks

Stephane Plaisance
stephane.plaisance at vib.be
modified 4.3 years ago by James W. MacDonald • written 4.3 years ago by Stephane Plaisance | VIB
James W. MacDonald wrote:
Hi Stephane,

If I understand you correctly, you have already made the comparisons and now simply want to rank genes by the number of comparisons in which they were found significant. I don't know of a particular package for doing this, but it is really easy to do with functions in base R. All you need to do (assuming you have a consistent identifier, such as Entrez Gene IDs, for each comparison) is concatenate all the IDs into a single vector and then count occurrences:

    mybigvec <- c(<all the DE gene IDs go here>)
    mylst <- split(mybigvec, mybigvec)
    df <- data.frame(ID = names(mylst), count = sapply(mylst, length))
    df <- df[order(df$count, decreasing = TRUE),]

You could also take things like gene symbols along for the ride by starting with a data.frame:

    mybigdf <- data.frame(symbols = <concatenate symbols from all comps>, geneid = <concatenate gene IDs from all comps>)
    mylst <- split(mybigdf, mybigdf$geneid)
    df <- data.frame(ID = names(mylst), count = sapply(mylst, nrow), symbol = sapply(mylst, function(x) x$symbols[1]))
    df <- df[order(df$count, decreasing = TRUE),]

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
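As a small worked illustration of the counting idea above, here is a sketch on toy data. The list names and the Entrez Gene IDs are made up for the example, and `table()` is used in place of `split()` purely for brevity; it is one possible way to do the count, not the only one.

```r
# Toy DE gene ID lists, one per comparison (hypothetical IDs and names)
de_lists <- list(
  microarray = c("7157", "1956", "672"),
  rnaseq_a   = c("7157", "672", "5290"),
  rnaseq_b   = c("7157", "3845")
)

# Drop duplicates within each list so a gene counts once per comparison
de_lists <- lapply(de_lists, unique)

# Count in how many lists each ID occurs
counts <- table(unlist(de_lists, use.names = FALSE))

df <- data.frame(ID = names(counts), count = as.vector(counts),
                 stringsAsFactors = FALSE)

# Genes found in the most lists come first
df <- df[order(df$count, decreasing = TRUE), ]
rownames(df) <- NULL
print(df)
```

Removing within-list duplicates first matters when several probes or transcripts map to the same gene ID; otherwise one comparison could contribute more than one count.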
Dear Jim,

Thanks very much for this straightforward approach. I will certainly try it. My aim is to also take into account the p-values and, where applicable, the log-FC values attached to each gene, so that more than just the ranking is used. I know of biotools (endeavour) that rank lists of apples and pears using specific methods, but I have no idea where exactly to start.

Thanks anyway for the help and code.

So far I have found in the Bioc pages:
matchbox
Orderedlist
geneselector
rankrank

I have tried none, so if anybody has preferences, I am all ears.

Cheers
Stephane Plaisance
stephane.plaisance at vib.be
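One way to bring p-values and log-FC values into the ranking, sketched here in base R under several assumptions: the results are stacked in a long-format table (the column names `gene`, `study`, `pval` and `logFC` are hypothetical), and p-values are combined with Fisher's method, which nobody in the thread proposed and which assumes independent tests. If the lists come from the same samples, that assumption does not hold, so the combined value is better read as a ranking score than as a real p-value.

```r
# Hypothetical long-format table: one row per gene per analysis
res <- data.frame(
  gene  = c("A", "A", "A", "B", "B", "C"),
  study = c("s1", "s2", "s3", "s1", "s2", "s3"),
  pval  = c(0.001, 0.01, 0.02, 0.04, 0.03, 0.0001),
  logFC = c(1.5, 1.2, 0.9, -0.8, 0.6, 2.0)
)

# Fisher's method: -2 * sum(log(p)) is chi-squared with 2k df under H0
fisher_combine <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

by_gene <- split(res, res$gene)
summary_df <- data.frame(
  gene       = names(by_gene),
  n_lists    = sapply(by_gene, nrow),
  combined_p = sapply(by_gene, function(x) fisher_combine(x$pval)),
  mean_logFC = sapply(by_gene, function(x) mean(x$logFC)),
  same_sign  = sapply(by_gene, function(x) length(unique(sign(x$logFC))) == 1)
)

# Rank: most lists first, then smallest combined score
summary_df <- summary_df[order(-summary_df$n_lists, summary_df$combined_p), ]
rownames(summary_df) <- NULL
print(summary_df)
```

The `same_sign` column flags genes whose direction of change disagrees between analyses, which a plain count of occurrences would hide.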
Dear Stephane,

If I have understood what you need, you could use the RankProd package, which uses the non-parametric rank product approach to assign a p-value to each gene across different studies, based on the rank it achieves by log2FC. It lets you run such "meta-analyses", comparing lists of genes produced in different analyses.

Jose

--
Jose M. Garcia Manteiga PhD
Data Analysis in Functional Genomics
Center for Translational Genomics and BioInformatics
Dibit2-Basilica, 4A3
San Raffaele Scientific Institute
Via Olgettina 58, 20132 Milano (MI), Italy
Tel: +39-02-2643-9144
e-mail: garciamanteiga.josemanuel at hsr.it
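To make the rank product idea concrete, here is a minimal base R sketch of the statistic itself. It does not call the RankProd package; the matrix `lfc` and its gene and study names are invented for illustration, and it assumes every gene has a log2FC in every study. The package additionally estimates significance by permutation, which is left out here.

```r
# Hypothetical log2 fold changes: rows = genes, columns = studies
lfc <- matrix(
  c( 2.1,  1.8,  2.5,
     0.3, -0.2,  0.1,
     1.0,  1.9,  0.8,
    -1.5, -1.1, -2.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2", "s3"))
)

# Rank within each study; rank 1 = strongest up-regulation
ranks <- apply(-lfc, 2, rank)

# Rank product = geometric mean of a gene's ranks across studies
rank_product <- exp(rowMeans(log(ranks)))

# Smallest rank product = most consistently up-regulated gene
print(sort(rank_product))
```

Ranking `lfc` instead of `-lfc` gives the corresponding statistic for down-regulated genes.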
Thanks a lot Jose,

I am adding RankProd to the top of my to-do list! ;-)

Stephane Plaisance
stephane.plaisance at vib.be
Hi Stephane,

If you want to be more systematic about the comparisons, you might also consider the GeneMeta package.

Best,

Jim