BSgenome or org.Hs.eg.db to find gene length

Fatemehsadat Seyednasrollah ▴ 260

Dear list,

As I understand it, I can find the chromosome (using org.Hs.egCHR), the start position (org.Hs.egCHRLOC) and the end position (org.Hs.egCHRLOCEND) for a list of gene symbols, but I did not find a map that gives the gene length for a symbol. Should I subtract the org.Hs.egCHRLOC value from the org.Hs.egCHRLOCEND value for each gene symbol to get the gene length, or is there an easier way to do this for a long list of gene symbols?

Thank you

ADD COMMENT • link updated 11.6 years ago by Marc Carlson ★ 7.2k • written 11.6 years ago by Fatemehsadat Seyednasrollah ▴ 260

Marc Carlson ★ 7.2k

Hi Fatemehsadat,

You could consider doing it this way:

    library(Homo.sapiens)
    cols(Homo.sapiens)      ## shows cols you could use
    keytypes(Homo.sapiens)  ## shows keytypes
    k <- keys(Homo.sapiens, keytype="ENTREZID")  ## discovers all available keys of this kind
    result <- select(Homo.sapiens, k,
                     cols=c("TXNAME","TXSTART","TXEND","TXSTRAND"),
                     keytype="ENTREZID")

Then you could process that result according to your definition of what you think constitutes the "gene range". Do you think it is the max range? The average? Maybe the max range plus some buffering sequence to account for likely transcriptional regulators? It's your call how you want to do that step, but the data frame in result should give you the range positions for all the transcripts and their associated gene IDs.

Or, you might also consider doing it this way:

    result2 <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene")

which will give you a list-like object that is also suitable for use in range operations.

Hope this helps,

Marc
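One way to make the "max range" option concrete is sketched below in base R. It is only a sketch under assumptions: `tx` is a hypothetical stand-in, with made-up coordinates, for the data frame that `select()` returns; `gene_span()` and its `buffer` argument are illustrative names, not part of any package; and coordinates are taken to be 1-based and inclusive, so a width is end - start + 1. Note also that in current AnnotationDbi releases `cols()` and the `cols=` argument are named `columns()` and `columns=`.

```r
# Hypothetical stand-in for the data frame returned by select();
# the coordinates are made up for illustration only.
tx <- data.frame(
  ENTREZID = c("1", "1", "2"),
  TXNAME   = c("txA", "txB", "txC"),
  TXSTART  = c(100L, 150L, 1000L),
  TXEND    = c(400L, 900L, 1200L),
  stringsAsFactors = FALSE
)

# Collapse per-transcript rows to one "max range" row per gene.
# `buffer` optionally pads both ends, e.g. to include nearby
# regulatory sequence; the default of 0 adds no padding.
gene_span <- function(tx, id = "ENTREZID", buffer = 0L) {
  start <- tapply(tx$TXSTART, tx[[id]], min) - buffer
  end   <- tapply(tx$TXEND,   tx[[id]], max) + buffer
  data.frame(
    id     = names(start),
    start  = as.integer(start),
    end    = as.integer(end),
    length = as.integer(end - start + 1L),  # assumes 1-based, inclusive coordinates
    stringsAsFactors = FALSE
  )
}

spans    <- gene_span(tx)                 # one row per gene
buffered <- gene_span(tx, buffer = 10L)   # same, padded by 10 bases on each side
```

Passing `id = "SYMBOL"` would group by gene symbol instead, provided the input has a column of that name. Whether the maximum range is the right definition of gene length depends on the analysis.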

ADD COMMENT • link 11.6 years ago Marc Carlson ★ 7.2k

Howdy,

Out of curiosity:

On Thu, Oct 11, 2012 at 1:10 PM, Marc Carlson wrote:
> You could consider doing it this way:
>
> library(Homo.sapiens)

Is this a new class of packages (along w/ Mus.musculus, etc.) in bioc 2.11?

What's the relation to the org.Hs.eg.db package? Is there some documentation on this class of packages?

Sorry if I missed something obvious -- perhaps I missed the memo?

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 11.6 years ago Steve Lianoglou ★ 13k

OrganismDbi -- too many of us are used to doing things the confusing way -- using OrganismDbi packages like Homo.sapiens will be better long-term.

The OrganismDbi packages automate the linkage between all the bits and pieces of annotation for a given organism, including org.foo.db, TxDb.foo, etc.

--
A model is a lie that helps you see the truth.
Howard Skipper

ADD REPLY • link 11.6 years ago Tim Triche ★ 4.2k

Cool ... I like being less confused.

Thanks for the pointer,
-steve

ADD REPLY • link 11.6 years ago Steve Lianoglou ★ 13k

Yes,

Sorry about the lack of memos. ;) OrganismDbi is a new package that allows you to make meta packages from annotation packages that implement a select() method. Homo.sapiens is one we made for humans. It combines the human org package, the hg19 txdb known gene package and the GO.db package. The package does not actually "contain" all of that data, though. It just retrieves it as requested and returns it to users as if there were a single place it was all coming from.

Marc

ADD REPLY • link 11.6 years ago Marc Carlson ★ 7.2k

It's definitely a step in the right direction. A small next step would be supporting queries based on gene symbols, as the OP had asked about. Sure, one could do a transcriptsBy() on the TxDb package and subset, but that means it has to be by="gene", and it's slower. Also, has there been any progress towards supporting transcriptsBy on the OrganismDbi package?

Michael

ADD REPLY • link 11.6 years ago Michael Lawrence ★ 11k

Oh sorry, I missed that little detail about using gene symbols.

Here is how you would do it when you need to query by gene symbol:

    library(Homo.sapiens)
    cols(Homo.sapiens)      ## shows cols you could use
    keytypes(Homo.sapiens)  ## shows keytypes
    k <- keys(Homo.sapiens, keytype="SYMBOL")  ## discovers all available keys of this kind
    result <- select(Homo.sapiens, k,
                     cols=c("TXNAME","TXSTART","TXEND","TXSTRAND"),
                     keytype="SYMBOL")

The plan to support transcriptsBy etc. for OrganismDbi is still just a plan. But we don't intend for it to remain a "plan" forever.

Marc

ADD REPLY • link 11.6 years ago Marc Carlson ★ 7.2k

Marc Carlson ★ 7.2k

Hi Fatemehsadat,

Let's keep this on the list. We almost always want to keep the thread public so that others can benefit from our conversations. Also, I am not really sure how to answer your question (it's not a simple question), and others may have suggestions. You can't get their input if you only speak with me.

Really though, how to choose depends on context that you have not provided here. What is it that you want to know? I mentioned some strategies in my earlier post. For some cases the longest transcript may be what you want; for others you may want the maximum range that a transcript can cover; for other cases you may want to "buffer" that region by adding to it. For yet other cases you may not care about the range at all and may only want to call unique on the result. But I can't give even an opinion without knowing more about what you are trying to do.

Marc

On 10/12/2012 06:15 AM, Fatemehsadat Seyednasrollah wrote:
> Hi,
> Thank you so much. The package was great for the diversity of features it makes available. Now I was wondering whether I can use the result of my query as an annotation file for other R packages as well.
> I just wanted to know your opinion about how to decide which isoform to choose for my annotation file.
> Imagine I need an annotation file with gene symbols as row names. For example, for the first symbol I have:
>
>   SYMBOL  TXSTART    TXEND length
> 1   A1BG 58858172 58864865   6693
> 2   A1BG 58859832 58874214  14382
>
> and so many other duplicated gene symbols. How do you decide which isoform to choose to get a unique annotation file of gene symbols?
>
> Thank you again.
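Two of the strategies mentioned here, the longest transcript and the maximum range, can be sketched in base R using the two A1BG rows quoted in this post. This is an illustrative sketch rather than a recommendation: the object names (`ann`, `longest`, `max_range`) are made up, and `length` is computed as end minus start to match the quoted table, whereas for 1-based inclusive coordinates the width would be one larger.

```r
# The two A1BG rows quoted in the thread.
ann <- data.frame(
  SYMBOL  = c("A1BG", "A1BG"),
  TXSTART = c(58858172L, 58859832L),
  TXEND   = c(58864865L, 58874214L),
  stringsAsFactors = FALSE
)
ann$length <- ann$TXEND - ann$TXSTART  # same definition as the quoted table

# Option 1: keep only the longest transcript for each symbol.
ord     <- order(ann$SYMBOL, -ann$length)        # longest first within each symbol
longest <- ann[ord, ]
longest <- longest[!duplicated(longest$SYMBOL), ]
rownames(longest) <- longest$SYMBOL              # symbols are now unique row names

# Option 2: keep the maximum range covered by any transcript of a symbol.
starts <- tapply(ann$TXSTART, ann$SYMBOL, min)
ends   <- tapply(ann$TXEND,   ann$SYMBOL, max)
max_range <- data.frame(
  TXSTART   = as.integer(starts),
  TXEND     = as.integer(ends),
  row.names = names(starts)
)
max_range$length <- max_range$TXEND - max_range$TXSTART
```

Either result has one row per gene symbol and can be indexed by symbol, for example `longest["A1BG", "length"]`. Which of the two is appropriate depends on what the downstream package expects as gene length.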

ADD COMMENT • link 11.6 years ago Marc Carlson ★ 7.2k

once upon a time there was a SpliceGraph package that aimed to resolve some of these questions.

anyone know if Mr. Bindreither is doing OK? As of this AM it could not be built. But it might resolve deeper questions like Fatemehsadat's.

my $0.02 (adjusted for rampant inflation)

--t

ADD REPLY • link 11.6 years ago Tim Triche ★ 4.2k

Hi,

First, sorry that I did not mention my ultimate intent. I am doing some research on the effect of data filtering on the number of differentially expressed genes. For this purpose I apply different filters using R packages. Now I want to create an annotation file that keeps some features of the genes in my RNA-seq dataset, to use when gene annotation is needed to find differentially expressed genes. For example, I saw that if I want to use NOISeq to find the DE genes, I need to have the gene annotation as well.

With many thanks and best regards,
Fatemeh

ADD REPLY • link 11.6 years ago Fatemehsadat Seyednasrollah ▴ 260

right so your question actually has sub-questions to it:

1) what's a gene?
2) does the performance of these packages differ depending on how you answer (1) and if so how?

For example, DEXSeq is answering a different superficial question than edgeR, but you could in principle use edgeR to answer the same questions. What happens if you do the book-keeping yourself and play with different dispersion estimators (shrunken or not) at the gene level? at the exon level? isoform level? and how do you decide which one is the appropriate level for your analysis? And how do you decide what is the best way to present the results?

Every day there are more shrinkage estimators for biological and technical/shot-noise dispersion estimates (a mean shift is a useless estimate for testing unless you have a good estimate of the dispersion within groups -- think "point intensity" if there is any doubt of this) and every day the reads get longer.

The question you're asking is not a simple one and a reasoned answer will benefit an awful lot of people :-)

good luck,

--t
>>>> >>>> Thanks for the pointer, >>>> -steve >>>> >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org<**mailto:Bioconductor@r-project.* *org<bioconductor@r-project.org> >>>> > >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.** >>>> conductor<http: news.gmane.org="" gmane.science.biology.informatics="" .conductor=""> >>>> >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
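To make the "which isoform?" question from the quoted thread concrete, here is a minimal base-R sketch. It assumes a data frame shaped like the `result` returned by the `select()` call quoted above, filled in by hand here with the two A1BG rows from the thread. The helper name `collapse_by_gene` and the output column names are hypothetical, chosen only for illustration. It computes two of the candidate definitions Marc lists (longest single transcript, and the maximum range covered by any transcript); it is one possible way to do the book-keeping, not a recommendation of either definition.

```r
# Toy input: the two A1BG rows quoted in the thread (coordinates as posted).
tx <- data.frame(
  SYMBOL  = c("A1BG", "A1BG"),
  TXSTART = c(58858172, 58859832),
  TXEND   = c(58864865, 58874214),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse transcript rows to one row per gene symbol.
collapse_by_gene <- function(tx) {
  # Drop symbols that have no transcript coordinates (NA rows).
  tx <- tx[!is.na(tx$TXSTART) & !is.na(tx$TXEND), ]
  # Same subtraction as in the thread: genomic span, introns included.
  tx$length <- tx$TXEND - tx$TXSTART
  per_gene <- split(tx, tx$SYMBOL)
  out <- lapply(per_gene, function(g) {
    data.frame(
      SYMBOL      = g$SYMBOL[1],
      longest_tx  = max(g$length),                  # longest single transcript
      range_start = min(g$TXSTART),
      range_end   = max(g$TXEND),
      max_range   = max(g$TXEND) - min(g$TXSTART),  # union of all transcripts
      stringsAsFactors = FALSE
    )
  })
  res <- do.call(rbind, out)
  rownames(res) <- res$SYMBOL   # unique gene symbols as row names
  res
}

gene_annot <- collapse_by_gene(tx)
print(gene_annot)
```

Note that both numbers are genomic spans (TXEND minus TXSTART, as in the thread), so they include introns; whether that or an exon-based length is the right "gene length" is exactly the "what's a gene?" question above. Also, in more recent AnnotationDbi releases the `cols` argument and `cols()` function quoted above were, as far as I recall, renamed to `columns`, so the quoted calls may need that adjustment on a current installation.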

ADD REPLY • link 11.6 years ago Tim Triche ★ 4.2k
