Getting the start and end positions of a list of genes
Guest User ★ 12k
@guest-user-4897
Last seen 7.1 years ago
Dear listserv,

I am a long-time R user, novice Bioconductor user. I am quickly realizing they are not the same thing. I have a very basic question that I hope you can help me with.

I have a list of genes in Arabidopsis thaliana. I want to input this list into R/Bioconductor and output a table listing the start and end positions of each gene.

Specific code that will get the job done will be the most helpful for me. Also, please tell me the specific packages and databases and such I must load into memory. I am a total newbie at this.

Thanks in advance,
-----------------------------------
Josh Banta, Ph.D
Assistant Professor
Department of Biology
The University of Texas at Tyler
Tyler, TX 75799
Tel: (903) 565-5655
http://plantevolutionaryecology.org

-- output of sessionInfo():

> gene.pos <- data.frame(matrix(nrow = 3, ncol = 4))
> gene.list <- c("At5g35790", "AT5g60910", "AT1g16560")
> gene.pos[,1] <- gene.list
> colnames(gene.pos) <- c("gene", "chromosome", "nuc_sequence_start", "nuc_sequence_end")
>
> gene.pos
       gene chromosome nuc_sequence_start nuc_sequence_end
1 At5g35790         NA                 NA               NA
2 AT5g60910         NA                 NA               NA
3 AT1g16560         NA                 NA               NA
>
> #now what? How do I fill in the blanks?

-- Sent via the guest posting facility at bioconductor.org.
Arabidopsis thaliana • 1.8k views
@vincent-j-carey-jr-4
Last seen 4 days ago
United States
Good spec, but I can't get through the whole thing just now. This could get you started:

source("http://bioconductor.org/biocLite.R")
biocLite("TxDb.Athaliana.BioMart.plantsmart12")
library(TxDb.Athaliana.BioMart.plantsmart12)
txdb = TxDb.Athaliana.BioMart.plantsmart12
tr = transcriptsBy(txdb, by="gene")

> tr
GRangesList of length 33602:
$AT1G01010
GRanges with 1 range and 2 elementMetadata cols:
      seqnames       ranges strand |     tx_id     tx_name
         <Rle>    <IRanges>  <Rle> | <integer> <character>
  [1]        1 [3631, 5899]      + |      9694 AT1G01010.1

$AT1G01020
GRanges with 2 ranges and 2 elementMetadata cols:
      seqnames       ranges strand | tx_id     tx_name
  [1]        1 [5928, 8737]      - | 29355 AT1G01020.1
  [2]        1 [6790, 8737]      - | 29354 AT1G01020.2

$AT1G01030
GRanges with 1 range and 2 elementMetadata cols:
      seqnames         ranges strand | tx_id     tx_name
  [1]        1 [11649, 13714]      - | 26358 AT1G01030.1

...
<33599 more elements>
---
seqlengths:
  3  4  1  5  2 Pt Mt
 NA NA NA NA NA NA NA

You could use an org.At* package a bit more simply: use the CHRLOC and CHRLOCEND elements. Please look at the metadata page under the INSTALL node of bioconductor.org for your organism. This should be a standard use case or FAQ, perhaps.

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Hi,

I'll get you a step further:

On 6/17/12 5:57 PM, "Vincent Carey" <stvjc at channing.harvard.edu> wrote:
> tr = transcriptsBy(txdb, by="gene")

# assuming that for each gene's coordinates you want the extreme starts and ends
# of its (potentially multiple) transcripts:
gene.gr <- reduce(tr)                  # a GRangesList; overlapping transcripts of a gene are merged
gene.df <- as(gene.gr, 'data.frame')   # gene identifiers come from names(tr)

Now it's a matter of coercing column names and selecting from the BioMart data just the rows for your identifiers (and checking they are all there, and complaining if not).

Cheers,
Malcolm Cook
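The remaining steps (collapse transcripts to one span per gene, rename the columns, pick out the requested identifiers, complain about any that are missing) can be sketched in base R alone. This is only an illustration under stated assumptions: the transcript table `tx` is typed in by hand from the `transcriptsBy()` printout in this thread rather than pulled from the TxDb object, the request list is hypothetical (it reuses two genes from that printout plus one of the original poster's genes, which is absent from the toy table), and the names `tx`, `gene.extent`, `wanted` and `missing.genes` are made up for the example. Note that the identifiers in the question are mixed case while the database keys are upper case, so the sketch normalises them with `toupper()` before matching.

```r
# Toy transcript table copied by hand from the transcriptsBy() printout above.
# In a real session these rows would come from the TxDb object instead.
tx <- data.frame(
  gene       = c("AT1G01010", "AT1G01020", "AT1G01020", "AT1G01030"),
  tx_name    = c("AT1G01010.1", "AT1G01020.1", "AT1G01020.2", "AT1G01030.1"),
  chromosome = c("1", "1", "1", "1"),
  start      = c(3631L, 5928L, 6790L, 11649L),
  end        = c(5899L, 8737L, 8737L, 13714L),
  stringsAsFactors = FALSE
)

# One row per gene: smallest transcript start and largest transcript end.
starts <- aggregate(start ~ gene + chromosome, data = tx, FUN = min)
ends   <- aggregate(end ~ gene + chromosome, data = tx, FUN = max)
gene.extent <- merge(starts, ends, by = c("gene", "chromosome"))
names(gene.extent) <- c("gene", "chromosome",
                        "nuc_sequence_start", "nuc_sequence_end")

# Hypothetical request list in mixed case; the last gene is not in the toy table.
gene.list <- c("At1g01010", "AT1g01020", "At5g35790")
wanted <- data.frame(gene = toupper(gene.list), stringsAsFactors = FALSE)

# all.x = TRUE keeps requested genes without a match; their coordinates stay NA.
gene.pos <- merge(wanted, gene.extent, by = "gene", all.x = TRUE)

# Report any identifiers that could not be found.
missing.genes <- gene.pos$gene[is.na(gene.pos$nuc_sequence_start)]
if (length(missing.genes) > 0) {
  message("No coordinates found for: ", paste(missing.genes, collapse = ", "))
}

print(gene.pos)
```

With the real annotation, the hand-typed `tx` would be replaced by a data frame derived from `tr`. If transcripts of one gene do not overlap, `reduce()` returns more than one range for that gene, so `range(tr)` may be the closer match to "extreme start and end"; treat that as a suggestion to check against your installed GenomicRanges version.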