How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures?

0

Entering edit mode

Ying Chen ▴ 110

@ying-chen-4756

Last seen 9.6 years ago

Hi guys, I am new to the GenomicsFeatures package and I am trying to make my own txdb out of TCGA GAF file using the function makeTranscriptDb(transcripts, splicings,genes) in GenomicsFeatures. The manual says that genes is a data frame containing the genes associated to a set of transcripts. I just wonder what format should it be. Will the following work? genes <- data.frame( tx_id=c(txid1,txid2, txid3, txid4, txid5, txid6, txid7, txid8, txid9,............), # first 9 tx_ids for 3 gene_ids are listed gene_id=c(geneid1, geneid1, geneid2, geneid2, geneid2, geneid3, geneid3, geneid3, geneid3,.......)) # first 3 gene_ids are listed Thanks a lot for the help! Ying Confidentiality Note:\ This e-mail, and any attachment t...{{dropped:11}}

• 1.1k views

ADD COMMENT • link 11.6 years ago Ying Chen ▴ 110

0

Entering edit mode

Ying Chen ▴ 110

@ying-chen-4756

Last seen 9.6 years ago

Sorry, one more question regarding the transcripts argument: For the following two transcripts: uc001aaa.3 chr1:11874-12227,12613-12721,13221-14409:+ uc010nxq.1 chr1:11874-12227,12595-12721,13403-14409:+ What should be the format for tx_start and tx_end? transcripts <- data.frame( tx_id=c("uc001aaa.3"," uc010nxq.1"), tx_chrom="chr1", tx_strand="+", tx_start=c(11874, 11874), tx_end=c(14409, 14409)) or: transcripts <- data.frame( tx_id=c("uc001aaa.3","uc001aaa.3","uc001aaa.3","uc010n xq.1","uc010nxq.1","uc010nxq.1"), tx_chrom="chr1", tx_strand="+", tx_start=c(11874,12613,13221,11874,12595,13403), tx_end=c(12227,12721,14409,12227,12721,14409)) Thanks a lot for the help! Ying -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@r-project.org] On Behalf Of Ying Chen Sent: Tuesday, October 09, 2012 10:57 AM To: bioconductor at r-project.org Subject: [BioC] How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures? Hi guys, I am new to the GenomicsFeatures package and I am trying to make my own txdb out of TCGA GAF file using the function makeTranscriptDb(transcripts, splicings,genes) in GenomicsFeatures. The manual says that genes is a data frame containing the genes associated to a set of transcripts. I just wonder what format should it be. Will the following work? genes <- data.frame( tx_id=c(txid1,txid2, txid3, txid4, txid5, txid6, txid7, txid8, txid9,............), # first 9 tx_ids for 3 gene_ids are listed gene_id=c(geneid1, geneid1, geneid2, geneid2, geneid2, geneid3, geneid3, geneid3, geneid3,.......)) # first 3 gene_ids are listed Thanks a lot for the help! Ying Confidentiality Note:\ This e-mail, and any attachment t...{{dropped:19}}

ADD COMMENT • link 11.6 years ago Ying Chen ▴ 110

0

Entering edit mode

Hi Ying, Since you are describing two transcripts, it should be the 1st one. The second would for describing the exons that go with these transcripts. But just out of curiousity, why are you starting with a GAF? I see you using UCSC IDs, so why not just use one of these known gene packages?: http://www.bioconductor.org/packages/release/BiocViews.html#___Transcr iptDb Marc On 10/09/2012 09:21 AM, Ying Chen wrote: > Sorry, one more question regarding the transcripts argument: > > For the following two transcripts: > > uc001aaa.3 chr1:11874-12227,12613-12721,13221-14409:+ > uc010nxq.1 chr1:11874-12227,12595-12721,13403-14409:+ > > What should be the format for tx_start and tx_end? > > transcripts<- data.frame( > tx_id=c("uc001aaa.3"," uc010nxq.1"), > tx_chrom="chr1", > tx_strand="+", > tx_start=c(11874, 11874), > tx_end=c(14409, 14409)) > > or: > > transcripts<- data.frame( > tx_id=c("uc001aaa.3","uc001aaa.3","uc001aaa.3","uc010n xq.1","uc010nxq.1","uc010nxq.1"), > tx_chrom="chr1", > tx_strand="+", > tx_start=c(11874,12613,13221,11874,12595,13403), > tx_end=c(12227,12721,14409,12227,12721,14409)) > > Thanks a lot for the help! > > Ying > > -----Original Message----- > From: bioconductor-bounces at r-project.org [mailto:bioconductor- bounces at r-project.org] On Behalf Of Ying Chen > Sent: Tuesday, October 09, 2012 10:57 AM > To: bioconductor at r-project.org > Subject: [BioC] How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures? > > Hi guys, > > I am new to the GenomicsFeatures package and I am trying to make my own txdb out of TCGA GAF file using the function makeTranscriptDb(transcripts, splicings,genes) in GenomicsFeatures. The manual says that genes is a data frame containing the genes associated to a set of transcripts. I just wonder what format should it be. Will the following work? > > genes<- data.frame( > tx_id=c(txid1,txid2, txid3, txid4, txid5, txid6, txid7, txid8, txid9,............), # first 9 tx_ids for 3 gene_ids are listed > gene_id=c(geneid1, geneid1, geneid2, geneid2, geneid2, geneid3, geneid3, geneid3, geneid3,.......)) # first 3 gene_ids are listed > > Thanks a lot for the help! > > Ying > > Confidentiality Note:\ This e-mail, and any attachment t...{{dropped:19}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.6 years ago Marc Carlson ★ 7.2k

0

Entering edit mode

Hi Marc, Thanks a lot for the help. I start with GAF because I look at TCGA CRC data. Here was what I did initially: > library(GenomicFeatures) > library(GenomicRanges) > library(TxDb.Hsapiens.UCSC.hg19.knownGene) > TCGA_CRC <- read.delim("TCGA_CRC_UNC_RNASeq_ExonID_detail_new.txt",s tringsAsFactors=FALSE) > ex <- exons(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = c("exon_id","gene_id","tx_id")) > TCGA_CRC <- GRanges(seqnames=TCGA_CRC$Chrom, ranges=IRanges(start=TC GA_CRC$Start,end=TCGA_CRC$End),strand=TCGA_CRC$Strand,name=TCGA_CRC$Ex onID) > elementMetadata(TCGA_CRC) <- elementMetadata(ex)[match(TCGA_CRC, ex),] Error in elementMetadata(ex)[match(TCGA_CRC, ex), ] : selecting rows: subscript contains NAs or out of bounds indices > > x <- which(!TCGA_CRC %in% ex) Returns a value larger than 3000. Apparently as Tim Triche pointed out about a month ago, the GAF made by gurus at UNC is different from UCSC.hg19.knownGene even most part are identical. With more and more TCGA sequencing datasets available, is there going to be some TxDbs for GAFs (for example https://tcga-dat a.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/other/GAF/G AF_bundle/outputs/TCGA.Sept2010.09202010.gaf)? Thanks a lot, Ying -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@r-project.org] On Behalf Of Marc Carlson Sent: Tuesday, October 09, 2012 12:57 PM To: bioconductor at r-project.org Subject: Re: [BioC] How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures? Hi Ying, Since you are describing two transcripts, it should be the 1st one. The second would for describing the exons that go with these transcripts. But just out of curiousity, why are you starting with a GAF? I see you using UCSC IDs, so why not just use one of these known gene packages?: http://www.bioconductor.org/packages/release/BiocViews.html#___Transcr iptDb Marc On 10/09/2012 09:21 AM, Ying Chen wrote: > Sorry, one more question regarding the transcripts argument: > > For the following two transcripts: > > uc001aaa.3 chr1:11874-12227,12613-12721,13221-14409:+ > uc010nxq.1 chr1:11874-12227,12595-12721,13403-14409:+ > > What should be the format for tx_start and tx_end? > > transcripts<- data.frame( > tx_id=c("uc001aaa.3"," uc010nxq.1"), > tx_chrom="chr1", > tx_strand="+", > tx_start=c(11874, 11874), > tx_end=c(14409, 14409)) > > or: > > transcripts<- data.frame( > tx_id=c("uc001aaa.3","uc001aaa.3","uc001aaa.3","uc010n xq.1","uc010nxq.1","uc010nxq.1"), > tx_chrom="chr1", > tx_strand="+", > tx_start=c(11874,12613,13221,11874,12595,13403), > tx_end=c(12227,12721,14409,12227,12721,14409)) > > Thanks a lot for the help! > > Ying > > -----Original Message----- > From: bioconductor-bounces at r-project.org > [mailto:bioconductor-bounces at r-project.org] On Behalf Of Ying Chen > Sent: Tuesday, October 09, 2012 10:57 AM > To: bioconductor at r-project.org > Subject: [BioC] How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures? > > Hi guys, > > I am new to the GenomicsFeatures package and I am trying to make my own txdb out of TCGA GAF file using the function makeTranscriptDb(transcripts, splicings,genes) in GenomicsFeatures. The manual says that genes is a data frame containing the genes associated to a set of transcripts. I just wonder what format should it be. Will the following work? > > genes<- data.frame( > tx_id=c(txid1,txid2, txid3, txid4, txid5, txid6, txid7, txid8, txid9,............), # first 9 tx_ids for 3 gene_ids are listed > gene_id=c(geneid1, geneid1, geneid2, geneid2, geneid2, geneid3, geneid3, geneid3, geneid3,.......)) # first 3 gene_ids are listed > > Thanks a lot for the help! > > Ying > > Confidentiality Note:\ This e-mail, and any attachment > t...{{dropped:19}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor Confidentiality Note:\ This e-mail, and any attachment t...{{dropped:11}}

ADD REPLY • link 11.6 years ago Ying Chen ▴ 110

Login before adding your answer.