bed file around the TSSs
1
0
Entering edit mode
joseph ▴ 330
@joseph-1270
Last seen 11.1 years ago
Hi I would highly appreciate if you could show me how to create a bed file around the TSSs from UCSC databases such as ensembl or refSeq genes. I need 350 nucleotides upstream and 150 nucleotides downstream of TSSs. The bed file should look like below, where: chromStart is 350 nucleotides upstream of TSS chromEnd is 150 nucleotides downstream of TSS name is Name of gene or transcript_id depending on the database. chrom chromStart chromEnd name score strand chr1 67051159 67163158 NM_024763 0 - chr1 67075869 67163158 NM_207014 0 + chr1 16762998 16812569 NM_017940 0 - Thanks Joseph [[alternative HTML version deleted]]
• 3.0k views
ADD COMMENT
0
Entering edit mode
@vincent-j-carey-jr-4
Last seen 5 weeks ago
United States
There are various approaches but basic tools for one solution are 1) use GenomicFeatures::makeTranscriptTableFromUCSC(genome, table) to acquire a SQLite database that has relevant information 2) extract the GRanges object corresponding to transcripts and their genomic coordinates using the transcripts() method 3) set the names of the GRanges using something like names(gr) = elementMetadata(gr)$tx_name -- but see below; you might instead coerce to RangedData and just change the column names 4) revise the coordinate information according to your wishes 5) use the rtracklayer export method on the revised GRanges or RangedData structure to create the bed file some concrete results, based on > sessionInfo() R version 2.13.0 Under development (unstable) (2010-10-12 r53293) Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices datasets tools utils methods [8] base other attached packages: [1] rtracklayer_1.9.9 RCurl_1.4-3 bitops_1.0-4.1 [4] GenomicFeatures_1.1.12 GenomicRanges_1.1.29 IRanges_1.7.35 [7] weaver_1.15.0 codetools_0.2-2 digest_0.4.2 loaded via a namespace (and not attached): [1] BSgenome_1.17.7 Biobase_2.9.2 Biostrings_2.17.47 DBI_0.2-5 [5] RSQLite_0.9-2 XML_3.1-1 biomaRt_2.5.1 > t1 = makeTranscriptDbFromUCSC("hg19", "refGene") Download the refGene table ... OK Download the refLink table ... OK Extract the 'transcripts' data frame ... OK Extract the 'splicings' data frame ... OK Download and preprocess the 'chrominfo' data frame ... OK Prepare the 'metadata' data frame ... OK Make the TranscriptDb object ... OK There were 50 or more warnings (use warnings() to see the first 50) > tt1 = transcripts(t1) > tt1 GRanges with 37195 ranges and 2 elementMetadata values seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> [1] chr1 [ 69091, 70008] + | 980 NM_001005484 [2] chr1 [323892, 328581] + | 985 NR_028327 [3] chr1 [323892, 328581] + | 986 NR_028322 [4] chr1 [323892, 328581] + | 987 NR_028325 [5] chr1 [367659, 368597] + | 984 NM_001005277 > names(tt1) = make.names(elementMetadata(tt1)$tx_name, unique=TRUE) # why? > tt1[1:5] GRanges with 5 ranges and 2 elementMetadata values seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> NM_001005484 chr1 [ 69091, 70008] + | 980 NM_001005484 NR_028327 chr1 [323892, 328581] + | 985 NR_028327 NR_028322 chr1 [323892, 328581] + | 986 NR_028322 NR_028325 chr1 [323892, 328581] + | 987 NR_028325 NM_001005277 chr1 [367659, 368597] + | 984 NM_001005277 seqlengths chr1 chr2 ... chr18_gl000207_random 249250621 243199373 ... 4262 observe what happens when the tx_names are duplicated > tt1[42:44] GRanges with 3 ranges and 2 elementMetadata values seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> NR_002946 chr1 [1568159, 1570027] + | 1053 NR_002946 NR_002946.1 chr1 [1631378, 1633247] + | 1063 NR_002946 NM_138705 chr1 [1846266, 1848733] + | 1066 NM_138705 seqlengths chr1 chr2 ... chr18_gl000207_random 249250621 243199373 ... 4262 if you coerce to RangedData and change tx_name to name you can probably avoid the munging of the transcript names, which in this case, if you use the approach above, could be pretty confusing if .1 etc are used to indicate versions at refseq > export(tt1, "tt1.bed") > cat(readLines("tt1.bed", n=5), sep="\n") chr1 69090 70008 NM_001005484 0 + chr1 323891 328581 NR_028327 0 + chr1 323891 328581 NR_028322 0 + chr1 323891 328581 NR_028325 0 + chr1 367658 368597 NM_001005277 0 + i have skipped the modifications to the coordinates (you will need to program that conditional on strand, presumably) and the setting of the header line. On Wed, Oct 13, 2010 at 3:53 PM, joseph <jdsandjd at="" yahoo.com=""> wrote: > Hi > I would highly appreciate if you could show me how to create a bed file around > the TSSs from UCSC databases such as ensembl or refSeq genes. I need 350 > nucleotides upstream and 150 nucleotides downstream of TSSs. The bed file should > look like below, where: > chromStart is 350 nucleotides upstream of TSS > chromEnd is 150 nucleotides downstream of TSS > name is Name of gene or transcript_id depending on the database. > > > > chrom ? ?chromStart ? ?chromEnd ? ?name ? ? ? ?score ? ?strand > chr1 ? ?67051159 ? ?67163158 ? ?NM_024763 ? ?0 ? ?- > chr1 ? ?67075869 ? ?67163158 ? ?NM_207014 ? ?0 ? ?+ > chr1 ? ?16762998 ? ?16812569 ? ?NM_017940 ? ?0 ? ?- > > Thanks > Joseph > > > > > ? ? ? ?[[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode
On Wed, Oct 13, 2010 at 2:05 PM, Vincent Carey <stvjc@channing.harvard.edu>wrote: > There are various approaches but basic tools for one solution are > 1) use GenomicFeatures::makeTranscriptTableFromUCSC(genome, table) to > acquire > a SQLite database that has relevant information > 2) extract the GRanges object corresponding to transcripts and their > genomic coordinates > using the transcripts() method > 3) set the names of the GRanges using something like names(gr) = > elementMetadata(gr)$tx_name -- but see below; you might instead coerce > to RangedData and just change the column names > 4) revise the coordinate information according to your wishes > Thanks Vince for laying this out so nicely. For this case, the flank() function in IRanges should make this step #4 easy. 5) use the rtracklayer export method on the revised GRanges or > RangedData structure to create the bed file > > And if you're making the BED file just to upload it to UCSC, consider using rtracklayer to do that directly. Michael some concrete results, based on > > > sessionInfo() > R version 2.13.0 Under development (unstable) (2010-10-12 r53293) > Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices datasets tools utils methods > [8] base > > other attached packages: > [1] rtracklayer_1.9.9 RCurl_1.4-3 bitops_1.0-4.1 > [4] GenomicFeatures_1.1.12 GenomicRanges_1.1.29 IRanges_1.7.35 > [7] weaver_1.15.0 codetools_0.2-2 digest_0.4.2 > > loaded via a namespace (and not attached): > [1] BSgenome_1.17.7 Biobase_2.9.2 Biostrings_2.17.47 DBI_0.2-5 > [5] RSQLite_0.9-2 XML_3.1-1 biomaRt_2.5.1 > > > t1 = makeTranscriptDbFromUCSC("hg19", "refGene") > Download the refGene table ... OK > Download the refLink table ... OK > Extract the 'transcripts' data frame ... OK > Extract the 'splicings' data frame ... OK > Download and preprocess the 'chrominfo' data frame ... OK > Prepare the 'metadata' data frame ... OK > Make the TranscriptDb object ... OK > There were 50 or more warnings (use warnings() to see the first 50) > > tt1 = transcripts(t1) > > tt1 > GRanges with 37195 ranges and 2 elementMetadata values > seqnames ranges strand | tx_id tx_name > <rle> <iranges> <rle> | <integer> <character> > [1] chr1 [ 69091, 70008] + | 980 NM_001005484 > [2] chr1 [323892, 328581] + | 985 NR_028327 > [3] chr1 [323892, 328581] + | 986 NR_028322 > [4] chr1 [323892, 328581] + | 987 NR_028325 > [5] chr1 [367659, 368597] + | 984 NM_001005277 > > > names(tt1) = make.names(elementMetadata(tt1)$tx_name, unique=TRUE) # > why? > > tt1[1:5] > GRanges with 5 ranges and 2 elementMetadata values > seqnames ranges strand | tx_id tx_name > <rle> <iranges> <rle> | <integer> <character> > NM_001005484 chr1 [ 69091, 70008] + | 980 NM_001005484 > NR_028327 chr1 [323892, 328581] + | 985 NR_028327 > NR_028322 chr1 [323892, 328581] + | 986 NR_028322 > NR_028325 chr1 [323892, 328581] + | 987 NR_028325 > NM_001005277 chr1 [367659, 368597] + | 984 NM_001005277 > > seqlengths > chr1 chr2 ... chr18_gl000207_random > 249250621 243199373 ... 4262 > > observe what happens when the tx_names are duplicated > > > tt1[42:44] > GRanges with 3 ranges and 2 elementMetadata values > seqnames ranges strand | tx_id tx_name > <rle> <iranges> <rle> | <integer> <character> > NR_002946 chr1 [1568159, 1570027] + | 1053 NR_002946 > NR_002946.1 chr1 [1631378, 1633247] + | 1063 NR_002946 > NM_138705 chr1 [1846266, 1848733] + | 1066 NM_138705 > > seqlengths > chr1 chr2 ... chr18_gl000207_random > 249250621 243199373 ... 4262 > > if you coerce to RangedData and change tx_name to name you can > probably avoid the munging of the transcript names, > which in this case, if you use the approach above, could be pretty > confusing if .1 etc are used to indicate versions at refseq > > > export(tt1, "tt1.bed") > > cat(readLines("tt1.bed", n=5), sep="\n") > chr1 69090 70008 NM_001005484 0 + > chr1 323891 328581 NR_028327 0 + > chr1 323891 328581 NR_028322 0 + > chr1 323891 328581 NR_028325 0 + > chr1 367658 368597 NM_001005277 0 + > > i have skipped the modifications to the coordinates (you will need to > program > that conditional on strand, presumably) and the setting of the header line. > > > > > On Wed, Oct 13, 2010 at 3:53 PM, joseph <jdsandjd@yahoo.com> wrote: > > Hi > > I would highly appreciate if you could show me how to create a bed file > around > > the TSSs from UCSC databases such as ensembl or refSeq genes. I need 350 > > nucleotides upstream and 150 nucleotides downstream of TSSs. The bed file > should > > look like below, where: > > chromStart is 350 nucleotides upstream of TSS > > chromEnd is 150 nucleotides downstream of TSS > > name is Name of gene or transcript_id depending on the database. > > > > > > > > chrom chromStart chromEnd name score strand > > chr1 67051159 67163158 NM_024763 0 - > > chr1 67075869 67163158 NM_207014 0 + > > chr1 16762998 16812569 NM_017940 0 - > > > > Thanks > > Joseph > > > > > > > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Vincent Thank you, that was very helpful. Can you please expand a bit more on step#3 about coercing to RangedData and changing tx_name to name? Joseph ________________________________ From: Vincent Carey <stvjc@channing.harvard.edu> Cc: bioconductor@stat.math.ethz.ch Sent: Wed, October 13, 2010 2:05:31 PM Subject: Re: [BioC] bed file around the TSSs There are various approaches but basic tools for one solution are 1) use GenomicFeatures::makeTranscriptTableFromUCSC(genome, table) to acquire a SQLite database that has relevant information 2) extract the GRanges object corresponding to transcripts and their genomic coordinates using the transcripts() method 3) set the names of the GRanges using something like names(gr) = elementMetadata(gr)$tx_name -- but see below; you might instead coerce to RangedData and just change the column names 4) revise the coordinate information according to your wishes 5) use the rtracklayer export method on the revised GRanges or RangedData structure to create the bed file some concrete results, based on > sessionInfo() R version 2.13.0 Under development (unstable) (2010-10-12 r53293) Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices datasets tools utils methods [8] base other attached packages: [1] rtracklayer_1.9.9 RCurl_1.4-3 bitops_1.0-4.1 [4] GenomicFeatures_1.1.12 GenomicRanges_1.1.29 IRanges_1.7.35 [7] weaver_1.15.0 codetools_0.2-2 digest_0.4.2 loaded via a namespace (and not attached): [1] BSgenome_1.17.7 Biobase_2.9.2 Biostrings_2.17.47 DBI_0.2-5 [5] RSQLite_0.9-2 XML_3.1-1 biomaRt_2.5.1 > t1 = makeTranscriptDbFromUCSC("hg19", "refGene") Download the refGene table ... OK Download the refLink table ... OK Extract the 'transcripts' data frame ... OK Extract the 'splicings' data frame ... OK Download and preprocess the 'chrominfo' data frame ... OK Prepare the 'metadata' data frame ... OK Make the TranscriptDb object ... OK There were 50 or more warnings (use warnings() to see the first 50) > tt1 = transcripts(t1) > tt1 GRanges with 37195 ranges and 2 elementMetadata values seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> [1] chr1 [ 69091, 70008] + | 980 NM_001005484 [2] chr1 [323892, 328581] + | 985 NR_028327 [3] chr1 [323892, 328581] + | 986 NR_028322 [4] chr1 [323892, 328581] + | 987 NR_028325 [5] chr1 [367659, 368597] + | 984 NM_001005277 > names(tt1) = make.names(elementMetadata(tt1)$tx_name, unique=TRUE) # why? > tt1[1:5] GRanges with 5 ranges and 2 elementMetadata values seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> NM_001005484 chr1 [ 69091, 70008] + | 980 NM_001005484 NR_028327 chr1 [323892, 328581] + | 985 NR_028327 NR_028322 chr1 [323892, 328581] + | 986 NR_028322 NR_028325 chr1 [323892, 328581] + | 987 NR_028325 NM_001005277 chr1 [367659, 368597] + | 984 NM_001005277 seqlengths chr1 chr2 ... chr18_gl000207_random 249250621 243199373 ... 4262 observe what happens when the tx_names are duplicated > tt1[42:44] GRanges with 3 ranges and 2 elementMetadata values seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> NR_002946 chr1 [1568159, 1570027] + | 1053 NR_002946 NR_002946.1 chr1 [1631378, 1633247] + | 1063 NR_002946 NM_138705 chr1 [1846266, 1848733] + | 1066 NM_138705 seqlengths chr1 chr2 ... chr18_gl000207_random 249250621 243199373 ... 4262 if you coerce to RangedData and change tx_name to name you can probably avoid the munging of the transcript names, which in this case, if you use the approach above, could be pretty confusing if .1 etc are used to indicate versions at refseq > export(tt1, "tt1.bed") > cat(readLines("tt1.bed", n=5), sep="\n") chr1 69090 70008 NM_001005484 0 + chr1 323891 328581 NR_028327 0 + chr1 323891 328581 NR_028322 0 + chr1 323891 328581 NR_028325 0 + chr1 367658 368597 NM_001005277 0 + i have skipped the modifications to the coordinates (you will need to program that conditional on strand, presumably) and the setting of the header line. > Hi > I would highly appreciate if you could show me how to create a bed file around > the TSSs from UCSC databases such as ensembl or refSeq genes. I need 350 > nucleotides upstream and 150 nucleotides downstream of TSSs. The bed file >should > look like below, where: > chromStart is 350 nucleotides upstream of TSS > chromEnd is 150 nucleotides downstream of TSS > name is Name of gene or transcript_id depending on the database. > > > > chrom chromStart chromEnd name score strand > chr1 67051159 67163158 NM_024763 0 - > chr1 67075869 67163158 NM_207014 0 + > chr1 16762998 16812569 NM_017940 0 - > > Thanks > Joseph > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
as usual my suggestions were a bit more involved than necessary. the point is that the thing you get from the transcripts() method applied to a makeTranscriptsDb will have "tx_name" as the name of the elementMetadata and this metadata will not be propagated by export(); if it had name "name", it would be propagated. so > f19 = loadFeatures("hg19.txdb.sqlite") # serialized via saveFeatures, but perhaps a long time ago... so the message: The TranscriptDb object has been updated to the latest schema version 1.0. The updated object can be saved using saveFeatures() > t19 = transcripts(f19) # get transcripts GRanges > names(elementMetadata(t19)) # what are the metadata names [1] "tx_id" "tx_name" > names(elementMetadata(t19))[2] = "name" # reset a name > export(t19, "t19.bed") # do not use file= > cat(readLines("t19.bed", n=5), sep="\n") # what you asked for, minus the header line chr1 11873 14409 uc001aaa.3 0 + chr1 11873 14409 uc010nxq.1 0 + chr1 11873 14409 uc010nxr.1 0 + chr1 69090 70008 uc001aal.1 0 + chr1 321083 321114 uc001aaq.1 0 + On Sat, Oct 16, 2010 at 5:10 PM, joseph <jdsandjd at="" yahoo.com=""> wrote: > Hi Vincent > Thank you, that was very helpful. > Can you please expand a bit more? on step#3 about coercing to RangedData and > changing tx_name to name? > Joseph > > ________________________________ > From: Vincent Carey <stvjc at="" channing.harvard.edu=""> > To: joseph <jdsandjd at="" yahoo.com=""> > Cc: bioconductor at stat.math.ethz.ch > Sent: Wed, October 13, 2010 2:05:31 PM > Subject: Re: [BioC] bed file around the TSSs > > There are various approaches but basic tools for one solution are > 1) use GenomicFeatures::makeTranscriptTableFromUCSC(genome, table) to > acquire > a SQLite database that has relevant information > 2) extract the GRanges object corresponding to transcripts and their > genomic coordinates > using the transcripts() method > 3) set the names of the GRanges using something like names(gr) = > elementMetadata(gr)$tx_name -- but see below; you might instead coerce > to RangedData and just change the column names > 4) revise the coordinate information according to your wishes > 5) use the rtracklayer export method on the revised GRanges or > RangedData structure to create the bed file > > some concrete results, based on > >> sessionInfo() > R version 2.13.0 Under development (unstable) (2010-10-12 r53293) > Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats? ? graphics? grDevices datasets? tools? ? utils? ? methods > [8] base > > other attached packages: > [1] rtracklayer_1.9.9? ? ? RCurl_1.4-3? ? ? ? ? ? bitops_1.0-4.1 > [4] GenomicFeatures_1.1.12 GenomicRanges_1.1.29? IRanges_1.7.35 > [7] weaver_1.15.0? ? ? ? ? codetools_0.2-2? ? ? ? digest_0.4.2 > > loaded via a namespace (and not attached): > [1] BSgenome_1.17.7? ? Biobase_2.9.2? ? ? Biostrings_2.17.47 DBI_0.2-5 > [5] RSQLite_0.9-2? ? ? XML_3.1-1? ? ? ? ? biomaRt_2.5.1 > >> t1 = makeTranscriptDbFromUCSC("hg19", "refGene") > Download the refGene table ... OK > Download the refLink table ... OK > Extract the 'transcripts' data frame ... OK > Extract the 'splicings' data frame ... OK > Download and preprocess the 'chrominfo' data frame ... OK > Prepare the 'metadata' data frame ... OK > Make the TranscriptDb object ... OK > There were 50 or more warnings (use warnings() to see the first 50) >> tt1 = transcripts(t1) >> tt1 > GRanges with 37195 ranges and 2 elementMetadata values > ? ? ? ? seqnames? ? ? ? ? ? ? ranges strand? |? ? tx_id? ? ? tx_name > ? ? ? ? ? <rle>? ? ? ? ? ? <iranges>? <rle>? | <integer>? <character> > ? ? [1]? ? chr1? ? [ 69091,? 70008]? ? ? +? |? ? ? 980 NM_001005484 > ? ? [2]? ? chr1? ? [323892, 328581]? ? ? +? |? ? ? 985? ? NR_028327 > ? ? [3]? ? chr1? ? [323892, 328581]? ? ? +? |? ? ? 986? ? NR_028322 > ? ? [4]? ? chr1? ? [323892, 328581]? ? ? +? |? ? ? 987? ? NR_028325 > ? ? [5]? ? chr1? ? [367659, 368597]? ? ? +? |? ? ? 984 NM_001005277 > >> names(tt1) = make.names(elementMetadata(tt1)$tx_name, unique=TRUE)? # why? >> tt1[1:5] > GRanges with 5 ranges and 2 elementMetadata values > ? ? ? ? ? ? seqnames? ? ? ? ? ranges strand |? ? tx_id? ? ? tx_name > ? ? ? ? ? ? ? ? <rle>? ? ? ? <iranges>? <rle> | <integer>? <character> > NM_001005484? ? chr1 [ 69091,? 70008]? ? ? + |? ? ? 980 NM_001005484 > ? NR_028327? ? chr1 [323892, 328581]? ? ? + |? ? ? 985? ? NR_028327 > ? NR_028322? ? chr1 [323892, 328581]? ? ? + |? ? ? 986? ? NR_028322 > ? NR_028325? ? chr1 [323892, 328581]? ? ? + |? ? ? 987? ? NR_028325 > NM_001005277? ? chr1 [367659, 368597]? ? ? + |? ? ? 984 NM_001005277 > > seqlengths > ? ? ? ? ? ? ? ? ? chr1? ? ? ? ? ? ? ? ? chr2 ... chr18_gl000207_random > ? ? ? ? ? ? 249250621? ? ? ? ? ? 243199373 ...? ? ? ? ? ? ? ? ? 4262 > > observe what happens when the tx_names are duplicated > >> tt1[42:44] > GRanges with 3 ranges and 2 elementMetadata values > ? ? ? ? ? ? seqnames? ? ? ? ? ? ranges strand |? ? tx_id? ? tx_name > ? ? ? ? ? ? ? <rle>? ? ? ? ? <iranges>? <rle> | <integer> <character> > ? NR_002946? ? chr1 [1568159, 1570027]? ? ? + |? ? ? 1053? NR_002946 > NR_002946.1? ? chr1 [1631378, 1633247]? ? ? + |? ? ? 1063? NR_002946 > ? NM_138705? ? chr1 [1846266, 1848733]? ? ? + |? ? ? 1066? NM_138705 > > seqlengths > ? ? ? ? ? ? ? ? ? chr1? ? ? ? ? ? ? ? ? chr2 ... chr18_gl000207_random > ? ? ? ? ? ? 249250621? ? ? ? ? ? 243199373 ...? ? ? ? ? ? ? ? ? 4262 > > if you coerce to RangedData and change tx_name to name you can > probably avoid the munging of the transcript names, > which in this case, if you use the approach above, could be pretty > confusing if .1 etc are used to indicate versions at refseq > >> export(tt1, "tt1.bed") >> cat(readLines("tt1.bed", n=5), sep="\n") > chr1? ? 69090? 70008? NM_001005484? ? 0? ? ? + > chr1? ? 323891? 328581? NR_028327? ? ? 0? ? ? + > chr1? ? 323891? 328581? NR_028322? ? ? 0? ? ? + > chr1? ? 323891? 328581? NR_028325? ? ? 0? ? ? + > chr1? ? 367658? 368597? NM_001005277? ? 0? ? ? + > > i have skipped the modifications to the coordinates (you will need to > program > that conditional on strand, presumably) and the setting of the header line. > > > > > On Wed, Oct 13, 2010 at 3:53 PM, joseph <jdsandjd at="" yahoo.com=""> wrote: >> Hi >> I would highly appreciate if you could show me how to create a bed file >> around >> the TSSs from UCSC databases such as ensembl or refSeq genes. I need 350 >> nucleotides upstream and 150 nucleotides downstream of TSSs. The bed file >> should >> look like below, where: >> chromStart is 350 nucleotides upstream of TSS >> chromEnd is 150 nucleotides downstream of TSS >> name is Name of gene or transcript_id depending on the database. >> >> >> >> chrom ? ?chromStart ? ?chromEnd ? ?name ? ? ? ?score ? ?strand >> chr1 ? ?67051159 ? ?67163158 ? ?NM_024763 ? ?0 ? ?- >> chr1 ? ?67075869 ? ?67163158 ? ?NM_207014 ? ?0 ? ?+ >> chr1 ? ?16762998 ? ?16812569 ? ?NM_017940 ? ?0 ? ?- >> >> Thanks >> Joseph >> >> >> >> >> ? ? ? ?[[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >
ADD REPLY

Login before adding your answer.

Traffic: 1017 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6