I used DiffBind to get a GRange object with differentially bound genomic sites. This GRange also contains several numeric columns with useful statistics:
GRanges object with 945 ranges and 6 metadata columns:
seqnames ranges strand | Conc Conc_NoSalt Conc_PostSalt_0 Fold p-value FDR
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
5309 chrXIII 482721-482821 * | 8.72 7.35 9.41 -2.06 2.92e-42 2.13e-38
2432 chrVI 107743-107843 * | 9.19 8.00 9.84 -1.83 8.45e-38 3.09e-34
6320 chrXV 430498-430598 * | 8.53 7.21 9.20 -2.00 1.57e-36 3.82e-33
4835 chrXII 783983-784083 * | 9.01 7.92 9.62 -1.71 7.88e-33 1.44e-29
6286 chrXV 383445-383545 * | 9.51 8.56 10.08 -1.52 1.13e-28 1.66e-25
... ... ... ... . ... ... ... ... ... ...
1873 chrIX 247430-247530 * | 8.71 8.91 8.47 0.43 0.00634 0.0492
4284 chrXI 483176-483276 * | 9.37 9.19 9.52 -0.33 0.00637 0.0494
5806 chrXIV 392558-392658 * | 9.10 8.88 9.28 -0.40 0.00638 0.0494
3376 chrVIII 271580-271680 * | 7.44 7.76 7.03 0.73 0.00643 0.0498
6262 chrXV 349087-349187 * | 8.69 8.47 8.88 -0.41 0.00645 0.0499
-------
seqinfo: 16 sequences from an unspecified genome; no seqlengths
Next, I use GenomicFeatures::cdsByOverlaps to retrieve meaningful annotations about the identified genomic regions (such as Gene Name) with the following bit of code:
HitsCDS <- cdsByOverlaps(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, Salt.DB,
maxgap = -1L, minoverlap = 0L,
type = c("any", "start", "end"),
columns = c("tx_id", "tx_name"))
However, the new GRange object lacks the statistics columns, and I haven't found a way to include them in the output (or merge them with the output):
GRanges object with 289 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <IntegerList> <CharacterList>
[1] chrI 31567-32940 + | 7 YAL062W
[2] chrI 71786-73288 + | 20 YAL038W
[3] chrI 169375-170295 + | 44 YAR015W
[4] chrI 42881-45022 - | 69 YAL054C
[5] chrI 139503-141431 - | 103 YAL005C
... ... ... ... . ... ...
[285] chrXVI 697147-698265 - | 6599 YPR078C
[286] chrXVI 702114-703970 - | 6600 YPR081C
[287] chrXVI 813156-814058 - | 6627 YPR139C
[288] chrXVI 824690-824926 - | 6631 YPR145C-A
[289] chrXVI 825347-825676 - | 6632 YPR146C
-------
seqinfo: 17 sequences (1 circular) from sacCer3 genome
I tried simply adding additional column names, such as in the following, but this throws an error. I'm guessing because this references the Tx.Db GRange and not my DiffBind GRange?
columns = c("tx_id", "tx_name", "Conc"))
I'm sure there is a simple way to accomplish this, but I'm fairly inexperienced with R, GRanges, Bioconductor and haven't found a solution. Thanks for any help!
Thanks James, this works for me!
Cool. I know nothing about yeast, so this might not be applicable, but there is a
promoters
function that gives you the region around the transcription start site, which might be relevant to what you are doing (assuming some sort of ChIP-seq for transcription factor binding?)I actually use the promoter function, but unfortunately can only define the promoter regions based on position relative to the transcription start site. The yeast genome is fairly compact, so there are many instances where a reasonably promoter length would actually run into an upstream transcript.
Do you know if there is a way to define promoters from an annotation file? I'm thinking of making a set of rules to define promoters in a gene specific manner, which would result in an annotation file with both transcripts and promoters that would need to be made into a TxDb object (or perhaps something else from a different package?)
Thanks!
Thanks for bringing this use case to our attention. It's currently not possible to store an explicit set of promoters within a TxDb package. The easiest work around would be a separate BED file of promoters with names corresponding to the genes.
I can make the bed file pretty easily. Can you give me piece of example code for how I would reference the bed file that contains the promoter annotations?
For instance, what object type is "promoters" in the following:
If I could import the bed file with my promoter annotations and save it as a similar object-type in R, I think I could use the following code to find peak-overlaps with the promoters:
Think that would work?
Thanks!
Just use
rtracklayer::import()
to read the BED as a GRanges.