Following up on my earlier question about summarizing the scores of GRanges into bins, and advancing one step at a time, I would now like to convert a GRangesList object into a data.frame, where the score columns (metadata columns) of the different GRanges in the list become separate columns in the data frame. The list looks like this:
> tiles.list
GRangesList object of length 3:
$15S_rRNA
GRanges object with 100 ranges and 1 metadata column:
                    seqnames       ranges strand |            score
                       <Rle>    <IRanges>  <Rle> |        <numeric>
  15S_rRNA.15S_rRNA       MT [6546, 6561]      * | 47.0025219774636
  15S_rRNA.15S_rRNA       MT [6562, 6577]      * | 52.4692503895184
                ...      ...          ...    ... .              ...
  15S_rRNA.15S_rRNA       MT [8162, 8177]      * | 131.070537758245
  15S_rRNA.15S_rRNA       MT [8178, 8193]      * | 133.993728100123

$21S_rRNA
GRanges object with 100 ranges and 1 metadata column:
                    seqnames         ranges strand |            score
                       <Rle>      <IRanges>  <Rle> |        <numeric>
  21S_rRNA.21S_rRNA       MT [58009, 58052]      * |   11.61435429513
  21S_rRNA.21S_rRNA       MT [58053, 58096]      * | 13.9056586769545
                ...      ...            ...    ... .              ...
  21S_rRNA.21S_rRNA       MT [62359, 62402]      * | 65.9285146503723
  21S_rRNA.21S_rRNA       MT [62403, 62447]      * | 113.348199738504

$YAL037C-A
GRanges object with 93 ranges and 1 metadata column:
                      seqnames         ranges strand |            score
                         <Rle>      <IRanges>  <Rle> |        <numeric>
  YAL037C-A.YAL037C-A        I [73426, 73426]      * | 242.417848776282
  YAL037C-A.YAL037C-A        I [73427, 73427]      * | 246.146507583353
                  ...      ...            ...    ... .              ...
  YAL037C-A.YAL037C-A        I [73517, 73517]      * | 221.726874447293
  YAL037C-A.YAL037C-A        I [73518, 73518]      * | 220.070233632405

-------
seqinfo: 17 sequences from an unspecified genome; no seqlengths
Each of the GRanges in the GRangesList object has a metadata column with scores. I would like to convert this list into a matrix in which each column holds the scores of one GRanges and the rows are numbered 1-100, so it should look like this:
     15S_rRNA          21S_rRNA          YAL037C-A
1    47.0025219774636  11.61435429513    242.417848776282
2    52.4692503895184  13.9056586769545  246.146507583353
...
99   131.070537758245  65.9285146503723  NA
100  133.993728100123  113.348199738504  NA
The last GRanges object, which has only 93 ranges, should be padded with NA (or 0) in the remaining rows when converted to the data.frame.
I know how to do it when they all have 100 ranges, for example with do.call(cbind.data.frame, tiles.list) and then deleting the unwanted columns, but how do I combine a list of GRanges with different lengths into one big data frame?
Any help would be appreciated.
Thanks Assa
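One possible way to handle the unequal lengths, offered only as a minimal sketch and not necessarily the approach suggested in the replies: pull out one score vector per gene, index each vector up to the longest length so that the missing positions become NA, and bind the results as columns. The block below uses a small made-up list (`scores`, with truncated values) as a stand-in for the real data; the names `score.mat` and `score.df` are hypothetical, and the commented `lapply()` line showing how the vectors might be extracted from `tiles.list` is an assumption.

```r
# Stand-in for the real data: one numeric score vector per gene.
# With the actual GRangesList this could be obtained (assumption) via:
#   scores <- lapply(tiles.list, function(gr) gr$score)
scores <- list(
  "15S_rRNA"  = c(47.0, 52.5, 131.1, 134.0),
  "21S_rRNA"  = c(11.6, 13.9, 65.9, 113.3),
  "YAL037C-A" = c(242.4, 246.1)
)

# Length of the longest score vector (100 in the original question).
n <- max(lengths(scores))

# Indexing past the end of a vector yields NA, which pads the short ones.
score.mat <- sapply(scores, function(s) s[seq_len(n)])

# check.names = FALSE keeps names such as "15S_rRNA" and "YAL037C-A" intact.
score.df <- data.frame(score.mat, check.names = FALSE)
print(score.df)
```

If zeros are preferred over NA, `score.mat[is.na(score.mat)] <- 0` before building the data frame would do that, although NA is usually the safer choice for plotting because padded positions are then left blank rather than drawn as zero intensity.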
P.S.
Thanks Michael, this is really smooth. I know this is a weird presentation of the data. I need this big data.frame of scores to be able to plot the gene intensities on top of each other, either as a heat map or as a line plot. For that reason I needed the "gene lengths" to be identical. On the X-axis I have the gene positions (in my case 1-100) and on the Y-axis I have the intensities (in my case the averaged scores per region).
Unfortunately I couldn't find a better way of plotting the gene intensities over all genes per sample.
The idea is to get something similar to this one here:
Ok. I think you could make a plot like the above using the long form. Certainly in ggplot2 or lattice, and probably in base. One issue that may not apply in your case is splicing. The simple code above will not handle the case of an intron within the first 100 bp. For that, you'll want to look into pmapToTranscripts().

Thanks for the suggestion of this function. It is worth knowing for later cases. I know about the exon problem; luckily we are working on S. cerevisiae, so introns are not an issue for us, and we are interested in the complete transcript. But this function looks very interesting.
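For reference, the "long form" mentioned in the reply could look roughly like the sketch below: one row per gene and position, so genes of different lengths need no padding at all. This is an illustration under assumptions, not the code from the reply; `scores` is again a small made-up stand-in for the per-gene score vectors, and `long.df` is a hypothetical name.

```r
# Stand-in for the per-gene score vectors (values truncated, lengths shortened).
scores <- list(
  "15S_rRNA"  = c(47.0, 52.5, 131.1, 134.0),
  "21S_rRNA"  = c(11.6, 13.9, 65.9, 113.3),
  "YAL037C-A" = c(242.4, 246.1)
)

# One row per (gene, position) pair; no NA padding is required.
long.df <- data.frame(
  gene     = rep(names(scores), lengths(scores)),
  position = unlist(lapply(scores, seq_along), use.names = FALSE),
  score    = unlist(scores, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long.df)

# With ggplot2 installed, a line plot per gene could then be drawn with:
#   ggplot(long.df, aes(position, score, colour = gene)) + geom_line()
```

The wide, NA-padded layout is convenient for a heat map, while the long layout is what ggplot2 and lattice generally expect for overlaid line plots.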