
Question: converting a GRangesList object with different lengths into a data frame
Assa Yeroslaviz (Munich, Germany) wrote, 17 months ago:

Following up on my question "summarize scores of GRanges into bins", and advancing one step at a time, I would now like to convert a GRangesList object into a data.frame in which the score column (metadata column) of each GRanges in the list becomes a separate column of the data frame, such as:

> tiles.list
GRangesList object of length 3:
$15S_rRNA
GRanges object with 100 ranges and 1 metadata column:
                    seqnames         ranges strand |            score
                       <Rle>      <IRanges>  <Rle> |        <numeric>
  15S_rRNA.15S_rRNA       MT   [6546, 6561]      * | 47.0025219774636
  15S_rRNA.15S_rRNA       MT   [6562, 6577]      * | 52.4692503895184
                ...      ...            ...    ... .              ...
  15S_rRNA.15S_rRNA       MT   [8162, 8177]      * | 131.070537758245
  15S_rRNA.15S_rRNA       MT   [8178, 8193]      * | 133.993728100123

$21S_rRNA
GRanges object with 100 ranges and 1 metadata column:
                    seqnames         ranges strand |            score
                       <Rle>      <IRanges>  <Rle> |        <numeric>
  21S_rRNA.21S_rRNA       MT [58009, 58052]      * |   11.61435429513
  21S_rRNA.21S_rRNA       MT [58053, 58096]      * | 13.9056586769545
                ...      ...            ...    ... .              ...
  21S_rRNA.21S_rRNA       MT [62359, 62402]      * | 65.9285146503723
  21S_rRNA.21S_rRNA       MT [62403, 62447]      * | 113.348199738504

$YAL037C-A
GRanges object with 93 ranges and 1 metadata column:
                      seqnames         ranges strand |            score
                         <Rle>      <IRanges>  <Rle> |        <numeric>
  YAL037C-A.YAL037C-A        I [73426, 73426]      * | 242.417848776282
  YAL037C-A.YAL037C-A        I [73427, 73427]      * | 246.146507583353
                  ...      ...            ...    ... .              ...
  YAL037C-A.YAL037C-A        I [73517, 73517]      * | 221.726874447293
  YAL037C-A.YAL037C-A        I [73518, 73518]      * | 220.070233632405

-------
seqinfo: 17 sequences from an unspecified genome; no seqlengths

Each of the GRanges in the GRangesList object has a metadata column with scores. I would like to convert this list into a matrix with the scores in the columns and the rows numbered 1-100, so it should look like this:

    15S_rRNA         21S_rRNA         YAL037C-A
1   47.0025219774636 11.61435429513   242.417848776282
2   52.4692503895184 13.9056586769545 246.146507583353
...
99  131.070537758245 65.9285146503723 NA
100 133.993728100123 113.348199738504 NA

The last GRanges object, which has only 93 ranges, should get NA (or 0) in the missing rows when converting to the data.frame. I know how to do this when all elements have 100 ranges, for example with do.call(cbind.data.frame, tiles.list) and then deleting the unwanted columns, but how do I combine a list of GRanges with different lengths into one big data frame?

Any help would be appreciated.

Thanks
Assa

P.S. The dput(tiles.tiles) output can be found here

Answer: converting a GRangesList object with different lengths into a data frame

Michael Lawrence (United States) wrote, 17 months ago:

I don't immediately see how arranging the data in this way is useful. But the best way would be to coerce to data.frame and then use reshape() to move to wide form. I guess the tricky part is getting a variable representing the subscript within each GRanges. I've called that "row" below.

library(GenomicRanges)  # also attaches IRanges, which provides IRanges()
df <- as.data.frame(tiles.list)
df$row <- as.integer(IRanges(1L, width=lengths(tiles.list)))
wide <- reshape(df[c("row", "group_name", "score")], direction="wide",
                timevar="group_name", idvar="row")
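The same reshape() step can be tried without any Bioconductor objects. This is a minimal base-R sketch: it builds a toy data.frame with the `group_name` and `score` columns that `as.data.frame()` produces for a GRangesList (the gene names and values are made up), and uses base R's `sequence(lengths(...))` instead of the `IRanges()` trick to number the rows 1..n within each gene.

```r
# Toy stand-in for tiles.list: one numeric score vector per gene
# (hypothetical names and values; geneB is shorter than geneA)
scores <- list(geneA = c(1.5, 2.5, 3.5), geneB = c(10, 20))

# Long form, shaped like as.data.frame(tiles.list)[c("group_name", "score")]
df <- data.frame(group_name = rep(names(scores), lengths(scores)),
                 score = unlist(scores, use.names = FALSE),
                 stringsAsFactors = FALSE)

# Within-gene index 1..n; same values as
# as.integer(IRanges(1L, width = lengths(...)))
df$row <- sequence(lengths(scores))

# Long -> wide: one score.<gene> column per gene, NA where a gene is shorter
wide <- reshape(df[c("row", "group_name", "score")], direction = "wide",
                timevar = "group_name", idvar = "row")
wide
```

The shorter gene simply gets NA in the rows it does not cover, which is the padding asked for in the question.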

Thanks Michael, this is really smooth. I know this is a weird presentation of the data. I need this big data.frame of scores to be able to plot the gene intensities on top of each other (either as a heatmap or as a line plot). For that reason I needed the "gene lengths" to be identical. On the x-axis I have the gene positions (in my case 1-100) and on the y-axis the intensities (in my case the averaged scores per region).

Unfortunately I couldn't find a better way of plotting the gene intensities over all genes per sample.

The idea is to get something similar to this one here:


Ok. I think you could make a plot like the above using the long form. Certainly in ggplot2 or lattice, and probably in base. One issue that may not apply in your case is splicing. The simple code above will not handle the case of an intron within the first 100 bp. For that, you'll want to look into pmapToTranscripts().
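As a rough illustration of plotting straight from the long form, here is a minimal base-graphics sketch (ggplot2 or lattice would work as well, as mentioned above). It assumes a data.frame shaped like `df` from the first answer, with columns `group_name`, `row` and `score`; the toy values and the helper name `plot_profiles()` are hypothetical.

```r
# Toy long-form data shaped like `df` above (made-up values)
scores <- list(geneA = c(1.5, 2.5, 3.5, 3.0), geneB = c(10, 20, 15))
long <- data.frame(group_name = rep(names(scores), lengths(scores)),
                   row = sequence(lengths(scores)),
                   score = unlist(scores, use.names = FALSE),
                   stringsAsFactors = FALSE)

# Hypothetical helper: one line per gene, position within the gene on x,
# score on y. Genes of different lengths need no NA padding here.
plot_profiles <- function(long) {
  per_gene <- split(long, long$group_name)
  plot(range(long$row), range(long$score), type = "n",
       xlab = "position within gene", ylab = "score")
  cols <- seq_along(per_gene)
  for (i in cols) {
    lines(per_gene[[i]]$row, per_gene[[i]]$score, col = cols[i])
  }
  legend("topleft", legend = names(per_gene), col = cols, lty = 1)
  invisible(length(per_gene))  # number of genes drawn
}

pdf(NULL)  # draw to a null device; use png("profiles.png") to save instead
n_genes <- plot_profiles(long)
invisible(dev.off())
```

With thousands of genes a heatmap of the wide matrix is probably more readable than overlaid lines.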

Thanks for the suggestion of this function; it is worth knowing for later cases. I know about the exon problem. Luckily we are working on S. cerevisiae and have no intron problem, as we are interested in the complete transcript. But this function looks very interesting.

Answer: converting a GRangesList object with different lengths into a data frame
Marcel Ramos (United States) wrote, 17 months ago:

Hi Assa Yeroslaviz,

If you collapse into a single data.frame, each row will represent a different genomic location. You may not want this.

Nevertheless, if you do want to go ahead and do this, you can try this:

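library(GenomicRanges)  # provides mcols() (via S4Vectors)
# Take all the score values
scoreList <- lapply(tiles.list, function(x) mcols(x))
# Impute NA (the third element has only 93 ranges, so rows 94:100 are missing)
scoreList[[3]][94:100, ] <- NA
# Bind into DataFrame
Reduce(cbind, scoreList)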


I suggest the use of RaggedExperiment for matrix representation of ragged metadata columns. This will take into account any matching row ranges in your data.

library(RaggedExperiment)
# Convert GRangesList to RaggedExperiment
ragTile <- RaggedExperiment(tiles.list)
# Create matrix of all values across GRangesList elements
assay(ragTile, i = "score")
# Combine if possible, any matching ranges
compactAssay(ragTile, i = "score")


In this case, there are no matching ranges across the elements of the GRangesList.

Best Regards, Marcel

Thanks, Marcel, for the suggestion of RaggedExperiment, but this is not what I need, as I already know that there are no common regions. I had already thought of the first option. In my case I have over 3000 genomic regions; many of them have identical lengths, others have different lengths, so I can't pad to a specific index as you did. I have already managed to convert it to a data.frame and Reduce() with cbind() the data into one big data.frame, but I was hoping for a more efficient method.
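One possible way to avoid hard-coding `scoreList[[3]][94:100, ]` for thousands of regions is to pad every score vector to the length of the longest one. This base-R sketch assumes the scores have first been pulled out as a plain list of numeric vectors (for a GRangesList, something like `lapply(tiles.list, function(x) x$score)`); `pad_to_matrix()` is a hypothetical helper name and the toy list stands in for real data.

```r
# Hypothetical ragged scores: one numeric vector per region
scores <- list(r1 = c(1, 2, 3), r2 = c(4, 5), r3 = c(6, 7, 8, 9))

# Hypothetical helper: pad each vector with NA up to the longest one and
# bind them column-wise, so no element index or row range is hard-coded
pad_to_matrix <- function(scores) {
  n <- max(lengths(scores))
  # `length<-` extends an atomic vector to length n, filling the tail with NA
  padded <- lapply(scores, `length<-`, n)
  m <- do.call(cbind, padded)  # list names become column names
  rownames(m) <- seq_len(n)    # rows numbered 1..n
  m
}

m <- pad_to_matrix(scores)
m
```

`as.data.frame(m)` gives the data.frame form. If zeros are preferred over NA, `m[is.na(m)] <- 0` works, although it would also overwrite any genuinely missing scores.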