Question

Weaknesses of the GRanges data type, or is it just me?

0

hauken_heyken ▴ 80

@hauken_heyken-13992

Last seen 18 months ago

Bergen

A GRangesList has group and group_name, but this information is lost when you do the following conversion:

GRangesList -> data.frame -> GRanges -> GRangesList

Is there a way to preserve the information?

I tried to save the grouping of the original GRangesList, but since the final GRangesList only has one group, it does not work.

Of course you can do something like:

#df.gr: a global data.frame of all granges

# build one GRanges per group name (defined before it is used below)
getGRLbyName = function(name){
  tempEquals = df.gr[df.gr$name == name, ] # find all rows in the group by name
  a = GRanges( tempEquals )
  names(a) = rep(name, nrow(tempEquals)) # make names
  return(a)
}

# one list element per distinct name
GRangesList(lapply(unique(df.gr$name), getGRLbyName))

But this is just bad.

Another artifact is that the group is now added as a metadata column in the final GRangesList.

How do you guys do this?

The next thing is how GRanges accesses rows:

Usually this is done by name, e.g. gr["ENST0009123123"], but what if I want to access by exon_id, which is a metadata column?

It looks like this is not possible, so I have to transform to a data.frame, and then I lose the grouping.

I feel like GRanges is not suited to making big changes to specific data in it, because you cannot do what you want unless you transform to a data.frame.

GRangeslist GRanges makeGrangesFromDataFrame • 1.2k views

ADD COMMENT • link updated 6.5 years ago by Michael Lawrence ★ 11k • written 6.5 years ago by hauken_heyken ▴ 80

score 2 · Accepted Answer · 2017-10-26

2

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

Information will be lost through coercion to a more general data structure. I guess there could be a way to coerce from data.frame to GRangesList, making an inference on the grouping column based on its name.
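As a rough illustration of rebuilding the grouping from the data.frame, here is a minimal sketch that assumes the data.frame came from as.data.frame() on a GRangesList and therefore still carries the group and group_name columns. The toy ranges and the names gr2, grl2 and range_cols are made up for the example. Empty groups and list-level metadata would not survive this round trip.

```r
library(GenomicRanges)

# Toy GRangesList grouped by transcript (all values are made up).
gr <- GRanges("chr1", IRanges(start = c(1, 50, 100), width = 10),
              strand = "+", exon_id = 1:3)
grl <- split(gr, c("tx1", "tx1", "tx2"))

# as.data.frame() adds 'group' and 'group_name' columns describing the grouping.
df <- as.data.frame(grl)

# Rebuild the ranges, leaving the two grouping columns out of the metadata.
range_cols <- setdiff(colnames(df), c("group", "group_name"))
gr2 <- makeGRangesFromDataFrame(df[, range_cols], keep.extra.columns = TRUE)

# Split by group_name, keeping the groups in order of first appearance.
grl2 <- split(gr2, factor(df$group_name, levels = unique(df$group_name)))
```

Dropping group and group_name before the conversion is what keeps them from showing up as extra metadata columns in the result.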

Typically though this round trip should not be necessary. GRanges shares much of the same functionality as data.frame. For example, to subset to a specific exon_id, you could do something like:

gr[gr$exon_id == "exon1"]
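A slightly fuller sketch of the same idea, on made-up toy data (the transcript names and exon ids are invented for the example): rows can be selected by a metadata column with a logical index or with subset(), and a metadata column can be edited in place without going through a data.frame.

```r
library(GenomicRanges)

# Toy data: three exons from two transcripts (all values are made up).
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1, 50, 10), width = 20),
  strand   = c("+", "+", "-"),
  exon_id  = c("exon1", "exon2", "exon1")
)
names(gr) <- c("tx1", "tx1", "tx2")

# Select rows by a metadata column, as you would filter data.frame rows.
hits <- gr[gr$exon_id == "exon1"]

# subset() evaluates the condition against the object's columns.
hits2 <- subset(gr, exon_id == "exon1")

# Metadata can also be changed in place for selected rows.
gr$exon_id[gr$exon_id == "exon2"] <- "exon2b"
```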

ADD COMMENT • link 6.5 years ago Michael Lawrence ★ 11k

0

Yes, that is good, but since I need the grouping I need to do it on a GRangesList:

This does not work:

#grl: granges list grouped by transcript

grl[grl$exon_id == 1]

Output is:

GRangesList object of length 0:
<0 elements>

The only way I could get any output was something like this:

grl[names(unlist(grl)$exon_id == 1)]

But since all exons in each group have the same name, it only returns the original list.

Is there any way around this that preserves the grouping, without converting to a GRanges or a data.frame?

ADD REPLY • link 6.5 years ago hauken_heyken ▴ 80

0

By looking through this post: A: Fast way to access metadata for GRangesList?

I found that this works, but it is not totally what I want:

#grl: grangesList

#gr = unlist(grl) , granges object

regrl = relist(gr, grl)

The groups are back, but they are not identical:

identical(regrl,grl)
[1] FALSE

This is good enough for me, so I consider the question solved

ADD REPLY • link 6.5 years ago hauken_heyken ▴ 80

0

Tweak that so that unlist() has use.names=FALSE; otherwise the names of the unlisted object are those of the elements of the list, e.g.,

example(GRangesList)
identical(grl, relist(unlist(grl, use.names=FALSE), grl))    # TRUE
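The same unlist()/relist() pattern can probably be stretched to cover the earlier exon_id question. The sketch below is only one way to do it, on made-up toy data (flat, keep and filtered are names invented here): it builds a logical mask with the same shape as grl and subsets with it, so the filtering happens inside each group and the grouping is kept.

```r
library(GenomicRanges)

# Toy GRangesList grouped by transcript (all values are made up).
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(1, 50, 100, 200), width = 10),
  strand   = "+",
  exon_id  = c(1L, 2L, 1L, 3L)
)
grl <- split(gr, c("tx1", "tx1", "tx2", "tx3"))

# Flatten without copying the list names onto the ranges.
flat <- unlist(grl, use.names = FALSE)

# Per-group logical mask with the same shape as grl.
keep <- relist(flat$exon_id == 1L, grl)

# Subsetting by the mask filters inside each group; the grouping is kept.
filtered <- grl[keep]

# Optionally drop transcripts that have no matching exon left.
filtered <- filtered[lengths(filtered) > 0]
```

Without the last step, transcripts with no matching exon stay in the list as empty elements, which may or may not be what you want.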

ADD REPLY • link 6.5 years ago Martin Morgan 25k

0

If you need the data as a GRangesList, then there is a choice: are you restricting the analysis to a single chromosome? If so, this does what you want:

keepSeqlevels(grl, "chr1", pruning.mode="tidy")

Or, be explicit:

grl1 <- grl[seqnames(grl) == "chr1"]
grl1[lengths(grl1) > 0]

ADD REPLY • link 6.5 years ago Michael Lawrence ★ 11k