subsetting a GRanges object with IntegerList and CharacterList metadata columns
efoss ▴ 10
@efoss-8908
Last seen 2.7 years ago
United States

I have a GRanges object called "MmOrganismDbTxs" that has a CharacterList of UCSC-style names as one of the metadata columns, and I have a vector of UCSC-style names that I'm interested in called "desiredUCnames". I would like to subset my GRanges object to pull out a GRanges object that contains only the genes I'm interested in. The line at the bottom of this code will do it, but it seems rather long and awkward. Is there a better way of subsetting?

library("Mus.musculus")

MmOrganismDb <- Mus.musculus

MmOrganismDbTxs <- transcripts(MmOrganismDb)

> head(MmOrganismDbTxs)
GRanges object with 6 ranges and 2 metadata columns:
      seqnames             ranges strand |          TXID          TXNAME
         <Rle>          <IRanges>  <Rle> | <IntegerList> <CharacterList>
  [1]     chr1 [4807893, 4842827]      + |             1      uc007afg.1
  [2]     chr1 [4807893, 4846735]      + |             2      uc007afh.1
  [3]     chr1 [4857694, 4897909]      + |             3      uc007afi.2
  [4]     chr1 [4857694, 4897909]      + |             4      uc011wht.1
  [5]     chr1 [4858328, 4897909]      + |             5      uc011whu.1
  [6]     chr1 [5083173, 5099777]      + |             6      uc007afm.1
  -------
  seqinfo: 66 sequences (1 circular) from mm10 genome

> desiredUCnames
[1] "uc007pac.1" "uc007puq.3" "uc009tzo.1" "uc009tzp.1" "uc007pur.1"

desiredTxs <- MmOrganismDbTxs[as.logical(elementMetadata(MmOrganismDbTxs)$TXNAME %in% desiredUCnames)]


Thank you. 

Eric

@michael-lawrence-3846
Last seen 2.4 years ago
United States

Maybe a slight improvement:

desiredTxs <- subset(MmOrganismDbTxs, any(TXNAME %in% desiredUCnames))
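
For anyone wondering why the any() is there: with a CharacterList column, %in% is applied within each list element and returns a LogicalList, and any() then collapses each element to a single TRUE/FALSE per range. Below is a minimal sketch on a toy GRanges; the coordinates and transcript names are invented for illustration and are not taken from Mus.musculus.

```r
# Toy example; the ranges and transcript names are made up for illustration.
library(GenomicRanges)

gr <- GRanges(seqnames = "chr1",
              ranges = IRanges(start = c(1, 101, 201), width = 50),
              strand = "+")

# A CharacterList column: the second range carries two names
mcols(gr)$TXNAME <- CharacterList("ucA.1", c("ucB.1", "ucC.1"), "ucD.1")
wanted <- c("ucC.1", "ucD.1")

# %in% works element by element and yields a LogicalList
hits <- mcols(gr)$TXNAME %in% wanted

# any() reduces each list element to one logical value per range
keep <- any(hits)

# Two equivalent ways of subsetting
res1 <- gr[keep]
res2 <- subset(gr, any(TXNAME %in% wanted))
```

In this sketch the second range is kept because one of its two names matches, which is the case where a single logical value per range matters.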

Hi Michael, 

That's a big improvement, thanks! One follow-up question: in your any() call, it looks like you can use "TXNAME" without specifying that "TXNAME" is part of "MmOrganismDbTxs". Why is this?

Thanks again for the help. 

Eric

The second argument to subset() is evaluated in the "context" of MmOrganismDbTxs, just as subset() in the base package does with a data.frame, so metadata column names like TXNAME can be used directly. Note that there are some gotchas with lazy evaluation, but it is convenient for interactive/casual use. Passing strict=TRUE to subset() can help guard against some mistakes, but it requires a stricter syntax where "global" symbols like desiredUCnames must be escaped, as in:

desiredTxs <- subset(MmOrganismDbTxs, any(TXNAME %in% .(desiredUCnames)), strict=TRUE)
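
The base-package behaviour mentioned above can be tried without any Bioconductor packages. The following is only a sketch with a hypothetical data.frame (the names scores, threshold, and score are made up); it shows a column being found inside the object, a global being found in the calling environment, and one of the gotchas: a column silently masks a global variable of the same name.

```r
# Hypothetical data, used only to illustrate the scoping behaviour.
scores <- data.frame(name = c("a", "b", "c"), score = c(1, 5, 9))
threshold <- 4

# 'score' is found as a column of the data.frame; 'threshold' is not a
# column, so it is looked up in the calling environment.
high <- subset(scores, score > threshold)

# Gotcha: a column masks a global variable with the same name.
score <- 100
masked <- subset(scores, score > threshold)  # still uses the column, not 100

# Explicit equivalent that does not rely on non-standard evaluation
explicit <- scores[scores$score > threshold, ]
```

The explicit form is more verbose but leaves no doubt about where each symbol comes from, which is usually preferable inside functions or scripts.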

Thanks very much. I appreciate it. 

Eric
