Question: Subsetting GRanges based on metadata
mdeea123 wrote (3.3 years ago):

I'm trying to subset a GRanges object to extract only the 1000 genes I'm interested in.

I've just extracted the transcription start sites for mm10:

> head(tssgr)
GRanges object with 6 ranges and 1 metadata column:
            seqnames                 ranges strand |       GENEID
               <Rle>              <IRanges>  <Rle> | <FactorList>
  100009600     chr9 [ 21075496,  21075496]      - |    100009600
  100009609     chr7 [ 84964009,  84964009]      - |    100009609
  100009614    chr10 [ 77711446,  77711446]      + |    100009614
  100009664    chr11 [ 45808083,  45808083]      + |    100009664
     100012     chr4 [144162651, 144162651]      - |       100012
     100017     chr4 [134768004, 134768004]      - |       100017

Here is my gene list:

> head(geneTable)
  SYMBOL GENEID
1   Aspn  66695
2 Angpt1  11600
3  Gm773 331416
4   Lifr  16880
5 Il1rl1  17082
6    Ogn  18295

I've tried

subt <- tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]

but I get this error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘NSBS’ for signature ‘"CompressedLogicalList"’

 

What am I doing wrong?

Here is the traceback:

8: stop(gettextf("unable to find an inherited method for function %s for signature %s", 
       sQuote(fdef@generic), sQuote(cnames)), domain = NA)
7: (function (classes, fdef, mtable) 
   {
       methods <- .findInheritedMethods(classes, fdef, mtable)
       if (length(methods) == 1L) 
           return(methods[[1L]])
       else if (length(methods) == 0L) {
           cnames <- paste0("\"", vapply(classes, as.character, 
               ""), "\"", collapse = ", ")
           stop(gettextf("unable to find an inherited method for function %s for signature %s", 
               sQuote(fdef@generic), sQuote(cnames)), domain = NA)
       }
       else stop("Internal error in finding inherited methods; didn't return a unique method", 
           domain = NA)
   })(list("CompressedLogicalList"), function (i, x, exact = TRUE, 
       upperBoundIsStrict = TRUE) 
   standardGeneric("NSBS"), <environment>)
6: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
5: normalizeSingleBracketSubscript(i, x)
4: extractROWS(x, i)
3: extractROWS(x, i)
2: tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]
1: tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]

Mitchell

R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Mus.musculus_1.3.1                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.2 org.Mm.eg.db_3.3.0                      
 [4] GO.db_3.3.0                              OrganismDbi_1.14.1                       GenomicFeatures_1.24.5                  
 [7] AnnotationDbi_1.34.4                     BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome_1.40.1                         
[10] rtracklayer_1.32.2                       Biostrings_2.40.2                        XVector_0.12.1                          
[13] GenomicRanges_1.24.2                     GenomeInfoDb_1.8.3                       IRanges_2.6.1                           
[16] S4Vectors_0.10.2                         BiocInstaller_1.22.3                     Biobase_2.32.0                          
[19] BiocGenerics_0.18.0                     

loaded via a namespace (and not attached):
 [1] graph_1.50.0               zlibbioc_1.18.0            GenomicAlignments_1.8.4    BiocParallel_1.6.5        
 [5] tools_3.3.0                SummarizedExperiment_1.2.3 DBI_0.5                    RBGL_1.48.1               
 [9] bitops_1.0-6               biomaRt_2.28.0             RCurl_1.95-4.8             RSQLite_1.0.0             
[13] Rsamtools_1.24.0           XML_3.98-1.4              
Answer: Subsetting GRanges based on metadata
Michael Lawrence wrote (3.3 years ago):

This happens because a transcript could in principle map to multiple genes, so GENEID is a FactorList rather than a factor or ordinary vector. Calling %in% on a FactorList returns a LogicalList (one logical vector per range), and that is not a valid subscript for [, hence the NSBS error. If there are no one-to-many relationships, you can drop the FactorList to a factor/vector:

tssgr$GENEID <- drop(tssgr$GENEID)
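Before flattening, it may be worth confirming that every range really carries exactly one gene ID. The following is only a minimal sketch on toy data: the names ids, n, flat and keep are made up for illustration, and a CharacterList stands in for the FactorList from the question.

```r
library(IRanges)  # provides CharacterList(); attaches S4Vectors for elementNROWS()

# Toy stand-in for tssgr$GENEID: one list element per range (assumed data)
ids <- CharacterList("66695", "11600", "100012")

# Number of gene IDs attached to each range
n <- elementNROWS(ids)

# Flatten only if the mapping is strictly one-to-one
if (all(n == 1L)) {
  flat <- unlist(ids, use.names = FALSE)
} else {
  stop("some ranges carry zero or several gene IDs; flattening would misalign them")
}

# An ordinary vector gives an ordinary logical vector from %in%,
# which is a valid subscript for a GRanges of the same length
keep <- flat %in% c("66695", "11600")
```

If the check fails, the any() approach is probably the safer route, since it keeps the per-range grouping intact.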

Alternatively, you could select a TSS if any of its genes match:

subt <- tssgr[any(mcols(tssgr)$GENEID %in% geneTable$GENEID)]
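To see why wrapping the test in any() helps, here is a small self-contained sketch. The GRanges object gr, its coordinates, and the wanted vector are invented for illustration, and a CharacterList is used in place of the FactorList from the question.

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Toy stand-in for tssgr (coordinates are arbitrary)
gr <- GRanges(
  seqnames = c("chr9", "chr7", "chr4"),
  ranges   = IRanges(start = c(100, 200, 300), width = 1),
  strand   = c("-", "-", "+")
)

# List-like metadata column; the second range carries two gene IDs
mcols(gr)$GENEID <- CharacterList("66695", c("11600", "17082"), "100012")

# Hypothetical IDs of interest
wanted <- c("66695", "17082")

# %in% on a list-like column yields a LogicalList, one element per range
hits <- mcols(gr)$GENEID %in% wanted

# any() collapses each element to a single TRUE/FALSE, giving a plain
# logical vector that [ accepts as a subscript
keep <- any(hits)
subt <- gr[keep]
```

Here a range is kept as soon as one of its gene IDs is in the list, so one-to-many mappings are handled without flattening the column.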

Note that there are annotation sources that will give you the gene symbols without any extra work, e.g. (shown here for human; the Mus.musculus package is the mouse counterpart):

tss <- resize(transcripts(Homo.sapiens, columns="SYMBOL"), 1L)

 


Thanks heaps. 

Mitchell
