Problem with mcols in GenomicFeatures
Jake ▴ 90
@jake-7236

Hi,

I created a TranscriptDb with makeTranscriptDbFromGFF() and can extract all of the transcripts grouped by gene. The result is a GRangesList with one element per gene, and each element has 2 metadata columns: the transcript IDs and transcript names associated with that gene. However, when I try to pull this information out with mcols(), the result is empty. I've included the code and output below. Am I doing something wrong?

Thanks

> transcript <- transcriptsBy(gencodeTxdb,by='gene')
> test <- transcript[2:3]
> test
GRangesList object of length 2:
$ENSMUSG00000000003.10 
GRanges object with 3 ranges and 2 metadata columns:
      seqnames               ranges strand |     tx_id               tx_name
         <Rle>            <IRanges>  <Rle> | <integer>           <character>
  [1]     chrX [77837901, 77853623]      - |     11687 ENSMUSG00000000003.10
  [2]     chrX [77837901, 77853623]      - |     11688  ENSMUST00000000003.8
  [3]     chrX [77837902, 77853530]      - |     11689  ENSMUST00000114041.2

$ENSMUSG00000000028.9 
GRanges object with 4 ranges and 2 metadata columns:
      seqnames               ranges strand | tx_id              tx_name
  [1]    chr16 [18780447, 18811972]      - | 16295 ENSMUST00000000028.8
  [2]    chr16 [18780447, 18811987]      - | 16296 ENSMUSG00000000028.9
  [3]    chr16 [18780453, 18811626]      - | 16297 ENSMUST00000096990.4
  [4]    chr16 [18807356, 18811987]      - | 16298 ENSMUST00000115585.1

-------
seqinfo: 22 sequences (1 circular) from an unspecified genome; no seqlengths
> mcols(test)
DataFrame with 2 rows and 0 columns

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.2.1 Rsamtools_1.18.2        Biostrings_2.34.1       XVector_0.6.0           BiocInstaller_1.16.1   
 [6] GenomicFeatures_1.18.3  AnnotationDbi_1.28.1    Biobase_2.26.0          GenomicRanges_1.18.4    GenomeInfoDb_1.2.4     
[11] IRanges_2.0.1           S4Vectors_0.4.0         BiocGenerics_0.12.1    

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2    BatchJobs_1.5      BBmisc_1.9         BiocParallel_1.0.3 biomaRt_2.22.0     bitops_1.0-6       brew_1.0-6        
 [8] checkmate_1.5.1    codetools_0.2-10   DBI_0.3.1          digest_0.6.8       fail_1.2           foreach_1.4.2      iterators_1.0.7   
[15] RCurl_1.95-4.5     RSQLite_1.0.0      rtracklayer_1.26.2 sendmailR_1.2-1    stringr_0.6.2      tools_3.1.2        XML_3.98-1.1      
[22] zlibbioc_1.12.0

@james-w-macdonald-5106

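A GRangesList is a list of GRanges, so you have to act accordingly. Calling mcols() on the list returns list-level metadata (one row per list element, with no columns here), whereas tx_id and tx_name are stored on the individual GRanges elements: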

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> tx <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)
> mcols(tx[1])
DataFrame with 1 row and 0 columns

> mcols(tx[[1]])
DataFrame with 2 rows and 2 columns
      tx_id     tx_name
  <integer> <character>
1     70455  uc002qsd.4
2     70456  uc002qsf.2

> lapply(tx[1:3], mcols)
$`1`
DataFrame with 2 rows and 2 columns
      tx_id     tx_name
  <integer> <character>
1     70455  uc002qsd.4
2     70456  uc002qsf.2

$`10`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     31944  uc003wyw.1

$`100`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     72132  uc002xmj.3
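
For anyone without a TxDb at hand, the same distinction can be reproduced on a toy object. This is only a minimal sketch: the gene names, coordinates, and transcript names are made up, and elementNROWS() assumes a reasonably recent Bioconductor (older releases called it elementLengths()).

```r
library(GenomicRanges)

# Toy data (hypothetical): two genes with made-up transcripts
gr_a <- GRanges("chr1", IRanges(c(100, 150), c(500, 480)), strand = "+",
                tx_id = 1:2, tx_name = c("txA1", "txA2"))
gr_b <- GRanges("chr1", IRanges(1000, 2000), strand = "-",
                tx_id = 3L, tx_name = "txB1")
grl <- GRangesList(geneA = gr_a, geneB = gr_b)

# List-level metadata: one row per gene, no columns yet
mcols(grl)

# Range-level metadata lives on each GRanges element
mcols(grl[["geneA"]])

# List-level columns can be added, one value per gene
mcols(grl)$n_tx <- elementNROWS(grl)
mcols(grl)
```

So the empty DataFrame is not a sign that anything was lost; it is simply a different level of annotation from the tx_id and tx_name columns.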

 


It's cheaper to unlist the GRangesList and then pull out the metadata 'all at once':

mcols(unlist(tx))

The unlisted DataFrame is a good starting point for many operations, e.g., adding additional columns. One can then relist the DataFrame 'flesh' around the tx 'skeleton' to recover the overall 'geometry':

> relist(mcols(unlist(tx)), tx)
SplitDataFrameList of length 23459
$`1`
DataFrame with 2 rows and 2 columns
      tx_id     tx_name
  <integer> <character>
1     70455  uc002qsd.4
2     70456  uc002qsf.2

$`10`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     31944  uc003wyw.1

$`100`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     72132  uc002xmj.3
1     72132  uc002xmj.3

...
<23456 more elements>
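
The 'adding additional columns' case might look roughly like the sketch below. It uses a toy GRangesList with made-up genes and transcripts rather than a real TxDb, and the tx_width column is a hypothetical name chosen for illustration.

```r
library(GenomicRanges)

# Toy data (hypothetical): two genes with made-up transcripts
grl <- GRangesList(
  geneA = GRanges("chr1", IRanges(c(100, 150), c(500, 480)), strand = "+",
                  tx_name = c("txA1", "txA2")),
  geneB = GRanges("chr1", IRanges(1000, 2000), strand = "-",
                  tx_name = "txB1")
)

# Flatten once, so the new column is computed in a single vectorized step
flat <- unlist(grl, use.names = FALSE)
mcols(flat)$tx_width <- width(flat)

# Rebuild the per-gene grouping, using the original list as the skeleton
grl2 <- relist(flat, grl)
mcols(grl2[["geneB"]])
```

Working on the flat GRanges avoids looping over genes, which matters when the list has tens of thousands of elements.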

 

Jake ▴ 90
@jake-7236

Awesome, thanks!
