Search
Question: Reorder exons on negative strand in GRanges list
0
gravatar for Jake
2.5 years ago by
Jake60
United States
Jake60 wrote:

I am trying to make a fasta file of reduced exon sequences (collapse all isoforms of a gene into one transcript). I'm running into a problem where exons on the negative strand are listed in ascending order based on chromosome coordinates, but this results in the exon sequences being written in the wrong order. I've tried several methods to re-order the negative strand elements, but they were all running very slow. What would be the best way to do this?

Thanks

library('BSgenome.Mmusculus.UCSC.mm10')
library('GenomicFeatures')
genome <- BSgenome.Mmusculus.UCSC.mm10
gencode <- makeTxDbFromGFF('~/Downloads/gencode.vM8.basic.annotation.gtf')
exonAll <- exonsBy(gencode, by='gene')
exonAll2 <- reduce(exonAll)
seqs <- extractTranscriptSeqs(genome, exonAll2)
writeXStringSet(seqs, '~/Desktop/basic.fa')
ADD COMMENTlink modified 2.5 years ago by Michael Lawrence10k • written 2.5 years ago by Jake60
2
gravatar for Michael Lawrence
2.5 years ago by
United States
Michael Lawrence10k wrote:

A case could be made that extractTranscriptSeqs() should take care of this, but for now you could reverse the elements of exonAll2 when the gene is on the negative strand, e.g.:

exonAll2 <- revElements(exonAll2, any(strand(exonAll2) == "-"))

 

ADD COMMENTlink written 2.5 years ago by Michael Lawrence10k

I'm confused. Why should extractTranscriptSeqs() take care of this? The reduce() call removes the exon_rank information. The resulting GRangesList after the reduce() call does even not contain actual transcript sequences for many genes.

ADD REPLYlink written 2.5 years ago by Michael Love20k

It's true that this operation probably does not make sense biologically, but my point about extractTranscriptSeqs() was that it could sort exons based on their strand, or at least throw an error if there is inconsistency. It's easy to get into that situation if not using a TxDb.

ADD REPLYlink written 2.5 years ago by Michael Lawrence10k
Ah, got it. Thanks.
ADD REPLYlink written 2.5 years ago by Michael Love20k

It appears the case was never made.  revElements certainly works as suggested, but it is dreadfully slow. Is there any other new BioC juju that might make this operation sing (and, would it help that in my case ALL all my exonAll2 have exactly two elements)?

 

ADD REPLYlink written 4 weeks ago by Malcolm Cook1.5k

I'm surprised that revElements() on a GRangesList is slow. It's implemented pretty efficiently. Do you have an example?

ADD REPLYlink written 4 weeks ago by Michael Lawrence10k

Here's timing produced by the five statements in the following function, to give you a sense of relative speed on my computer:

   user  system elapsed 
 12.227   0.871  13.100 
   user  system elapsed 
 41.688   3.735  45.423 
   user  system elapsed 
636.549  29.291 665.827 
   user  system elapsed 
  5.991   0.060   6.052 
   user  system elapsed 
164.827  62.318 227.457 

 

writeIntronFlankingSeqs<-function(genomeName='BSgenome.Hsapiens.UCSC.hg38'
          ,TxDbName='TxDb.Hsapiens.UCSC.hg38.knownGene'
          ,flankSize=25
          ,con=paste(sep='.','sjdb',flankSize,TxDbName,'fa')
           ) {
    ## PURPOSE: Write a fasta file holding <flankSize> bases flanking
    ## each unique intron known to <TxDbName>, as extracted from the
    ## BSgenome named <genomeName>.
    library(rtracklayer)
    library(GenomicFeatures)
    library(BSgenome)
    genome<-getBSgenome(genomeName)
    require(TxDbName,char=T)
    TxDb<-eval(parse(text=TxDbName))

    print(system.timeintron.gr<-unique(unlist(intronsByTranscript(TxDb)))))

    print(system.time(
      flanking.grl<-
        ## Kudos: C: Element-wise setdiff when psetdiff fails for GenomicRanges
        setdiff(as(trimintron.gr+flankSize),'GRangesList')
               ,asintron.gr,'GRangesList'))))

    stopifnot(all(2==lengths(flanking.grl)))

    print(system.time(
      flanking.grl<-
        ## Required for correct 5-3' order in subsequent call to
        ## extractTranscriptSeqs.  Kudos:
        ## [[Reorder exons on negative strand in GRanges list][Reorder exons on
        ## negative strand in GRanges list]] - along with appeal for
        ## faster approach (THIS IS SLOW).
        revElements(flanking.grl,any(strand(flanking.grl)=='-')) 
    ))

    print(system.time(
      names(flanking.grl)<-
        ## Creates names of the form:
        ## chr:start1-end1:strand^chr:start2-end2:strand
        do.call(function(...) paste(sep='^',...)
               ,as.data.frame(matrix(as.character(unlist(flanking.grl))
                                    ,ncol=2
                                    ,byrow=T)))
    ))

    print(system.time(
      writeXStringSet(extractTranscriptSeqs(genome,flanking.grl)
                     ,con)
    ))

  }

Thanks for taking a squiz Michael.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Malcolm Cook1.5k

I will look into the performance issue but I'm wondering whether it wouldn't be better to do this using flank(), which would get the upstream/downstream right based on the strand.

ADD REPLYlink written 28 days ago by Michael Lawrence10k

Turns out the issue is with the any(strand(flanking.grl)=='-')). There was a missing fast path for any() on a CompressedRleList. Try IRanges 2.15.19.

ADD REPLYlink written 28 days ago by Michael Lawrence10k

Mike - thanks for offering to take a look, and even more so for the "spot on" comment about using flank... you're absolutely right, computing flanking.grl as below is ~100x faster

 

        up.gr<-trim(flank(intron.gr,flankSize,T))
        down.gr<-trim(flank(intron.gr,flankSize,F))
        flanking.grl<-split(c(up.gr,down.gr),seq_along(up.gr))
        names(flanking.grl)<- paste(sep='^',as.character(up.gr),as.character(down.gr))

Any more idiomatic approaches to building that GRangesList are most welcome... I expect there is some unlist/relist juju I've never fully appreciated.

ADD REPLYlink modified 28 days ago • written 28 days ago by Malcolm Cook1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 438 users visited in the last hour