Hi there,
I'm comparing software for the detection of fusion genes; star, tophat-fusion and defuse.
I've read about the Bioconductor chimera package which would be a convenient way of evaluating the software on an equal footing as the outputs of each tool can be imported into and analysed by the chimera module.
I've been able to import defuse data with no issues but, so far, I've not had any luck with STAR (Chimeric.out.junction) (or Tophat). First it says there aren't enough supporting reads (even though it's the same sample that's been analysed as defuse) and when I lower the value to min.support=2 and run the import command it produced the following error message after about 20-30 mins:
tst<-importFusionData("star","Chimeric.out.no.MT",parallel=TRUE, org="hs", min.support=2) Error: 17688 errors; first error: Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width, : sequence 1 not found
For more information, use bplasterror(). To resume calculation, re-call the function and set the argument 'BPRESUME' to TRUE or wrap the previous call in bpresume().
First traceback: 36: importFusionData("star", "Chimeric.out.no.MT", parallel = TRUE, org = "hs", min.support = 2) 35: .starImport(filename, ...) 34: bplapply(tmp.loc.counts, function(x, z, j, k) .starFset(x, z, j, k), z = grHs, j = chr.sym, k = org, BPPARAM = p) 33: bplapply(tmp.loc.counts, function(x, z, j, k) .starFset(x, z, j, k), z = grHs, j = chr.sym, k = org, BPPARAM = p) 32: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed, mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM), mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal else FALSE) 31: lapply(seq_len(cores), inner.do) 30: lappl
The command I used was:
tst<-importFusionData("star","Chimeric.out.no.MT",parallel=TRUE, org="hs", min.support=2)
The initial STAR command used to generate the Chimeric.out.junction file was:
STAR --genomeDir ${REF_DIR}/star-index --readFilesIn /13617_3.no.multihit.filt.R2.fastq 13617_3.no.multihit.filt.R1.fastq --runThreadN 6 --chimSegmentMin 15 --chimJunctionOverhangMin 15
As I'm testing chimera for the first time, I'm assuming I'm diong something basic wrong but was wondering if you could provide any advice?
The Chimeric.output.junction file is very large, would you recommend some initial filtering or processing before attempting to import it?
Thanks
Angela
There are 2 differences between GRCh37 and hg19:
Cheers,
H.
One more thing: it is my understanding that
importFusionData()
usesgetSeq()
behind the scene to get the sequences from the hg19 genome. However,getSeq()
should have warned you that something is suspicious:library(BSgenome.Hsapiens.UCSC.hg19) genome <- BSgenome.Hsapiens.UCSC.hg19 getSeq(genome, GRanges("1", IRanges(11, 50))) # Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width, : # sequence 1 not found # In addition: Warning message: # In .Seqinfo.mergexy(x, y) : # The 2 combined objects have no sequence levels in common. (Use # suppressWarnings() to suppress this warning.)
But you didn't get the warning so I'm not sure what's going on. Maybe you removed it from your post? Or your version of the BSgenome software package is old? (latest release version is 1.34.0). Make sure you're using BioC 3.0 (requires R 3.1.z) and that all your packages are up to date (call
biocLite()
with no args to check).H.