Hi,
I have a 1x50 nt Illumina RNA-seq data set that was only intended for gene-level expression analysis. I was asked to look for fusion transcripts, even though I emphasized that the sensitivity for finding fusions with such data would be poor. I decided to start with a STAR (alignment to mm9) -> chimera workflow. Each sample yields only a small number of chimeric reads, and only a few fusions have 5 or more supporting reads (counted with awk). My guess is that they are all false positives. Here is one example line from a Chimeric.out.junction file (full file: https://s3.amazonaws.com/idata.drgang.net/temp/Chimeric.out.junction):
chr3 138267132 + chr2 181382061 + 0 0 0 DFXGT8Q1:294:C5A6EACXX:8:1105:20793:74047 138267108 24M26S 181382062 24S26M
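For illustration, the per-junction support count mentioned above could be done roughly as follows. This is a minimal sketch, not necessarily the exact command used here. It assumes the usual STAR Chimeric.out.junction layout, in which columns 1-6 hold the donor chromosome, breakpoint and strand followed by the acceptor chromosome, breakpoint and strand. The function name `count_junction_support` and the default threshold are made up for this example.

```shell
# Count chimeric reads per junction and report junctions with at least
# a minimum number of supporting reads (default 5).
# Assumption: columns 1-6 identify the junction (donor chr/pos/strand,
# acceptor chr/pos/strand). Lines whose 2nd column is not a number
# (e.g. header or comment lines) are skipped.
count_junction_support() {
  awk -v min="${2:-5}" '
    $2 ~ /^[0-9]+$/ && NF >= 6 {
      n[$1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6]++
    }
    END {
      for (k in n) if (n[k] >= min) print n[k] "\t" k
    }
  ' "$1" | sort -k1,1nr
}
```

Calling `count_junction_support Chimeric.out.junction 5` prints one line per junction, with the read count first and the six junction columns after it, sorted by decreasing support.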
chimera::importFusionData("star", "path/to/file", org = "mm", min.support = 1)
returns NULL and complains:
The input file does not have any spanning read.
Your fusion lacking of spanning reads are most probably artifacts
The analysis of fusions lacking spanning reads is not supported.
I'm new to fusion transcript detection, so this may be a basic question, but the read above looks like a spanning read to me. What am I doing wrong?
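One way to check what STAR itself says about each read is to look at the junction-type column together with the two CIGAR strings. The sketch below rests on an assumption taken from my reading of the STAR manual: column 7 is -1 when the junction lies between the mates (an encompassing pair) and is otherwise a motif code for a junction located inside the read. Whether chimera interprets the column the same way is not something this sketch can confirm. The function name `classify_chimeric_reads` is made up for the example.

```shell
# Label each chimeric read as "encompassing" or "spanning" based on
# column 7 of Chimeric.out.junction, and print the two segment CIGARs.
# Assumption: $7 == -1 means the junction is between the mates; any
# other value is treated here as a junction inside the read.
# Columns used: $10 read name, $12 CIGAR of segment 1, $14 CIGAR of segment 2.
classify_chimeric_reads() {
  awk '
    $2 ~ /^[0-9]+$/ && NF >= 14 {
      kind = ($7 == -1) ? "encompassing" : "spanning"
      print $10 "\t" kind "\t" $12 "\t" $14
    }
  ' "$1"
}
```

For the example line above, column 7 is 0 and the CIGARs 24M26S and 24S26M split the 50 nt read 24/26 across the breakpoint, so under this assumption the read is labelled "spanning".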
Thanks in advance for any help.
Wolfgang
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] chimera_1.8.4
[2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0
[3] GenomicFeatures_1.18.2
[4] BSgenome.Hsapiens.UCSC.hg19_1.4.0
[5] BSgenome_1.34.0
[6] rtracklayer_1.26.2
[7] org.Hs.eg.db_3.0.0
[8] RSQLite_1.0.0
[9] DBI_0.3.1
[10] AnnotationDbi_1.28.1
[11] GenomicAlignments_1.2.1
[12] Rsamtools_1.18.2
[13] Biostrings_2.34.0
[14] XVector_0.6.0
[15] GenomicRanges_1.18.3
[16] GenomeInfoDb_1.2.3
[17] IRanges_2.0.0
[18] S4Vectors_0.4.0
[19] Biobase_2.26.0
[20] BiocGenerics_0.12.1
loaded via a namespace (and not attached):
[1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 BiocParallel_1.0.0
[5] biomaRt_2.22.0 bitops_1.0-6 brew_1.0-6 checkmate_1.5.0
[9] codetools_0.2-9 digest_0.6.4 fail_1.2 foreach_1.4.2
[13] iterators_1.0.7 RCurl_1.95-4.5 sendmailR_1.2-1 stringr_0.6.2
[17] tools_3.1.1 XML_3.98-1.1 zlibbioc_1.12.0
Hi Raffaele,
Great, thanks for the fast reply and the upcoming fix.
Wolfgang