I am experiencing multiple problems with the latest easyRNASeq package that make it hard to use with my current setup using the new function simpleRNASeq:
- BAM files aligned using STAR aligner
- 3 technical replicates merged into a single BAM file using samtools merge
- sorted and indexed using samtools 1.2 (old samtools output is not recognized as sorted)
- annotation from custom annotation file in gff3 format
Some problems might be related to this specific setup or could be ignored, but maybe not all.
- Strandedness validation of BAM file is suspicious, I know my libraries are strand specific, and this is the outcome for all files I checked
1: In FUN(X[[i]], ...) :
Bam file: 1-ch1_S60_L00X_R1_001.fastq.gz.bam.sorted.bam is considered unstranded.
2: In FUN(X[[i]], ...) :
Bam file: 1-ch1_S60_L00X_R1_001.fastq.gz.bam.sorted.bam Strandedness could not be determined using 8 regions spanning 19315 bp on either strand at a 90% cutoff; 12.5 percent appear to be stranded. - There is no return value due to the error
- BAM files sorted by older versions of samtools (0.1.18,0.1.19) are not recognized as being sorted "Your BAM file needs to be sorted by coordinate; see (R)samtools sort."
- Gff file validation asks specifically for type "mRNA" when choosing 'transcript' quantification, so one needs to change all transcript entries in gff, even though it is technically wrong for non-coding transcripts. This has worked differently before
- function easyRNASeq was deprecated but the tutorial still uses it.
- Earlier it was possible to get an edgeR or DEseq object directly, now it is a bit unclear to me how this would work
library(easyRNASeq) anns = "/export/kjempetujafs/licebase/genomedata/lsalmonis/LsalmRNA.gff3" args <- c('59-AK-chal1_S39_L00X_R1_001.fastq.gz.bam.sorted.bam') annotParam <- AnnotParam(datasource=anns, type="gff3") bamParam <- BamParam(paired = FALSE, stranded = TRUE, yieldSize = 1000000L) param <- RnaSeqParam(annotParam = annotParam, bamParam = bamParam, countBy = c("transcripts"), precision = "bp") ret <- simpleRNASeq(bamFiles = getBamFileList(args), param = param, nnodes = 40, verbose = TRUE, override = TRUE)
Output:
========================== simpleRNASeq version 2.4.2 ========================== Creating a SummarizedExperiment. ========================== Processing the alignments. ========================== Pre-processing 1 BAM files. Validating the BAM files. Extracted 36096 reference sequences information. Found 1 single-end BAM files. Found 0 paired-end BAM files. Bam file: 1-ch1_S60_L00X_R1_001.fastq.gz.bam.sorted.bam has reads of length 76bp Bam file: 1-ch1_S60_L00X_R1_001.fastq.gz.bam.sorted.bam has 23246292 reads. ========================== Processing the annotation ========================== Validating the annotation source Read 6 records Validated a datasource of type gff3 Fetching the annotation Retrieving annotation from a gff3 datasource Read 149878 records Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘splitAsList’ for signature ‘"GRanges", "NULL"’ In addition: Warning messages: 1: In FUN(X[[i]], ...) : Bam file: 1-ch1_S60_L00X_R1_001.fastq.gz.bam.sorted.bam is considered unstranded. 2: In FUN(X[[i]], ...) : Bam file: 1-ch1_S60_L00X_R1_001.fastq.gz.bam.sorted.bam Strandedness could not be determined using 8 regions spanning 19315 bp on either strand at a 90% cutoff; 12.5 percent appear to be stranded. 3: In simpleRNASeq(bamFiles = getBamFileList(args), param = param, : You have chosen to override the detected parameters. Hope you know what you are doing. Contact me if you think the parameter detection failed.
> traceback() 8: stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA) 7: (function (classes, fdef, mtable) { methods <- .findInheritedMethods(classes, fdef, mtable) if (length(methods) == 1L) return(methods[[1L]]) else if (length(methods) == 0L) { cnames <- paste0("\"", vapply(classes, as.character, ""), "\"", collapse = ", ") stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA) } else stop("Internal error in finding inherited methods; didn't return a unique method", domain = NA) })(list("GRanges", "NULL"), function (x, f, drop = FALSE, ...) standardGeneric("splitAsList"), <environment>) 6: splitAsList(x, f, drop = drop) 5: .local(x, f, drop, ...) 4: split(grngs, grngs$seqnames) 3: split(grngs, grngs$seqnames) 2: simpleRNASeq(bamFiles = getBamFileList(args), param = param, nnodes = 40, verbose = TRUE, override = TRUE) 1: simpleRNASeq(bamFiles = getBamFileList(args), param = param, nnodes = 40, verbose = TRUE, override = TRUE) >
> sessionInfo()R version 3.2.0 (2015-04-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] easyRNASeq_2.4.2
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2 futile.logger_1.4.1 GenomeInfoDb_1.4.1
[4] XVector_0.8.0 futile.options_1.0.0 bitops_1.0-6
[7] zlibbioc_1.14.0 biomaRt_2.24.0 annotate_1.46.0
[10] RSQLite_1.0.0 lattice_0.20-31 DBI_0.3.1
[13] parallel_3.2.0 DESeq_1.20.0 genefilter_1.50.0
[16] hwriter_1.3.2 Biostrings_2.36.1 S4Vectors_0.6.1
[19] IRanges_2.2.5 stats4_3.2.0 locfit_1.5-9.1
[22] grid_3.2.0 LSD_3.0 Biobase_2.28.0
[25] AnnotationDbi_1.30.1 XML_3.98-1.3 survival_2.38-3
[28] BiocParallel_1.2.6 limma_3.24.12 latticeExtra_0.6-26
[31] geneplotter_1.46.0 lambda.r_1.1.7 edgeR_3.10.2
[34] Rsamtools_1.20.4 intervals_0.15.0 genomeIntervals_1.24.1
[37] GenomicAlignments_1.4.1 splines_3.2.0 BiocGenerics_0.14.0
[40] GenomicRanges_1.20.5 ShortRead_1.26.0 xtable_1.7-4
[43] RCurl_1.95-4.7
>
<font face="sans-serif, Arial, Verdana, Trebuchet MS">
</font>
head(assay(sexp))
ACACTG.bam ACTAGC.bam ATGGCT.bam gapped.bam subset.bam TTGCGA.bam
CG10035:1 38 26 33 0 38 39
CG10035:2 87 70 86 0 83 96
CG10090:1 0 0 0 0 0 0
CG10091:1 0 0 0 0 0 0
CG10091:2 1 2 1 0 1 0
CG10096:1 0 0 0 0 0 0
Does that look good?
There is only one thing I could think of that is different, I have mapped against the genome and the mitochondrial genome, but I don't have the mitochondrial genes in the Gff, so there is a reference in in the BAM files that doesn't have a landmark in the GFF, wasn't a problem before, but maybe now?