Hi All,
I am relatively new to bulk RNA Seq workflow. I was given BAM files (For now I'm using one BAM file, I will make count tables of multiple BAM files once I am confident it is working correctly) and I am using the simpleRNASeq function in the easyRNASeq package to generate count tables of exons, genes, and transcripts. I get the following warnings when I run simpleRNASeq.
1: In FUN(X[[i]], ...) :
At the moment, this function will improperly count Illumina stranded data, reporting reads that align to the opposite strand of the transcript/gene. Please use the param argument to force an unstranded analysis behaviour. Support for "reverse" strand specificity will be implemented ASAP
2: In simpleRNASeq(bamFiles = bamFiles, param = rnaSeqPramG, ... :
You provided an incorrect BAM parameter; 'stranded' should be set to 'FALSE'.
3: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': HG1007_PATCH, HG1032_PATCH, HG104_HG975_PATCH, HG1063_PATCH, HG1074_PATCH, HG1079_PATCH, HG1082_HG167_PATCH, HG1091_P[... truncated]
I actually have 12 warnings of "Each of the 2 combined objects has sequence levels not in the other". I'm not sure if they are significant. At first I ignored them, but am now running into errors down the road.
I run into an error when I convert the returned a RangedSummarizedExperiment object of gene or transcript counts from SimpleRNASeq to a DGEList class object using the DEformats package. But the RangedSummarizedExperiment object of exon counts is converted to a DGEList class object. Shouldn't the DGEList function only work for gene counts and not exon counts? I am using a synthetic transcript as my annotation file, so my count table dimensions (63677X1) for genes and transcripts is the same, and my count table of exons is different ( 353207 X1). Below is my code generating the gene count table and error for attempting to convert the resulting RangedSummarizedExperiment object into an DGEList object.
load("./0109Synth_annotParam_Hsapiens.GRCh37.75.rda")
load("./0110_bamFiles_r2370_l12_D708-D508=317-12_rep1-hsap.rda")
countBy = c("exons", "features", "genes", "transcripts")
rnaSeqPramG<- RnaSeqParam(annotParam=annotParamSynth,
bamParam = BamParam(paired= FALSE, stranded= FALSE),
countBy = countBy[3])
Counts_synG <- simpleRNASeq(
bamFiles=bamFiles,
param= rnaSeqPramG,
verbose = TRUE)
dgeG = DGEList(Counts_synG)
____
Error in DGEList(assay(counts), lib.size = lib.size, norm.factors = norm.factors, :
Counts and genes have different numbers of rows
Is there something wrong with my gene count tables or is it possible to return a DGEList class object directly instead of a RangedSummarizedExperiment object using simpleRNASeq? I know it's possible using easyRNASeq function, but I would prefer to use simpleRNASeq over easyRNASeq, since easyRNASeq is deprecated.
Thanks for bearing with me on the lengthy post!
Best,
KG
Also:
sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] tibble_1.4.1 biomaRt_2.34.1 RCurl_1.95-4.10 [4] bitops_1.0-6 limma_3.34.5 DESeq_1.30.0 [7] lattice_0.20-35 locfit_1.5-9.1 Biobase_2.38.0 [10] Rsamtools_1.30.0 BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome.Hsapiens.UCSC.hg19_1.4.0 [13] BSgenome_1.46.0 rtracklayer_1.38.2 GenomicRanges_1.30.0 [16] GenomeInfoDb_1.14.0 easyRNASeq_2.14.0 digest_0.6.13 [19] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 [22] S4Vectors_0.16.0 BiocGenerics_0.24.0 DEFormats_1.6.0 [25] backports_1.1.2 loaded via a namespace (and not attached): [1] matrixStats_0.52.2 bit64_0.9-7 RColorBrewer_1.1-2 [4] progress_1.1.2 httr_1.3.1 tools_3.4.3 [7] R6_2.2.2 rpart_4.1-11 Hmisc_4.0-3 [10] DBI_0.7 lazyeval_0.2.1 colorspace_1.3-2 [13] nnet_7.3-12 gridExtra_2.3 prettyunits_1.0.2 [16] DESeq2_1.18.1 bit_1.1-12 compiler_3.4.3 [19] htmlTable_1.11.0 DelayedArray_0.4.1 scales_0.5.0 [22] checkmate_1.8.5 genefilter_1.60.0 stringr_1.2.0 [25] foreign_0.8-69 genomeIntervals_1.34.0 base64enc_0.1-3 [28] pkgconfig_2.0.1 htmltools_0.3.6 htmlwidgets_0.9 [31] rlang_0.1.4 rstudioapi_0.7 RSQLite_2.0 [34] bindr_0.1 hwriter_1.3.2 BiocParallel_1.12.0 [37] acepack_1.4.1 dplyr_0.7.4 magrittr_1.5 [40] GenomeInfoDbData_1.0.0 Formula_1.2-2 Matrix_1.2-12 [43] Rcpp_0.12.14 munsell_0.4.3 stringi_1.1.6 [46] edgeR_3.20.2 SummarizedExperiment_1.8.1 zlibbioc_1.24.0 [49] plyr_1.8.4 grid_3.4.3 blob_1.1.0 [52] splines_3.4.3 annotate_1.56.1 knitr_1.17 [55] pillar_1.0.1 geneplotter_1.56.0 XML_3.98-1.9 [58] glue_1.2.0 ShortRead_1.36.0 latticeExtra_0.6-28 [61] data.table_1.10.4-3 gtable_0.2.0 purrr_0.2.4 [64] tidyr_0.7.2 assertthat_0.2.0 ggplot2_2.2.1 [67] xtable_1.8-2 survival_2.41-3 intervals_0.15.1 [70] GenomicAlignments_1.14.1 AnnotationDbi_1.40.0 memoise_1.1.0 [73] bindrcpp_0.2 cluster_2.0.6 LSD_3.0 |
|
|
Thank you for reporting this. I will look whether there is anything which could be done on the DEFormats side to address this. I will get back to you as soon as I know more.
Thanks for your patience. In order to proceed, could you maybe provide a self-contained MWE illustrating the issue? Maybe it's enough to share the .rda files referenced in your code.