Question

simpleRNASeq easyRNAseq warnings edgeR error

0

Entering edit mode

KGies • 0

@kgies-14747

Last seen 7.9 years ago

Hi All,

I am relatively new to bulk RNA Seq workflow. I was given BAM files (For now I'm using one BAM file, I will make count tables of multiple BAM files once I am confident it is working correctly) and I am using the simpleRNASeq function in the easyRNASeq package to generate count tables of exons, genes, and transcripts. I get the following warnings when I run simpleRNASeq.

1: In FUN(X[[i]], ...) :
  At the moment, this function will improperly count Illumina stranded data, reporting reads that align to the opposite strand of the transcript/gene. Please use the param argument to force an unstranded analysis behaviour. Support for "reverse" strand specificity will be implemented ASAP
2: In simpleRNASeq(bamFiles = bamFiles, param = rnaSeqPramG,  ... :
  You provided an incorrect BAM parameter; 'stranded' should be set to 'FALSE'.
3: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': HG1007_PATCH, HG1032_PATCH, HG104_HG975_PATCH, HG1063_PATCH, HG1074_PATCH, HG1079_PATCH, HG1082_HG167_PATCH, HG1091_P[... truncated]

I actually have 12 warnings of "Each of the 2 combined objects has sequence levels not in the other". I'm not sure if they are significant. At first I ignored them, but am now running into errors down the road.

I run into an error when I convert the returned a RangedSummarizedExperiment object of gene or transcript counts from SimpleRNASeq to a DGEList class object using the DEformats package. But the RangedSummarizedExperiment object of exon counts is converted to a DGEList class object. Shouldn't the DGEList function only work for gene counts and not exon counts? I am using a synthetic transcript as my annotation file, so my count table dimensions (63677X1) for genes and transcripts is the same, and my count table of exons is different ( 353207 X1). Below is my code generating the gene count table and error for attempting to convert the resulting RangedSummarizedExperiment object into an DGEList object.

load("./0109Synth_annotParam_Hsapiens.GRCh37.75.rda")

load("./0110_bamFiles_r2370_l12_D708-D508=317-12_rep1-hsap.rda")

countBy = c("exons", "features", "genes",  "transcripts")


rnaSeqPramG<- RnaSeqParam(annotParam=annotParamSynth,

                          bamParam = BamParam(paired= FALSE, stranded= FALSE),

                          countBy = countBy[3])


Counts_synG <- simpleRNASeq(

  bamFiles=bamFiles,

  param= rnaSeqPramG,

  verbose = TRUE)

dgeG = DGEList(Counts_synG)
____

Error in DGEList(assay(counts), lib.size = lib.size, norm.factors = norm.factors,  : 
  Counts and genes have different numbers of rows

Is there something wrong with my gene count tables or is it possible to return a DGEList class object directly instead of a RangedSummarizedExperiment object using simpleRNASeq? I know it's possible using easyRNASeq function, but I would prefer to use simpleRNASeq over easyRNASeq, since easyRNASeq is deprecated.

Thanks for bearing with me on the lengthy post!

Best,

KG

Also:

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tibble_1.4.1                      biomaRt_2.34.1                    RCurl_1.95-4.10                  
 [4] bitops_1.0-6                      limma_3.34.5                      DESeq_1.30.0                     
 [7] lattice_0.20-35                   locfit_1.5-9.1                    Biobase_2.38.0                   
[10] Rsamtools_1.30.0                  BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome.Hsapiens.UCSC.hg19_1.4.0
[13] BSgenome_1.46.0                   rtracklayer_1.38.2                GenomicRanges_1.30.0             
[16] GenomeInfoDb_1.14.0               easyRNASeq_2.14.0                 digest_0.6.13                    
[19] Biostrings_2.46.0                 XVector_0.18.0                    IRanges_2.12.0                   
[22] S4Vectors_0.16.0                  BiocGenerics_0.24.0               DEFormats_1.6.0                  
[25] backports_1.1.2                  

loaded via a namespace (and not attached):
 [1] matrixStats_0.52.2         bit64_0.9-7                RColorBrewer_1.1-2        
 [4] progress_1.1.2             httr_1.3.1                 tools_3.4.3               
 [7] R6_2.2.2                   rpart_4.1-11               Hmisc_4.0-3               
[10] DBI_0.7                    lazyeval_0.2.1             colorspace_1.3-2          
[13] nnet_7.3-12                gridExtra_2.3              prettyunits_1.0.2         
[16] DESeq2_1.18.1              bit_1.1-12                 compiler_3.4.3            
[19] htmlTable_1.11.0           DelayedArray_0.4.1         scales_0.5.0              
[22] checkmate_1.8.5            genefilter_1.60.0          stringr_1.2.0             
[25] foreign_0.8-69             genomeIntervals_1.34.0     base64enc_0.1-3           
[28] pkgconfig_2.0.1            htmltools_0.3.6            htmlwidgets_0.9           
[31] rlang_0.1.4                rstudioapi_0.7             RSQLite_2.0               
[34] bindr_0.1                  hwriter_1.3.2              BiocParallel_1.12.0       
[37] acepack_1.4.1              dplyr_0.7.4                magrittr_1.5              
[40] GenomeInfoDbData_1.0.0     Formula_1.2-2              Matrix_1.2-12             
[43] Rcpp_0.12.14               munsell_0.4.3              stringi_1.1.6             
[46] edgeR_3.20.2               SummarizedExperiment_1.8.1 zlibbioc_1.24.0           
[49] plyr_1.8.4                 grid_3.4.3                 blob_1.1.0                
[52] splines_3.4.3              annotate_1.56.1            knitr_1.17                
[55] pillar_1.0.1               geneplotter_1.56.0         XML_3.98-1.9              
[58] glue_1.2.0                 ShortRead_1.36.0           latticeExtra_0.6-28       
[61] data.table_1.10.4-3        gtable_0.2.0               purrr_0.2.4               
[64] tidyr_0.7.2                assertthat_0.2.0           ggplot2_2.2.1             
[67] xtable_1.8-2               survival_2.41-3            intervals_0.15.1          
[70] GenomicAlignments_1.14.1   AnnotationDbi_1.40.0       memoise_1.1.0             
[73] bindrcpp_0.2               cluster_2.0.6              LSD_3.0

simplernaseq easyrnaseq edger DEformats • 1.7k views

ADD COMMENT • link updated 7.8 years ago by Nicolas Delhomme ▴ 320 • written 7.9 years ago by KGies • 0

0

Entering edit mode

Thank you for reporting this. I will look whether there is anything which could be done on the DEFormats side to address this. I will get back to you as soon as I know more.

ADD REPLY • link 7.9 years ago Andrzej Oleś ▴ 750

0

Entering edit mode

Thanks for your patience. In order to proceed, could you maybe provide a self-contained MWE illustrating the issue? Maybe it's enough to share the .rda files referenced in your code.

ADD REPLY • link 7.9 years ago Andrzej Oleś ▴ 750

score 0 · Answer 1 · 2018-02-16

Hej KGies! The warning about the non-common sequence levels can usually be safely ignored. It simply means that your BAM file and your annotation have sequences that are not in common, which most commonly happens 1) because 2 slightly different genome versions are used, 2) that an annotation/BAM file contains haplotypes that are absent from the other or 3) that different conventions have been used for naming the mitochondria (e.g. M vs. MT). Most of the times, the actual chromosomes share the same nomenclature. You can quickly check which sequence levels are different in R by extracting the seqnames from your annotParamSynth object and comparing them with those of one of your BamFile using e.g. the Rsamtools scanBamHeader function. As for the error, you can easily create an edgeR DGEList object as: library(edgeR) DGEList(assays(Counts_synG)$genes) Best, Nico --------------------------------------------------------------- Nicolas Delhomme, PhD UPSC bioinformatics facility Manager Umeå Plant Science Center, Swedish University for Agricultural Sciences (SLU) and Umeå University Tel: +46 90 786 5478 Email: nicolas.delhomme@umu.se SLU - Umeå universitet Umeå S-901 87 Sweden --------------------------------------------------------------- > On 10 Jan 2018, at 17:53, KGies [bioc] <noreply@bioconductor.org> wrote: > > Activity on a post you are following on support.bioconductor.org > User KGies wrote Question: simpleRNASeq easyRNAseq warnings edgeR error : > > > Hi All, > > I am relatively new to bulk RNA Seq workflow. I was given BAM files (For now I'm using one BAM file, I will make count tables of multiple BAM files once I am confident it is working correctly) and I am using the simpleRNASeq function in the easyRNASeq package to generate count tables of exons, genes, and transcripts. I get the following warnings when I run simpleRNASeq. > > 1: In FUN(X[[i]], ...) : > At the moment, this function will improperly count Illumina stranded data, reporting reads that align to the opposite strand of the transcript/gene. Please use the param argument to force an unstranded analysis behaviour. Support for "reverse" strand specificity will be implemented ASAP > 2: In simpleRNASeq(bamFiles = bamFiles, param = rnaSeqPramG, ... : > You provided an incorrect BAM parameter; 'stranded' should be set to 'FALSE'. > 3: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': HG1007_PATCH, HG1032_PATCH, HG104_HG975_PATCH, HG1063_PATCH, HG1074_PATCH, HG1079_PATCH, HG1082_HG167_PATCH, HG1091_P[... truncated] > > I actually have 12 warnings of "Each of the 2 combined objects has sequence levels not in the other". I'm not sure if they are significant. At first I ignored them, but am now running into errors down the road. > > I run into an error when I convert the returned a RangedSummarizedExperiment object of gene or transcript counts from SimpleRNASeq to a DGEList class object using the DEformats package. But the RangedSummarizedExperiment object of exon counts is converted to a DGEList class object. Shouldn't the DGEList function only work for gene counts and not exon counts? I am using a synthetic transcript as my annotation file, so my count table dimensions (63677X1) for genes and transcripts is the same, and my count table of exons is different ( 353207 X1). Below is my code generating the gene count table and error for attempting to convert the resulting RangedSummarizedExperiment object into an DGEList object. > > > load("./0109Synth_annotParam_Hsapiens.GRCh37.75.rda") > > load("./0110_bamFiles_r2370_l12_D708-D508=317-12_rep1-hsap.rda") > > countBy = c("exons", "features", "genes", "transcripts") > > > rnaSeqPramG<- RnaSeqParam(annotParam=annotParamSynth, > > bamParam = BamParam(paired= FALSE, stranded= FALSE), > > countBy = countBy[3]) > > > Counts_synG <- simpleRNASeq( > > bamFiles=bamFiles, > > param= rnaSeqPramG, > > verbose = TRUE) > > dgeG = DGEList(Counts_synG) > ____ > > Error in DGEList(assay(counts), lib.size = lib.size, norm.factors = norm.factors, : > Counts and genes have different numbers of rows > > Is there something wrong with my gene count tables or is it possible to return a DGEList class object directly instead of a RangedSummarizedExperiment object using simpleRNASeq? I know it's possible using easyRNASeq function, but I would prefer to use simpleRNASeq over easyRNASeq, since easyRNASeq is deprecated. > > Thanks for bearing with me on the lengthy post! > > Best, > > KG > > Also: > > sessionInfo() > R version 3.4.3 (2017-11-30) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats4 parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] tibble_1.4.1 biomaRt_2.34.1 RCurl_1.95-4.10 > [4] bitops_1.0-6 limma_3.34.5 DESeq_1.30.0 > [7] lattice_0.20-35 locfit_1.5-9.1 Biobase_2.38.0 > [10] Rsamtools_1.30.0 BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome.Hsapiens.UCSC.hg19_1.4.0 > [13] BSgenome_1.46.0 rtracklayer_1.38.2 GenomicRanges_1.30.0 > [16] GenomeInfoDb_1.14.0 easyRNASeq_2.14.0 digest_0.6.13 > [19] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 > [22] S4Vectors_0.16.0 BiocGenerics_0.24.0 DEFormats_1.6.0 > [25] backports_1.1.2 > > loaded via a namespace (and not attached): > [1] matrixStats_0.52.2 bit64_0.9-7 RColorBrewer_1.1-2 > [4] progress_1.1.2 httr_1.3.1 tools_3.4.3 > [7] R6_2.2.2 rpart_4.1-11 Hmisc_4.0-3 > [10] DBI_0.7 lazyeval_0.2.1 colorspace_1.3-2 > [13] nnet_7.3-12 gridExtra_2.3 prettyunits_1.0.2 > [16] DESeq2_1.18.1 bit_1.1-12 compiler_3.4.3 > [19] htmlTable_1.11.0 DelayedArray_0.4.1 scales_0.5.0 > [22] checkmate_1.8.5 genefilter_1.60.0 stringr_1.2.0 > [25] foreign_0.8-69 genomeIntervals_1.34.0 base64enc_0.1-3 > [28] pkgconfig_2.0.1 htmltools_0.3.6 htmlwidgets_0.9 > [31] rlang_0.1.4 rstudioapi_0.7 RSQLite_2.0 > [34] bindr_0.1 hwriter_1.3.2 BiocParallel_1.12.0 > [37] acepack_1.4.1 dplyr_0.7.4 magrittr_1.5 > [40] GenomeInfoDbData_1.0.0 Formula_1.2-2 Matrix_1.2-12 > [43] Rcpp_0.12.14 munsell_0.4.3 stringi_1.1.6 > [46] edgeR_3.20.2 SummarizedExperiment_1.8.1 zlibbioc_1.24.0 > [49] plyr_1.8.4 grid_3.4.3 blob_1.1.0 > [52] splines_3.4.3 annotate_1.56.1 knitr_1.17 > [55] pillar_1.0.1 geneplotter_1.56.0 XML_3.98-1.9 > [58] glue_1.2.0 ShortRead_1.36.0 latticeExtra_0.6-28 > [61] data.table_1.10.4-3 gtable_0.2.0 purrr_0.2.4 > [64] tidyr_0.7.2 assertthat_0.2.0 ggplot2_2.2.1 > [67] xtable_1.8-2 survival_2.41-3 intervals_0.15.1 > [70] GenomicAlignments_1.14.1 AnnotationDbi_1.40.0 memoise_1.1.0 > [73] bindrcpp_0.2 cluster_2.0.6 LSD_3.0 > > > > > Post tags: simplernaseq, easyrnaseq, edger, DEformats > > You may reply via email or visit simpleRNASeq easyRNAseq warnings edgeR error >