Question: simpleRNASeq easyRNAseq warnings edgeR error
0
gravatar for KGies
21 months ago by
KGies0
KGies0 wrote:

Hi All,

I am relatively new to bulk RNA Seq workflow.  I was given BAM files  (For now I'm using one BAM file, I will make count tables of multiple BAM files once I am confident it is working correctly) and I am using the simpleRNASeq function in the easyRNASeq package to generate count tables of exons, genes, and transcripts.  I get the following warnings when I run simpleRNASeq.

1: In FUN(X[[i]], ...) :
  At the moment, this function will improperly count Illumina stranded data, reporting reads that align to the opposite strand of the transcript/gene. Please use the param argument to force an unstranded analysis behaviour. Support for "reverse" strand specificity will be implemented ASAP
2: In simpleRNASeq(bamFiles = bamFiles, param = rnaSeqPramG,  ... :
  You provided an incorrect BAM parameter; 'stranded' should be set to 'FALSE'.
3: In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': HG1007_PATCH, HG1032_PATCH, HG104_HG975_PATCH, HG1063_PATCH, HG1074_PATCH, HG1079_PATCH, HG1082_HG167_PATCH, HG1091_P[... truncated]

I actually have 12 warnings of "Each of the 2 combined objects has sequence levels not in the other".  I'm not sure if they are significant.  At first I ignored them, but am now running into errors down the road.   


I run into an error when I convert the returned a RangedSummarizedExperiment object of gene or transcript counts from SimpleRNASeq to a DGEList class object using the DEformats  package.  But the RangedSummarizedExperiment object of exon counts is converted to a DGEList class object. Shouldn't the DGEList function only work for gene counts and not exon counts? I am using a synthetic transcript as my annotation file, so my count table dimensions (63677X1) for genes and transcripts is the same, and my count table of exons is different ( 353207 X1).  Below is my code generating the gene count table and error for attempting to  convert the resulting RangedSummarizedExperiment object into an DGEList object.

 

load("./0109Synth_annotParam_Hsapiens.GRCh37.75.rda")

load("./0110_bamFiles_r2370_l12_D708-D508=317-12_rep1-hsap.rda")

countBy = c("exons", "features", "genes",  "transcripts")


rnaSeqPramG<- RnaSeqParam(annotParam=annotParamSynth,

                          bamParam = BamParam(paired= FALSE, stranded= FALSE),

                          countBy = countBy[3])


Counts_synG <- simpleRNASeq(

  bamFiles=bamFiles,

  param= rnaSeqPramG,

  verbose = TRUE)

dgeG = DGEList(Counts_synG)
____ 
Error in DGEList(assay(counts), lib.size = lib.size, norm.factors = norm.factors,  : 
  Counts and genes have different numbers of rows

Is there something wrong with my gene count tables or is it possible to return a DGEList class object directly instead of a RangedSummarizedExperiment object using simpleRNASeq? I know it's possible using easyRNASeq function, but I would prefer to use simpleRNASeq over easyRNASeq, since easyRNASeq is deprecated.   

Thanks for bearing with me on the lengthy post!  

Best,

KG

Also:

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tibble_1.4.1                      biomaRt_2.34.1                    RCurl_1.95-4.10                  
 [4] bitops_1.0-6                      limma_3.34.5                      DESeq_1.30.0                     
 [7] lattice_0.20-35                   locfit_1.5-9.1                    Biobase_2.38.0                   
[10] Rsamtools_1.30.0                  BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome.Hsapiens.UCSC.hg19_1.4.0
[13] BSgenome_1.46.0                   rtracklayer_1.38.2                GenomicRanges_1.30.0             
[16] GenomeInfoDb_1.14.0               easyRNASeq_2.14.0                 digest_0.6.13                    
[19] Biostrings_2.46.0                 XVector_0.18.0                    IRanges_2.12.0                   
[22] S4Vectors_0.16.0                  BiocGenerics_0.24.0               DEFormats_1.6.0                  
[25] backports_1.1.2                  

loaded via a namespace (and not attached):
 [1] matrixStats_0.52.2         bit64_0.9-7                RColorBrewer_1.1-2        
 [4] progress_1.1.2             httr_1.3.1                 tools_3.4.3               
 [7] R6_2.2.2                   rpart_4.1-11               Hmisc_4.0-3               
[10] DBI_0.7                    lazyeval_0.2.1             colorspace_1.3-2          
[13] nnet_7.3-12                gridExtra_2.3              prettyunits_1.0.2         
[16] DESeq2_1.18.1              bit_1.1-12                 compiler_3.4.3            
[19] htmlTable_1.11.0           DelayedArray_0.4.1         scales_0.5.0              
[22] checkmate_1.8.5            genefilter_1.60.0          stringr_1.2.0             
[25] foreign_0.8-69             genomeIntervals_1.34.0     base64enc_0.1-3           
[28] pkgconfig_2.0.1            htmltools_0.3.6            htmlwidgets_0.9           
[31] rlang_0.1.4                rstudioapi_0.7             RSQLite_2.0               
[34] bindr_0.1                  hwriter_1.3.2              BiocParallel_1.12.0       
[37] acepack_1.4.1              dplyr_0.7.4                magrittr_1.5              
[40] GenomeInfoDbData_1.0.0     Formula_1.2-2              Matrix_1.2-12             
[43] Rcpp_0.12.14               munsell_0.4.3              stringi_1.1.6             
[46] edgeR_3.20.2               SummarizedExperiment_1.8.1 zlibbioc_1.24.0           
[49] plyr_1.8.4                 grid_3.4.3                 blob_1.1.0                
[52] splines_3.4.3              annotate_1.56.1            knitr_1.17                
[55] pillar_1.0.1               geneplotter_1.56.0         XML_3.98-1.9              
[58] glue_1.2.0                 ShortRead_1.36.0           latticeExtra_0.6-28       
[61] data.table_1.10.4-3        gtable_0.2.0               purrr_0.2.4               
[64] tidyr_0.7.2                assertthat_0.2.0           ggplot2_2.2.1             
[67] xtable_1.8-2               survival_2.41-3            intervals_0.15.1          
[70] GenomicAlignments_1.14.1   AnnotationDbi_1.40.0       memoise_1.1.0             
[73] bindrcpp_0.2               cluster_2.0.6              LSD_3.0                   
 

 

ADD COMMENTlink modified 20 months ago by Nicolas Delhomme320 • written 21 months ago by KGies0

Thank you for reporting this. I will look whether there is anything which could be done on the DEFormats side to address this. I will get back to you as soon as I know more.

ADD REPLYlink written 21 months ago by Andrzej Oleś720

Thanks for your patience. In order to proceed, could you maybe provide a self-contained MWE illustrating the issue? Maybe it's enough to share the .rda files referenced in your code.

ADD REPLYlink written 21 months ago by Andrzej Oleś720
Answer: simpleRNASeq easyRNAseq warnings edgeR error
0
gravatar for Nicolas Delhomme
20 months ago by
Sweden
Nicolas Delhomme320 wrote:
Hej KGies! The warning about the non-common sequence levels can usually be safely ignored. It simply means that your BAM file and your annotation have sequences that are not in common, which most commonly happens 1) because 2 slightly different genome versions are used, 2) that an annotation/BAM file contains haplotypes that are absent from the other or 3) that different conventions have been used for naming the mitochondria (e.g. M vs. MT). Most of the times, the actual chromosomes share the same nomenclature. You can quickly check which sequence levels are different in R by extracting the seqnames from your annotParamSynth object and comparing them with those of one of your BamFile using e.g. the Rsamtools scanBamHeader function. As for the error, you can easily create an edgeR DGEList object as: library(edgeR) DGEList(assays(Counts_synG)$genes) Best, Nico --------------------------------------------------------------- Nicolas Delhomme, PhD UPSC bioinformatics facility Manager Umeå Plant Science Center, Swedish University for Agricultural Sciences (SLU) and Umeå University Tel: +46 90 786 5478 Email: nicolas.delhomme@umu.se SLU - Umeå universitet Umeå S-901 87 Sweden --------------------------------------------------------------- > On 10 Jan 2018, at 17:53, KGies [bioc] <noreply@bioconductor.org> wrote: > > Activity on a post you are following on support.bioconductor.org > User KGies wrote Question: simpleRNASeq easyRNAseq warnings edgeR error : > > > Hi All, > > I am relatively new to bulk RNA Seq workflow. I was given BAM files (For now I'm using one BAM file, I will make count tables of multiple BAM files once I am confident it is working correctly) and I am using the simpleRNASeq function in the easyRNASeq package to generate count tables of exons, genes, and transcripts. I get the following warnings when I run simpleRNASeq. > > 1: In FUN(X[[i]], ...) : > At the moment, this function will improperly count Illumina stranded data, reporting reads that align to the opposite strand of the transcript/gene. Please use the param argument to force an unstranded analysis behaviour. Support for "reverse" strand specificity will be implemented ASAP > 2: In simpleRNASeq(bamFiles = bamFiles, param = rnaSeqPramG, ... : > You provided an incorrect BAM parameter; 'stranded' should be set to 'FALSE'. > 3: In .Seqinfo.mergexy(x, y) : > Each of the 2 combined objects has sequence levels not in the other: > - in 'x': HG1007_PATCH, HG1032_PATCH, HG104_HG975_PATCH, HG1063_PATCH, HG1074_PATCH, HG1079_PATCH, HG1082_HG167_PATCH, HG1091_P[... truncated] > > I actually have 12 warnings of "Each of the 2 combined objects has sequence levels not in the other". I'm not sure if they are significant. At first I ignored them, but am now running into errors down the road. > > I run into an error when I convert the returned a RangedSummarizedExperiment object of gene or transcript counts from SimpleRNASeq to a DGEList class object using the DEformats package. But the RangedSummarizedExperiment object of exon counts is converted to a DGEList class object. Shouldn't the DGEList function only work for gene counts and not exon counts? I am using a synthetic transcript as my annotation file, so my count table dimensions (63677X1) for genes and transcripts is the same, and my count table of exons is different ( 353207 X1). Below is my code generating the gene count table and error for attempting to convert the resulting RangedSummarizedExperiment object into an DGEList object. > > > load("./0109Synth_annotParam_Hsapiens.GRCh37.75.rda") > > load("./0110_bamFiles_r2370_l12_D708-D508=317-12_rep1-hsap.rda") > > countBy = c("exons", "features", "genes", "transcripts") > > > rnaSeqPramG<- RnaSeqParam(annotParam=annotParamSynth, > > bamParam = BamParam(paired= FALSE, stranded= FALSE), > > countBy = countBy[3]) > > > Counts_synG <- simpleRNASeq( > > bamFiles=bamFiles, > > param= rnaSeqPramG, > > verbose = TRUE) > > dgeG = DGEList(Counts_synG) > ____ > > Error in DGEList(assay(counts), lib.size = lib.size, norm.factors = norm.factors, : > Counts and genes have different numbers of rows > > Is there something wrong with my gene count tables or is it possible to return a DGEList class object directly instead of a RangedSummarizedExperiment object using simpleRNASeq? I know it's possible using easyRNASeq function, but I would prefer to use simpleRNASeq over easyRNASeq, since easyRNASeq is deprecated. > > Thanks for bearing with me on the lengthy post! > > Best, > > KG > > Also: > > sessionInfo() > R version 3.4.3 (2017-11-30) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats4 parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] tibble_1.4.1 biomaRt_2.34.1 RCurl_1.95-4.10 > [4] bitops_1.0-6 limma_3.34.5 DESeq_1.30.0 > [7] lattice_0.20-35 locfit_1.5-9.1 Biobase_2.38.0 > [10] Rsamtools_1.30.0 BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome.Hsapiens.UCSC.hg19_1.4.0 > [13] BSgenome_1.46.0 rtracklayer_1.38.2 GenomicRanges_1.30.0 > [16] GenomeInfoDb_1.14.0 easyRNASeq_2.14.0 digest_0.6.13 > [19] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 > [22] S4Vectors_0.16.0 BiocGenerics_0.24.0 DEFormats_1.6.0 > [25] backports_1.1.2 > > loaded via a namespace (and not attached): > [1] matrixStats_0.52.2 bit64_0.9-7 RColorBrewer_1.1-2 > [4] progress_1.1.2 httr_1.3.1 tools_3.4.3 > [7] R6_2.2.2 rpart_4.1-11 Hmisc_4.0-3 > [10] DBI_0.7 lazyeval_0.2.1 colorspace_1.3-2 > [13] nnet_7.3-12 gridExtra_2.3 prettyunits_1.0.2 > [16] DESeq2_1.18.1 bit_1.1-12 compiler_3.4.3 > [19] htmlTable_1.11.0 DelayedArray_0.4.1 scales_0.5.0 > [22] checkmate_1.8.5 genefilter_1.60.0 stringr_1.2.0 > [25] foreign_0.8-69 genomeIntervals_1.34.0 base64enc_0.1-3 > [28] pkgconfig_2.0.1 htmltools_0.3.6 htmlwidgets_0.9 > [31] rlang_0.1.4 rstudioapi_0.7 RSQLite_2.0 > [34] bindr_0.1 hwriter_1.3.2 BiocParallel_1.12.0 > [37] acepack_1.4.1 dplyr_0.7.4 magrittr_1.5 > [40] GenomeInfoDbData_1.0.0 Formula_1.2-2 Matrix_1.2-12 > [43] Rcpp_0.12.14 munsell_0.4.3 stringi_1.1.6 > [46] edgeR_3.20.2 SummarizedExperiment_1.8.1 zlibbioc_1.24.0 > [49] plyr_1.8.4 grid_3.4.3 blob_1.1.0 > [52] splines_3.4.3 annotate_1.56.1 knitr_1.17 > [55] pillar_1.0.1 geneplotter_1.56.0 XML_3.98-1.9 > [58] glue_1.2.0 ShortRead_1.36.0 latticeExtra_0.6-28 > [61] data.table_1.10.4-3 gtable_0.2.0 purrr_0.2.4 > [64] tidyr_0.7.2 assertthat_0.2.0 ggplot2_2.2.1 > [67] xtable_1.8-2 survival_2.41-3 intervals_0.15.1 > [70] GenomicAlignments_1.14.1 AnnotationDbi_1.40.0 memoise_1.1.0 > [73] bindrcpp_0.2 cluster_2.0.6 LSD_3.0 > > > > > Post tags: simplernaseq, easyrnaseq, edger, DEformats > > You may reply via email or visit simpleRNASeq easyRNAseq warnings edgeR error >
ADD COMMENTlink written 20 months ago by Nicolas Delhomme320
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 294 users visited in the last hour