Question: CAGEr : problem while filtering the bamFile
2
gravatar for Vivek.b
3.3 years ago by
Vivek.b100
Germany
Vivek.b100 wrote:

Dear CAGEr team

I have been trying to run CAGEr on my test dataset and having problem in reading in the BAM files to make CAGEset object.

> input = list.files("test",pattern = "test-.*sorted.bam$",full.names = TRUE)
> mycage <- new("CAGEset", genomeName = "BSgenome.Dmelanogaster.UCSC.dm6",
+                 inputFiles = input, inputFilesType = "bam",
+                 sampleLabels = c("test-C", "test-D", "test-E", "test-G"))
> ctss <- getCTSS(mycage,removeFirstG = TRUE,correctSystematicG = TRUE)

Gives me :

Reading in file: test/test-C_sorted.bam...
	-> Filtering out low quality reads...
	-> Removing the first base of the reads if 'G' and not aligned to the genome...
	-> Estimating the frequency of adding a 'G' nucleotide and correcting the systematic bias...
Error in setnames(CTSS, c("chr", "pos", "strand", sample.labels[i])) : 
  Can't assign 4 names to a 1 column data.table

After looking into the code I thought I should try not using the removeFirstG option.. so I tried :

ctss <- getCTSS(mycage,removeFirstG = FALSE,correctSystematicG = FALSE)

and got :

Reading in file: test/test-C_sorted.bam...
	-> Filtering out low quality reads...
Error in `$<-.data.frame`(`*tmp*`, "tag_count", value = 1) : 
  replacement has 1 row, data has 0

So basically got stuck at the same place in the source code, but with a different error.

If I turn only correctSystematicG false, i get the same error :

ctss <- getCTSS(mycage, removeFirstG = TRUE, correctSystematicG = FALSE)

Reading in file: ../rawdata_results/fetish-C_sorted.bam...
	-> Filtering out low quality reads...
	-> Removing the first base of the reads if 'G' and not aligned to the genome...
Error in `$<-.data.frame`(`*tmp*`, "tag_count", value = 1) : 
  replacement has 1 row, data has 0

 

I would appreciate any help with this problem..

Thanks

Vivek

cager cage • 658 views
ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Vivek.b100
Answer: CAGEr : problem while filtering the bamFile
2
gravatar for Vivek.b
3.3 years ago by
Vivek.b100
Germany
Vivek.b100 wrote:

I solved the problem today. Although it was a silly thing but was hard to catch. The problem was that I mapped my data using ENSEMBL annotation, and had no "chr" in my seqnames. Since the getCTSS methods uses this : 

reads.GRanges <- reads.GRanges[seqnames(reads.GRanges) %in% seqnames(genome)]

It removed all my entries and gives empty Granges. The method throws no warning here and gets stuck at the next step.

I want to request the authors to add a warning here, and also add a way to catch and solve for this chr vs no-chr problem while making the comparison, since others might also face the same issue.

Best,

Vivek

 

ADD COMMENTlink written 3.3 years ago by Vivek.b100

hi, I am using CAGEr too. And  I download the bam file from FANTOM5 website. I got the same problem like you. Can you tell me how to solve the problem?

> input.file <- c("G:/CAGEr1110/Mouse%20Neurons%20-%20striatal%2c%20donor3.CNhs12111.11646-122D8.mm10.nobarcode.bam")
> samples <- "test"
> myCAGEset <- new("CAGEset", genomeName = "BSgenome.Mmusculus.UCSC.mm10",
+                  inputFiles = input.file, inputFilesType = "bam",
+                  sampleLabels = samples)
> getCTSS(myCAGEset)

Reading in file: G:/CAGEr1110/Mouse%20Neurons%20-%20striatal%2c%20donor3.CNhs12111.11646-122D8.mm10.nobarcode.bam...
-> Filtering out low quality reads...
-> Removing the first base of the reads if 'G' and not aligned to the genome...
-> Estimating the frequency of adding a 'G' nucleotide and correcting the systematic bias...
Error in setnames(CTSS, c("chr", "pos", "strand", sample.label)) : 
  Can't assign 4 names to a 1 column data.table

the website I download the  bam file is http://fantom.gsc.riken.jp/5/datafiles/latest/basic/mouse.primary_cell.hCAGE/

by the way the info  of my R is 

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936    LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C                               LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome_1.50.0                    rtracklayer_1.42.0                
 [4] Biostrings_2.50.1                  XVector_0.22.0                     GenomicRanges_1.34.0              
 [7] GenomeInfoDb_1.18.0                IRanges_2.16.0                     S4Vectors_0.20.0                  
[10] BiocGenerics_0.28.0                CAGEr_1.24.0                      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0                        stringdist_0.9.5.1                BSgenome.Mmusculus.UCSC.mm9_1.4.0
 [4] lattice_0.20-38                   formula.tools_1.7.1               Rsamtools_1.34.0                 
 [7] gtools_3.8.1                      assertthat_0.2.0                  digest_0.6.18                    
[10] R6_2.3.0                          plyr_1.8.4                        ggplot2_3.1.0                    
[13] pillar_1.3.0                      zlibbioc_1.28.0                   rlang_0.3.0.1                    
[16] lazyeval_0.2.1                    rstudioapi_0.8                    data.table_1.11.8                
[19] vegan_2.5-3                       Matrix_1.2-15                     splines_3.5.1                    
[22] BiocParallel_1.16.0               RCurl_1.95-4.11                   munsell_0.5.0                    
[25] DelayedArray_0.8.0                compiler_3.5.1                    pkgconfig_2.0.2                  
[28] mgcv_1.8-25                       tidyselect_0.2.5                  SummarizedExperiment_1.12.0      
[31] tibble_1.4.2                      GenomeInfoDbData_1.2.0            matrixStats_0.54.0               
[34] XML_3.98-1.16                     reshape_0.8.8                     permute_0.9-4                    
[37] crayon_1.3.4                      dplyr_0.7.7                       GenomicAlignments_1.18.0         
[40] MASS_7.3-51.1                     bitops_1.0-6                      grid_3.5.1                       
[43] nlme_3.1-137                      gtable_0.2.0                      magrittr_1.5                     
[46] scales_1.0.0                      KernSmooth_2.23-15                stringi_1.2.4                    
[49] som_0.3-5.1                       MultiAssayExperiment_1.8.1        bindrcpp_0.2.2                   
[52] tools_3.5.1                       Biobase_2.42.0                    glue_1.3.0                       
[55] purrr_0.2.5                       colorspace_1.3-2                  cluster_2.0.7-1                  
[58] operator.tools_1.6.3              beanplot_1.2                      memoise_1.1.0                    
[61] VGAM_1.0-6                        bindr_0.1.1

thanks  

zhong

ADD REPLYlink modified 6 months ago • written 6 months ago by Zhong JW0

Hi

It's been a while since I had this, but I think the problem was solved by switching the gene annotation to ensembl syle.

seqlevelsStyle(BSgenome.Mmusculus.UCSC.mm10) <- "ENSEMBL"

ADD REPLYlink written 7 weeks ago by Vivek.b100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 343 users visited in the last hour