Search
Question: Stuck at "Read counting step", "summarizeOverlaps"
0
gravatar for JunLVI
10 months ago by
JunLVI40
Japan
JunLVI40 wrote:

Hi, I was following the workflow (http://www.bioconductor.org/help/workflows/rnaseqGene/)

somehow I got stuck at "Read counting step". 

sampleTable <- read.csv("my_sampletable.csv",row.names=1)
filenames <- paste0(sampleTable$samplename,".bam")
file.exists(filenames)
library("Rsamtools")
bamfiles <- BamFileList(filenames, yieldSize=2000000)
gtffile <- ("mm10genes.gtf") 
library("GenomicFeatures")
(txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character()))
(ebg <- exonsBy(txdb, by="gene"))
library("GenomicAlignments")
library("BiocParallel")
register(MulticoreParam()) # for the first try, I used "register(SerialParam())"
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=TRUE,
                        fragments=TRUE )

Then,  I was stuck at this step. 

Any suggestion about what might go wrong? 

if any information is needed for the trouble shooting, please let me know. 

TKS!

ADD COMMENTlink modified 10 months ago by Ryan C. Thompson6.1k • written 10 months ago by JunLVI40

What do you mean by "stuck"? Was there an error message? If you don't explain what problem you're having, there's not much we can do to help you.

ADD REPLYlink written 10 months ago by Ryan C. Thompson6.1k

Sorry Ryan, I am sorry that I did not explain clearly.
By "stuck", I meant
when I tried to create the SummarizedExperiment object with counts:
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=TRUE,
                        fragments=TRUE )

Then the computer seemed to be running the call like forever. No error message. If i wait long enough (like overnight), I got broken pipe. I wish that explained. 

ADD REPLYlink written 10 months ago by JunLVI40

Can you (and Ryan) provide the output of the R command sessionInfo(). And also try a simple test of MulticoreParam(). I did the following

> library(BiocParallel)
> param = MulticoreParam()
> param
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 6; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE
  bpRNGseed: 
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK
> bplapply(1:5, function(i) i, BPPARAM=param)
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

> sessionInfo()
R Under development (unstable) (2017-01-14 r71969)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.9.4   BiocInstaller_1.25.3

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0    parallel_3.4.0
ADD REPLYlink written 10 months ago by Martin Morgan ♦♦ 20k

> library(BiocParallel)

> param = MulticoreParam()

> param

class: MulticoreParam 

  bpjobname:BPJOB; bpworkers:158; bptasks:0; bptimeout:Inf; bpRNGseed:; bpisup:FALSE

  bplog:FALSE; bpthreshold:INFO; bplogdir:NA

  bpstopOnError:FALSE; bpprogressbar:FALSE

  bpresultdir:NA

cluster type: FORK 

> bplapply(1:5, function(i) i, BPPARAM=param)

[[1]]

[1] 1

 

[[2]]

[1] 2

 

[[3]]

[1] 3

 

[[4]]

[1] 4

 

[[5]]

[1] 5

 

> sessionInfo()

R version 3.2.3 (2015-12-10)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

 

locale:

 [1] LC_CTYPE=ja_JP.UTF-8       LC_NUMERIC=C              

 [3] LC_TIME=ja_JP.UTF-8        LC_COLLATE=ja_JP.UTF-8    

 [5] LC_MONETARY=ja_JP.UTF-8    LC_MESSAGES=ja_JP.UTF-8   

 [7] LC_PAPER=ja_JP.UTF-8       LC_NAME=C                 

 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C       

 

attached base packages:

[1] parallel  stats     graphics  grDevices utils     datasets  methods  

[8] base     

 

other attached packages:

[1] BiocParallel_1.4.3

 

loaded via a namespace (and not attached):

[1] futile.logger_1.4.3  lambda.r_1.1.9       futile.options_1.0.0

>

ADD REPLYlink written 10 months ago by JunLVI40
0
gravatar for Ryan C. Thompson
10 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.1k wrote:

Based on your comment, I think you're seeing the same problem that I sometimes see when using MulticoreParam, where it causes R to hang forever. (Bioc maintainers: I only see this sometimes and haven't figured out how to reproduce it yet.) My suggestion would be to try another parallel param, such as DoparParam():

library(parallel)
library(doParallel)
registerDoParallel(cores=detectCores())
library(BiocParallel)
register(DoparParam())

Then you should get parallel read counting.

ADD COMMENTlink written 10 months ago by Ryan C. Thompson6.1k

Hi Ryan, Thanks for your reply. I retried MulticoreParam, somehow it worked. (why? I do not know). I will try DoparParam when MulticoreParam do not work. TKS!

 

ADD REPLYlink written 10 months ago by JunLVI40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 213 users visited in the last hour