Question: TEQC package very slow
7.3 years ago by
nac280
nac280 wrote:
Hi,

I am analysing coverage data with the TEQC package from Bioconductor for quality assessment of a target enrichment experiment. I am running the analysis on a compute cluster and asked for a large amount of memory to be allocated. My BAM files are 11 GB in size, and the analysis runs for a very long time (several hours) before my session exits. Do I need to ask for this to be put on a long queue, as a job of more than 12 hours? Do people use TEQC with large files? How can I make this analysis more efficient?

These are my commands:

    # get reads
    myread <- get.reads("reads.bam", filetype="bam")

    # get read pairs: this is the step that fails. The documentation states
    # "To run the function can be quite time consuming, depending on the number of reads"
    myreadpair <- reads2pairs(myread)

    # drop single reads
    myread <- myread[!(myread$ID %in% myreadpair$singleReads$ID), , drop=TRUE]

I have used these functions efficiently on smaller files with MiSeq data, but not yet with HiSeq data.

Many thanks for sharing your experience in getting QC for large files efficiently.

Nathalie

    > sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] TEQC_2.4.0          hwriter_1.3         Rsamtools_1.8.4
    [4] Biostrings_2.24.1   GenomicRanges_1.8.3 IRanges_1.14.2
    [7] BiocGenerics_0.2.0

    loaded via a namespace (and not attached):
    [1] Biobase_2.16.0 bitops_1.0-4.1 stats4_2.15.0  zlibbioc_1.2.0

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
coverage teqc • 527 views
Answer: TEQC package very slow
7.3 years ago by
nac280
nac280 wrote:
Hi,

This is the error message produced at the myreadpair <- reads2pairs(myread) stage after it had been running for 7 hours:

    > readpairs4_2_PigS<-reads2pairs(reads4_2_PigS)
    [1] "there were 1453928 reads found without matching second read, or whose second read matches to a different chromosome"
    Error in endoapply(reads, mergefun) :
      'FUN' did not produce an endomorphism
    > Terminated

That may help. Thanks,

On 13/06/12 12:07, nathalie wrote:
> Hi,
>
> I am analysing coverage data with the TEQC package from Bioconductor
> for quality assessment of a target enrichment experiment. I am running
> the analysis on a compute cluster and asked for a large amount of
> memory to be allocated. My BAM files are 11 GB in size, and the
> analysis runs for a very long time (several hours) before my session
> exits. Do I need to ask for this to be put on a long queue, as a job
> of more than 12 hours? Do people use TEQC with large files? How can I
> make this analysis more efficient?
>
> These are my commands:
>
>     # get reads
>     myread <- get.reads("reads.bam", filetype="bam")
>
>     # get read pairs: this is the step that fails. The documentation states
>     # "To run the function can be quite time consuming, depending on the number of reads"
>     myreadpair <- reads2pairs(myread)
>
>     # drop single reads
>     myread <- myread[!(myread$ID %in% myreadpair$singleReads$ID), , drop=TRUE]
>
> I have used these functions efficiently on smaller files with MiSeq
> data, but not yet with HiSeq data.
>
> Many thanks for sharing your experience in getting QC for large files
> efficiently.
>
> Nathalie
>
>     > sessionInfo()
>     R version 2.15.0 (2012-03-30)
>     Platform: x86_64-unknown-linux-gnu (64-bit)
>
>     locale:
>      [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>      [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>      [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
>      [7] LC_PAPER=C                 LC_NAME=C
>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>     [1] TEQC_2.4.0          hwriter_1.3         Rsamtools_1.8.4
>     [4] Biostrings_2.24.1   GenomicRanges_1.8.3 IRanges_1.14.2
>     [7] BiocGenerics_0.2.0
>
>     loaded via a namespace (and not attached):
>     [1] Biobase_2.16.0 bitops_1.0-4.1 stats4_2.15.0  zlibbioc_1.2.0