readGappedAlignments not working in Rsamtools
1
1
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hello everybody, I'm a PhD student and I am doing my first analysis on RNAseq data. I was trying to use the script to generate a table of counts from bam files (shown here): library(Rsamtools) setwd("C:/R_data") bam_files <- list.files(pattern="*.bam") gr_list <- lapply(bam_files, function(bam_file) as(readGappedAlignments(bam_file), "GRanges")) names(gr_list) <- bam_files library(GenomicFeatures) txdb=makeTranscriptDbFromUCSC(genome='dm3',tablename='ensGene') tx_by_gene=transcriptsBy(txdb,'gene') ex_by_gene=exonsBy(txdb,'gene') toc=data.frame(rep(NA,length(tx_by_gene))) for(i in 1:length(bam_files)) { toc[,i]=countOverlaps(tx_by_gene,bam_files[[i]]) } rownames(toc)=names(tx_by_gene) colnames(toc)=names(gr_list) dim(toc) head(toc) But it doesn't seem to work, failing in two places: 1) Error in .class1(object) : could not find function "readGappedAlignments" 2) Download the ensGene table ... Error in function (type, msg, asError = TRUE) : couldn't connect to host For the point 1) I found that the function is deprecated but I still can't make things work changing it for the new function that is supposed to replace it "readGAlignmentsFromBam", Can anybody help me fixing this? The error in part 2) is completely out of my reach at the moment, I don't know how to fix this issue either, so any help would be appreciated. Thanks in advance! -- output of sessionInfo(): R version 3.1.0 (2014-04-10) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United Kingdom.1252 [2] LC_CTYPE=English_United Kingdom.1252 [3] LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United Kingdom.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicFeatures_1.16.2 AnnotationDbi_1.26.0 Biobase_2.24.0 [4] Rsamtools_1.16.0 Biostrings_2.32.0 XVector_0.4.0 [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.8 [10] BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 [4] biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 [7] BSgenome_1.32.0 codetools_0.2-8 DBI_0.2-7 [10] digest_0.6.4 fail_1.2 foreach_1.4.2 [13] GenomicAlignments_1.0.1 iterators_1.0.7 plyr_1.8.1 [16] Rcpp_0.11.1 RCurl_1.95-4.1 RSQLite_0.11.4 [19] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 [22] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 [25] zlibbioc_1.10.0 -- Sent via the guest posting facility at bioconductor.org.
RNASeq RNASeq • 4.3k views
ADD COMMENT
1
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi, readGappedAlignments() has been replaced with readGAlignments() and is now in the GenomicAlignments package, not Rsamtools. It looks like you want to count reads in a bam file against an annotation. summarizeOverlaps() can do this for you and offers options for counting reads that overlap multiple annotation features. (fyi, summarizeOverlaps() calls readGAlignments() or readGAlignmentPairs() under the hood.) library(GenomicAlignments) ?summarizeOverlaps ## see examples on counting Bam files Create a BamFileList of files: bfl <- BamFileList(bam_files) If the files are too large for available memory use a 'yieldSize': bfl <- BamFileList(bam_files, yieldSize=100000) We have a pre-built package for the dm3 ensembl genes. See this site for a list of available annotation packages: http://www.bioconductor.org/packages/devel/BiocViews.html#___Annotatio nData Install dm3 ensembl: library(TxDb.Dmelanogaster.UCSC.dm3.ensGene) exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene") Call summarizeOverlaps(): se_single <- summarizeOverlaps(exbygene, bfl) For paired-end reads use 'singleEnd=FALSE': se_paired <- summarizeOverlaps(exbygene, bfl, singleEnd=FALSE) The result is a SummarizedExperiment class with results in the 'assays' slot. assays(se_single) Valerie On 06/13/2014 01:15 AM, Maintainer wrote: > Hello everybody, > > I'm a PhD student and I am doing my first analysis on RNAseq data. > > I was trying to use the script to generate a table of counts from bam files (shown here): > > library(Rsamtools) > > setwd("C:/R_data") > > bam_files <- list.files(pattern="*.bam") > gr_list <- lapply(bam_files, > function(bam_file) > as(readGappedAlignments(bam_file), "GRanges")) > names(gr_list) <- bam_files > > library(GenomicFeatures) > txdb=makeTranscriptDbFromUCSC(genome='dm3',tablename='ensGene') > > tx_by_gene=transcriptsBy(txdb,'gene') > ex_by_gene=exonsBy(txdb,'gene') > toc=data.frame(rep(NA,length(tx_by_gene))) > for(i in 1:length(bam_files)) > { > toc[,i]=countOverlaps(tx_by_gene,bam_files[[i]]) > } > rownames(toc)=names(tx_by_gene) > colnames(toc)=names(gr_list) > dim(toc) > head(toc) > > But it doesn't seem to work, failing in two places: > > 1) Error in .class1(object) : could not find function "readGappedAlignments" > > 2) Download the ensGene table ... Error in function (type, msg, asError = TRUE) : couldn't connect to host > > For the point 1) I found that the function is deprecated but I still can't make things work changing it for the new function that is supposed to replace it "readGAlignmentsFromBam", Can anybody help me fixing this? > > The error in part 2) is completely out of my reach at the moment, I don't know how to fix this issue either, so any help would be appreciated. > > Thanks in advance! > > > > -- output of sessionInfo(): > > > R version 3.1.0 (2014-04-10) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 > [2] LC_CTYPE=English_United Kingdom.1252 > [3] LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] GenomicFeatures_1.16.2 AnnotationDbi_1.26.0 Biobase_2.24.0 > [4] Rsamtools_1.16.0 Biostrings_2.32.0 XVector_0.4.0 > [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.8 > [10] BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 > [4] biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 > [7] BSgenome_1.32.0 codetools_0.2-8 DBI_0.2-7 > [10] digest_0.6.4 fail_1.2 foreach_1.4.2 > [13] GenomicAlignments_1.0.1 iterators_1.0.7 plyr_1.8.1 > [16] Rcpp_0.11.1 RCurl_1.95-4.1 RSQLite_0.11.4 > [19] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 > [22] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 > [25] zlibbioc_1.10.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > ____________________________________________________________________ ____ > devteam-bioc mailing list > To unsubscribe from this mailing list send a blank email to > devteam-bioc-leave at lists.fhcrc.org > You can also unsubscribe or change your personal options at > https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > -- Valerie Obenchain Program in Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: vobencha at fhcrc.org Phone: (206) 667-3158
ADD COMMENT
0
Entering edit mode
Dear Valerie, Thank you very much for your answer. It worked perfectly and solved both issues. Best Alvaro > Date: Fri, 13 Jun 2014 08:42:42 -0700 > From: vobencha@fhcrc.org > To: guest@bioconductor.org; bioconductor@r-project.org; alfumao@hotmail.com > CC: maintainer@bioconductor.org > Subject: Re: [devteam-bioc] readGappedAlignments not working in Rsamtools > > Hi, > > readGappedAlignments() has been replaced with readGAlignments() and is > now in the GenomicAlignments package, not Rsamtools. > > It looks like you want to count reads in a bam file against an > annotation. summarizeOverlaps() can do this for you and offers options > for counting reads that overlap multiple annotation features. (fyi, > summarizeOverlaps() calls readGAlignments() or readGAlignmentPairs() > under the hood.) > > library(GenomicAlignments) > ?summarizeOverlaps ## see examples on counting Bam files > > > Create a BamFileList of files: > bfl <- BamFileList(bam_files) > > If the files are too large for available memory use a 'yieldSize': > bfl <- BamFileList(bam_files, yieldSize=100000) > > We have a pre-built package for the dm3 ensembl genes. See this site for > a list of available annotation packages: > http://www.bioconductor.org/packages/devel/BiocViews.html#___Annotat ionData > > Install dm3 ensembl: > library(TxDb.Dmelanogaster.UCSC.dm3.ensGene) > exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene") > > Call summarizeOverlaps(): > se_single <- summarizeOverlaps(exbygene, bfl) > > For paired-end reads use 'singleEnd=FALSE': > se_paired <- summarizeOverlaps(exbygene, bfl, singleEnd=FALSE) > > > The result is a SummarizedExperiment class with results in > the 'assays' slot. > assays(se_single) > > > Valerie > > > > > On 06/13/2014 01:15 AM, Maintainer wrote: > > Hello everybody, > > > > I'm a PhD student and I am doing my first analysis on RNAseq data. > > > > I was trying to use the script to generate a table of counts from bam files (shown here): > > > > library(Rsamtools) > > > > setwd("C:/R_data") > > > > bam_files <- list.files(pattern="*.bam") > > gr_list <- lapply(bam_files, > > function(bam_file) > > as(readGappedAlignments(bam_file), "GRanges")) > > names(gr_list) <- bam_files > > > > library(GenomicFeatures) > > txdb=makeTranscriptDbFromUCSC(genome='dm3',tablename='ensGene') > > > > tx_by_gene=transcriptsBy(txdb,'gene') > > ex_by_gene=exonsBy(txdb,'gene') > > toc=data.frame(rep(NA,length(tx_by_gene))) > > for(i in 1:length(bam_files)) > > { > > toc[,i]=countOverlaps(tx_by_gene,bam_files[[i]]) > > } > > rownames(toc)=names(tx_by_gene) > > colnames(toc)=names(gr_list) > > dim(toc) > > head(toc) > > > > But it doesn't seem to work, failing in two places: > > > > 1) Error in .class1(object) : could not find function "readGappedAlignments" > > > > 2) Download the ensGene table ... Error in function (type, msg, asError = TRUE) : couldn't connect to host > > > > For the point 1) I found that the function is deprecated but I still can't make things work changing it for the new function that is supposed to replace it "readGAlignmentsFromBam", Can anybody help me fixing this? > > > > The error in part 2) is completely out of my reach at the moment, I don't know how to fix this issue either, so any help would be appreciated. > > > > Thanks in advance! > > > > > > > > -- output of sessionInfo(): > > > > > > R version 3.1.0 (2014-04-10) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=English_United Kingdom.1252 > > [2] LC_CTYPE=English_United Kingdom.1252 > > [3] LC_MONETARY=English_United Kingdom.1252 > > [4] LC_NUMERIC=C > > [5] LC_TIME=English_United Kingdom.1252 > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > > [8] base > > > > other attached packages: > > [1] GenomicFeatures_1.16.2 AnnotationDbi_1.26.0 Biobase_2.24.0 > > [4] Rsamtools_1.16.0 Biostrings_2.32.0 XVector_0.4.0 > > [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.8 > > [10] BiocGenerics_0.10.0 > > > > loaded via a namespace (and not attached): > > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 > > [4] biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 > > [7] BSgenome_1.32.0 codetools_0.2-8 DBI_0.2-7 > > [10] digest_0.6.4 fail_1.2 foreach_1.4.2 > > [13] GenomicAlignments_1.0.1 iterators_1.0.7 plyr_1.8.1 > > [16] Rcpp_0.11.1 RCurl_1.95-4.1 RSQLite_0.11.4 > > [19] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 > > [22] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 > > [25] zlibbioc_1.10.0 > > > > -- > > Sent via the guest posting facility at bioconductor.org. > > > > __________________________________________________________________ ______ > > devteam-bioc mailing list > > To unsubscribe from this mailing list send a blank email to > > devteam-bioc-leave@lists.fhcrc.org > > You can also unsubscribe or change your personal options at > > https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > > > > > -- > Valerie Obenchain > Program in Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, Seattle, WA 98109 > > Email: vobencha@fhcrc.org > Phone: (206) 667-3158 [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 599 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6