comparing seqnames
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hi, I am trying to extracts mate pairs that are mapped to a particular chromosome (say chr1). The construction of my which() function is not returning what I need. May be I am missing something in constructing it. bf.data= readGappedAlignments(bam_files, param=ScanBamParam(what=scanBamWhat())) mate.pairs=table(mcols(bf.data)$qname) onlyPairs=names(mate.pairs)[mate.pairs==2] mappedPairs=bf.data[mcols(bf.data)$qname %in% onlyPairs] mate1=mappedPairs[c(T,F)] mate2=mappedPairs[c(F,T)] isSameCzome= (seqnames(mate1)==seqnames(mate2)) chrnames=seqnames(mate1[which(seqnames(mate1[isSameCzome])=="chr1")]) With the last statement I am intending to extract mate1 that are mapped to chr1. So I expect chrnames to contain only chr1, but it returns all other chromosomes and not just for chr1. Is there something I am missing in the way I have written the which statement? Thanks for your help. Cheers../Murli -- output of sessionInfo(): > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] VariantAnnotation_1.6.6 [2] org.Hs.eg.db_2.9.0 [3] RSQLite_0.11.4 [4] DBI_0.2-7 [5] BSgenome.Hsapiens.UCSC.hg19_1.3.19 [6] BSgenome_1.28.0 [7] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 [8] GenomicFeatures_1.12.2 [9] AnnotationDbi_1.22.6 [10] Biobase_2.20.0 [11] Rsamtools_1.12.3 [12] Biostrings_2.28.0 [13] GenomicRanges_1.12.4 [14] IRanges_1.18.1 [15] BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] biomaRt_2.16.0 bitops_1.0-5 RCurl_1.95-4.1 rtracklayer_1.20.2 [5] stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 -- Sent via the guest posting facility at bioconductor.org.
BSgenome BSgenome BSgenome BSgenome • 848 views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 2 days ago
United States
On 06/15/2013 10:17 AM, Maintainer wrote: > > Hi, > I am trying to extracts mate pairs that are mapped to a particular chromosome (say chr1). The construction of my which() function is not returning what I need. May be I am missing something in constructing it. > > bf.data= readGappedAlignments(bam_files, param=ScanBamParam(what=scanBamWhat())) when working with big data it's usually better to start your query as specifically as possible, so specify early that you're only interested in chr1. For some bam file bf <- BamFile("some.bam") then chr1gr = GRanges("chr1", IRanges(1L, seqlengths(bf)["chr1"])) specifies the range we're interested in, param = ScanBamParam(which=chr1gr) sets up a ScanBamParam (scanBamWhat() does not doing anything in your code, so I've removed it) and bf.data= readGappedAlignments(bam_files, param=param) reads in just chr1 reads, so no need to worry down-stream about non- chr1 reads. And a little more down below... > mate.pairs=table(mcols(bf.data)$qname) > onlyPairs=names(mate.pairs)[mate.pairs==2] > mappedPairs=bf.data[mcols(bf.data)$qname %in% onlyPairs] > mate1=mappedPairs[c(T,F)] > mate2=mappedPairs[c(F,T)] > isSameCzome= (seqnames(mate1)==seqnames(mate2)) > > chrnames=seqnames(mate1[which(seqnames(mate1[isSameCzome])=="chr1")]) here your 'which()' indexes into mate1[isSameCzome], whereas the outer seqnames() indexs into mate1. Thinking about the expensive (copying mate1) versus cheap (extracting / subsetting seqnames) operations, I might implement this as chrnames = seqnames(mate1)[seqnames(mate1) == "chr1"][isSameCzome] Hope that helps, Martin > > With the last statement I am intending to extract mate1 that are mapped to chr1. So I expect chrnames to contain only chr1, but it returns all other chromosomes and not just for chr1. Is there something I am missing in the way I have written the which statement? > > Thanks for your help. > > Cheers../Murli > > > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] VariantAnnotation_1.6.6 > [2] org.Hs.eg.db_2.9.0 > [3] RSQLite_0.11.4 > [4] DBI_0.2-7 > [5] BSgenome.Hsapiens.UCSC.hg19_1.3.19 > [6] BSgenome_1.28.0 > [7] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 > [8] GenomicFeatures_1.12.2 > [9] AnnotationDbi_1.22.6 > [10] Biobase_2.20.0 > [11] Rsamtools_1.12.3 > [12] Biostrings_2.28.0 > [13] GenomicRanges_1.12.4 > [14] IRanges_1.18.1 > [15] BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.16.0 bitops_1.0-5 RCurl_1.95-4.1 rtracklayer_1.20.2 > [5] stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > ____________________________________________________________________ ____ > devteam-bioc mailing list > To unsubscribe from this mailing list send a blank email to > devteam-bioc-leave at lists.fhcrc.org > You can also unsubscribe or change your personal options at > https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
-----Original Message----- From: Martin Morgan [mailto:mtmorgan@fhcrc.org] Sent: Saturday, June 15, 2013 1:31 PM To: guest at bioconductor.org Cc: Maintainer; bioconductor at r-project.org; Nair, Murlidharan T Subject: Re: [devteam-bioc] comparing seqnames On 06/15/2013 10:17 AM, Maintainer wrote: > > Hi, > I am trying to extracts mate pairs that are mapped to a particular chromosome (say chr1). The construction of my which() function is not returning what I need. May be I am missing something in constructing it. > > bf.data= readGappedAlignments(bam_files, > param=ScanBamParam(what=scanBamWhat())) when working with big data it's usually better to start your query as specifically as possible, so specify early that you're only interested in chr1. For some bam file bf <- BamFile("some.bam") then chr1gr = GRanges("chr1", IRanges(1L, seqlengths(bf)["chr1"])) specifies the range we're interested in, param = ScanBamParam(which=chr1gr) sets up a ScanBamParam (scanBamWhat() does not doing anything in your code, so I've removed it) and bf.data= readGappedAlignments(bam_files, param=param) reads in just chr1 reads, so no need to worry down-stream about non- chr1 reads. And a little more down below... > mate.pairs=table(mcols(bf.data)$qname) > onlyPairs=names(mate.pairs)[mate.pairs==2] > mappedPairs=bf.data[mcols(bf.data)$qname %in% onlyPairs] > mate1=mappedPairs[c(T,F)] mate2=mappedPairs[c(F,T)] isSameCzome= > (seqnames(mate1)==seqnames(mate2)) > > chrnames=seqnames(mate1[which(seqnames(mate1[isSameCzome])=="chr1")]) here your 'which()' indexes into mate1[isSameCzome], whereas the outer seqnames() indexs into mate1. Thinking about the expensive (copying mate1) versus cheap (extracting / subsetting seqnames) operations, I might implement this as chrnames = seqnames(mate1)[seqnames(mate1) == "chr1"][isSameCzome] Hope that helps, Martin >>Thanks Martin. the following worked. >>chrnames = seqnames(mate1)[seqnames(mate1) == "chr1"] >>Cheers../Murli > > With the last statement I am intending to extract mate1 that are mapped to chr1. So I expect chrnames to contain only chr1, but it returns all other chromosomes and not just for chr1. Is there something I am missing in the way I have written the which statement? > > Thanks for your help. > > Cheers../Murli > > > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] VariantAnnotation_1.6.6 > [2] org.Hs.eg.db_2.9.0 > [3] RSQLite_0.11.4 > [4] DBI_0.2-7 > [5] BSgenome.Hsapiens.UCSC.hg19_1.3.19 > [6] BSgenome_1.28.0 > [7] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 > [8] GenomicFeatures_1.12.2 > [9] AnnotationDbi_1.22.6 > [10] Biobase_2.20.0 > [11] Rsamtools_1.12.3 > [12] Biostrings_2.28.0 > [13] GenomicRanges_1.12.4 > [14] IRanges_1.18.1 > [15] BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.16.0 bitops_1.0-5 RCurl_1.95-4.1 rtracklayer_1.20.2 > [5] stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > ______________________________________________________________________ > __ > devteam-bioc mailing list > To unsubscribe from this mailing list send a blank email to > devteam-bioc-leave at lists.fhcrc.org > You can also unsubscribe or change your personal options at > https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
chr1.names = seqnames(mate1[isSameCzome])[seqnames(mate1[isSameCzome]) == "chr1"] The above worked. I guess that should give me the correct subsetting. Thx./Murli -----Original Message----- From: Martin Morgan [mailto:mtmorgan@fhcrc.org] Sent: Saturday, June 15, 2013 1:31 PM To: guest at bioconductor.org Cc: Maintainer; bioconductor at r-project.org; Nair, Murlidharan T Subject: Re: [devteam-bioc] comparing seqnames On 06/15/2013 10:17 AM, Maintainer wrote: > > Hi, > I am trying to extracts mate pairs that are mapped to a particular chromosome (say chr1). The construction of my which() function is not returning what I need. May be I am missing something in constructing it. > > bf.data= readGappedAlignments(bam_files, > param=ScanBamParam(what=scanBamWhat())) when working with big data it's usually better to start your query as specifically as possible, so specify early that you're only interested in chr1. For some bam file bf <- BamFile("some.bam") then chr1gr = GRanges("chr1", IRanges(1L, seqlengths(bf)["chr1"])) specifies the range we're interested in, param = ScanBamParam(which=chr1gr) sets up a ScanBamParam (scanBamWhat() does not doing anything in your code, so I've removed it) and bf.data= readGappedAlignments(bam_files, param=param) reads in just chr1 reads, so no need to worry down-stream about non- chr1 reads. And a little more down below... > mate.pairs=table(mcols(bf.data)$qname) > onlyPairs=names(mate.pairs)[mate.pairs==2] > mappedPairs=bf.data[mcols(bf.data)$qname %in% onlyPairs] > mate1=mappedPairs[c(T,F)] mate2=mappedPairs[c(F,T)] isSameCzome= > (seqnames(mate1)==seqnames(mate2)) > > chrnames=seqnames(mate1[which(seqnames(mate1[isSameCzome])=="chr1")]) here your 'which()' indexes into mate1[isSameCzome], whereas the outer seqnames() indexs into mate1. Thinking about the expensive (copying mate1) versus cheap (extracting / subsetting seqnames) operations, I might implement this as chrnames = seqnames(mate1)[seqnames(mate1) == "chr1"][isSameCzome] Hope that helps, Martin > > With the last statement I am intending to extract mate1 that are mapped to chr1. So I expect chrnames to contain only chr1, but it returns all other chromosomes and not just for chr1. Is there something I am missing in the way I have written the which statement? > > Thanks for your help. > > Cheers../Murli > > > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] VariantAnnotation_1.6.6 > [2] org.Hs.eg.db_2.9.0 > [3] RSQLite_0.11.4 > [4] DBI_0.2-7 > [5] BSgenome.Hsapiens.UCSC.hg19_1.3.19 > [6] BSgenome_1.28.0 > [7] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 > [8] GenomicFeatures_1.12.2 > [9] AnnotationDbi_1.22.6 > [10] Biobase_2.20.0 > [11] Rsamtools_1.12.3 > [12] Biostrings_2.28.0 > [13] GenomicRanges_1.12.4 > [14] IRanges_1.18.1 > [15] BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.16.0 bitops_1.0-5 RCurl_1.95-4.1 rtracklayer_1.20.2 > [5] stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > ______________________________________________________________________ > __ > devteam-bioc mailing list > To unsubscribe from this mailing list send a blank email to > devteam-bioc-leave at lists.fhcrc.org > You can also unsubscribe or change your personal options at > https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 703 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6