GenomicRanges seqlengths problem
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hello, I have a problem with supplying GRanges object with seqlengths. I have a files. chrs.txt - contains information about chromosomes chrs.txt chr,start,end,len lm_SuperContig_0_v2,1,4258568,4258568 lm_SuperContig_1_v2,1,3378610,3378610 lm_SuperContig_2_v2,1,2939989,2939989 lm_SuperContig_3_v2,1,2348246,2348246 lm_SuperContig_4_v2,1,1918205,1918205 lm_SuperContig_6_v2,1,1888674,1888674 lm_SuperContig_5_v2,1,1869450,1869450 lm_SuperContig_8_v2,1,1809296,1809296 lm_SuperContig_9_v2,1,1772623,1772623 lm_SuperContig_7_v2,1,1769547,1769547 lm_SuperContig_10_v2,1,1758670,1758670 lm_SuperContig_13_v2,1,1634580,1634580 lm_SuperContig_12_v2,1,1631710,1631710 lm_SuperContig_11_v2,1,1590160,1590160 lm_SuperContig_15_v2,1,1560629,1560629 lm_SuperContig_14_v2,1,1533332,1533332 lm_SuperContig_17_v2,1,1445693,1445693 lm_SuperContig_16_v2,1,1397653,1397653 lm_SuperContig_18_v2,1,1351976,1351976 lm_SuperContig_19_v2,1,1186800,1186800 lm_SuperContig_20_v2,1,1087932,1087932 lm_SuperContig_21_v2,1,1020521,1020521 lm_SuperContig_22_v2,1,731443,731443 lm_SuperContig_23_v2,1,521426,521426 lm_SuperContig_24_v2,1,475869,475869 lm_SuperContig_25_v2,1,318058,318058 lm_SuperContig_26_v2,1,261540,261540 lm_SuperContig_27_v2,1,250629,250629 lm_SuperContig_28_v2,1,236098,236098 lm_SuperContig_29_v2,1,200940,200940 lm_SuperContig_30_v2,1,154863,154863 lm_SuperContig_31_v2,1,143268,143268 lm_SuperContig_32_v2,1,87679,87679 lm_SuperContig_34_v2,1,58596,58596 I try to do the following: # creating GRanges object for chromosomes dataChr <- read.table("chrs.txt",header=T,sep=",") chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) sl <- setNames(dataChr$len, as.character(dataChr$chr)) seqlengths(chrs) <- sl And I get the following error: Error in .normargSeqlengths(value, seqnames(x)) : when the supplied 'seqlengths' vector is named, the names must match the seqnames Any chance you could help me with what is going on? Best wishes, Agnieszka -- output of sessionInfo(): R version 3.1.0 (2014-04-10) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 [8] BiocInstaller_1.14.2 loaded via a namespace (and not attached): [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 -- Sent via the guest posting facility at bioconductor.org.
• 2.6k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi, The seqnames are in a different orders in the 'chrs' and 'sl' objects. Looking at the first 3 names in each, > head(seqlengths(chrs), 3) lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 NA NA NA > head(sl, 3) lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 4258568 3378610 2939989 When a GRanges is created, seqnames are sorted in ascii order. You can see the full list with seqinfo(): > seqinfo(chrs) Seqinfo of length 34 seqnames seqlengths isCircular genome lm_SuperContig_0_v2 <na> <na> <na> lm_SuperContig_1_v2 <na> <na> <na> lm_SuperContig_10_v2 <na> <na> <na> lm_SuperContig_11_v2 <na> <na> <na> lm_SuperContig_12_v2 <na> <na> <na> ... ... ... ... The seqnames in the replacement vector must match the order in the 'Seqinfo' object. To re-order: sl_new <- sl[match(levels(factor(names(sl))), names(sl))] Then add the new lengths: >> seqlengths(chrs) <- sl_new >> chrs > GRanges with 34 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] lm_SuperContig_0_v2 [1, 4258568] * > [2] lm_SuperContig_1_v2 [1, 3378610] * > [3] lm_SuperContig_2_v2 [1, 2939989] * > [4] lm_SuperContig_3_v2 [1, 2348246] * > [5] lm_SuperContig_4_v2 [1, 1918205] * > ... ... ... ... > [30] lm_SuperContig_29_v2 [1, 200940] * > [31] lm_SuperContig_30_v2 [1, 154863] * > [32] lm_SuperContig_31_v2 [1, 143268] * > [33] lm_SuperContig_32_v2 [1, 87679] * > [34] lm_SuperContig_34_v2 [1, 58596] * > --- > seqlengths: > lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 > 4258568 3378610 ... 1772623 Valerie On 05/01/2014 07:20 AM, Maintainer wrote: > > Hello, > > I have a problem with supplying GRanges object with seqlengths. > I have a files. > chrs.txt - contains information about chromosomes > > chrs.txt > chr,start,end,len > lm_SuperContig_0_v2,1,4258568,4258568 > lm_SuperContig_1_v2,1,3378610,3378610 > lm_SuperContig_2_v2,1,2939989,2939989 > lm_SuperContig_3_v2,1,2348246,2348246 > lm_SuperContig_4_v2,1,1918205,1918205 > lm_SuperContig_6_v2,1,1888674,1888674 > lm_SuperContig_5_v2,1,1869450,1869450 > lm_SuperContig_8_v2,1,1809296,1809296 > lm_SuperContig_9_v2,1,1772623,1772623 > lm_SuperContig_7_v2,1,1769547,1769547 > lm_SuperContig_10_v2,1,1758670,1758670 > lm_SuperContig_13_v2,1,1634580,1634580 > lm_SuperContig_12_v2,1,1631710,1631710 > lm_SuperContig_11_v2,1,1590160,1590160 > lm_SuperContig_15_v2,1,1560629,1560629 > lm_SuperContig_14_v2,1,1533332,1533332 > lm_SuperContig_17_v2,1,1445693,1445693 > lm_SuperContig_16_v2,1,1397653,1397653 > lm_SuperContig_18_v2,1,1351976,1351976 > lm_SuperContig_19_v2,1,1186800,1186800 > lm_SuperContig_20_v2,1,1087932,1087932 > lm_SuperContig_21_v2,1,1020521,1020521 > lm_SuperContig_22_v2,1,731443,731443 > lm_SuperContig_23_v2,1,521426,521426 > lm_SuperContig_24_v2,1,475869,475869 > lm_SuperContig_25_v2,1,318058,318058 > lm_SuperContig_26_v2,1,261540,261540 > lm_SuperContig_27_v2,1,250629,250629 > lm_SuperContig_28_v2,1,236098,236098 > lm_SuperContig_29_v2,1,200940,200940 > lm_SuperContig_30_v2,1,154863,154863 > lm_SuperContig_31_v2,1,143268,143268 > lm_SuperContig_32_v2,1,87679,87679 > lm_SuperContig_34_v2,1,58596,58596 > > I try to do the following: > # creating GRanges object for chromosomes > dataChr <- read.table("chrs.txt",header=T,sep=",") > chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) > sl <- setNames(dataChr$len, as.character(dataChr$chr)) > seqlengths(chrs) <- sl > > And I get the following error: > > Error in .normargSeqlengths(value, seqnames(x)) : > when the supplied 'seqlengths' vector is named, the names must match the seqnames > > Any chance you could help me with what is going on? > > Best wishes, > Agnieszka > > > -- output of sessionInfo(): > > R version 3.1.0 (2014-04-10) > Platform: i386-w64-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 > [8] BiocInstaller_1.14.2 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 > [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 > [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 > [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 > [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 > [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 > [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 > [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 > [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > ____________________________________________________________________ ____ > devteam-bioc mailing list > To unsubscribe from this mailing list send a blank email to > devteam-bioc-leave at lists.fhcrc.org > You can also unsubscribe or change your personal options at > https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > -- Valerie Obenchain Program in Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: vobencha at fhcrc.org Phone: (206) 667-3158
ADD COMMENT
0
Entering edit mode
Hi Sonia, Val, On 05/01/2014 09:52 AM, Maintainer wrote: > Hi, > > The seqnames are in a different orders in the 'chrs' and 'sl' objects. > Looking at the first 3 names in each, > > > head(seqlengths(chrs), 3) > lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 > NA NA NA > > > head(sl, 3) > lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 > 4258568 3378610 2939989 > > When a GRanges is created, seqnames are sorted in ascii order. Just to clarify, the seqlevels are sorted, not the seqnames: > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > gr GRanges with 3 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] b [1, 10] * [2] a [2, 10] * [3] b [3, 10] * --- seqlengths: a b NA NA Not sorted (i.e. user-supplied order is preserved): > seqnames(gr) factor-Rle of length 3 with 3 runs Lengths: 1 1 1 Values : b a b Levels(2): a b Sorted: > seqlevels(gr) [1] "a" "b" This sorting of the seqlevels is not good anyway (people with different LOCALE will get different results). I just fixed this so now the GRanges() constructor will preserve the seqlevels in the order supplied by the user: > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > seqlevels(gr) [1] "b" "a" More precisely, the seqlevels are obtained by doing unique() on the seqnames. The fix will propagate to the public repos and become available thru biocLite() in the next 24 hours. Then your original code should just work Sonia. Cheers, H. > You can > see the full list with seqinfo(): > > > seqinfo(chrs) > Seqinfo of length 34 > seqnames seqlengths isCircular genome > lm_SuperContig_0_v2 <na> <na> <na> > lm_SuperContig_1_v2 <na> <na> <na> > lm_SuperContig_10_v2 <na> <na> <na> > lm_SuperContig_11_v2 <na> <na> <na> > lm_SuperContig_12_v2 <na> <na> <na> > ... ... ... ... > > The seqnames in the replacement vector must match the order in the > 'Seqinfo' object. > > To re-order: > > sl_new <- sl[match(levels(factor(names(sl))), names(sl))] > > Then add the new lengths: > >>> seqlengths(chrs) <- sl_new >>> chrs >> GRanges with 34 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] lm_SuperContig_0_v2 [1, 4258568] * >> [2] lm_SuperContig_1_v2 [1, 3378610] * >> [3] lm_SuperContig_2_v2 [1, 2939989] * >> [4] lm_SuperContig_3_v2 [1, 2348246] * >> [5] lm_SuperContig_4_v2 [1, 1918205] * >> ... ... ... ... >> [30] lm_SuperContig_29_v2 [1, 200940] * >> [31] lm_SuperContig_30_v2 [1, 154863] * >> [32] lm_SuperContig_31_v2 [1, 143268] * >> [33] lm_SuperContig_32_v2 [1, 87679] * >> [34] lm_SuperContig_34_v2 [1, 58596] * >> --- >> seqlengths: >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 >> 4258568 3378610 ... 1772623 > > > Valerie > > > On 05/01/2014 07:20 AM, Maintainer wrote: >> >> Hello, >> >> I have a problem with supplying GRanges object with seqlengths. >> I have a files. >> chrs.txt - contains information about chromosomes >> >> chrs.txt >> chr,start,end,len >> lm_SuperContig_0_v2,1,4258568,4258568 >> lm_SuperContig_1_v2,1,3378610,3378610 >> lm_SuperContig_2_v2,1,2939989,2939989 >> lm_SuperContig_3_v2,1,2348246,2348246 >> lm_SuperContig_4_v2,1,1918205,1918205 >> lm_SuperContig_6_v2,1,1888674,1888674 >> lm_SuperContig_5_v2,1,1869450,1869450 >> lm_SuperContig_8_v2,1,1809296,1809296 >> lm_SuperContig_9_v2,1,1772623,1772623 >> lm_SuperContig_7_v2,1,1769547,1769547 >> lm_SuperContig_10_v2,1,1758670,1758670 >> lm_SuperContig_13_v2,1,1634580,1634580 >> lm_SuperContig_12_v2,1,1631710,1631710 >> lm_SuperContig_11_v2,1,1590160,1590160 >> lm_SuperContig_15_v2,1,1560629,1560629 >> lm_SuperContig_14_v2,1,1533332,1533332 >> lm_SuperContig_17_v2,1,1445693,1445693 >> lm_SuperContig_16_v2,1,1397653,1397653 >> lm_SuperContig_18_v2,1,1351976,1351976 >> lm_SuperContig_19_v2,1,1186800,1186800 >> lm_SuperContig_20_v2,1,1087932,1087932 >> lm_SuperContig_21_v2,1,1020521,1020521 >> lm_SuperContig_22_v2,1,731443,731443 >> lm_SuperContig_23_v2,1,521426,521426 >> lm_SuperContig_24_v2,1,475869,475869 >> lm_SuperContig_25_v2,1,318058,318058 >> lm_SuperContig_26_v2,1,261540,261540 >> lm_SuperContig_27_v2,1,250629,250629 >> lm_SuperContig_28_v2,1,236098,236098 >> lm_SuperContig_29_v2,1,200940,200940 >> lm_SuperContig_30_v2,1,154863,154863 >> lm_SuperContig_31_v2,1,143268,143268 >> lm_SuperContig_32_v2,1,87679,87679 >> lm_SuperContig_34_v2,1,58596,58596 >> >> I try to do the following: >> # creating GRanges object for chromosomes >> dataChr <- read.table("chrs.txt",header=T,sep=",") >> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) >> sl <- setNames(dataChr$len, as.character(dataChr$chr)) >> seqlengths(chrs) <- sl >> >> And I get the following error: >> >> Error in .normargSeqlengths(value, seqnames(x)) : >> when the supplied 'seqlengths' vector is named, the names must match the seqnames >> >> Any chance you could help me with what is going on? >> >> Best wishes, >> Agnieszka >> >> >> -- output of sessionInfo(): >> >> R version 3.1.0 (2014-04-10) >> Platform: i386-w64-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 >> [8] BiocInstaller_1.14.2 >> >> loaded via a namespace (and not attached): >> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 >> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 >> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 >> [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 >> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 >> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 >> [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 >> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 >> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> ___________________________________________________________________ _____ >> devteam-bioc mailing list >> To unsubscribe from this mailing list send a blank email to >> devteam-bioc-leave at lists.fhcrc.org >> You can also unsubscribe or change your personal options at >> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc >> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
On 05/01/2014 10:21 AM, Maintainer wrote: > Hi Sonia, Val, > > On 05/01/2014 09:52 AM, Maintainer wrote: >> Hi, >> >> The seqnames are in a different orders in the 'chrs' and 'sl' objects. >> Looking at the first 3 names in each, >> >> > head(seqlengths(chrs), 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 >> NA NA NA >> >> > head(sl, 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 >> 4258568 3378610 2939989 >> >> When a GRanges is created, seqnames are sorted in ascii order. > > Just to clarify, the seqlevels are sorted, not the seqnames: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > gr > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] b [1, 10] * > [2] a [2, 10] * > [3] b [3, 10] * > --- > seqlengths: > a b > NA NA > > Not sorted (i.e. user-supplied order is preserved): > > > seqnames(gr) > factor-Rle of length 3 with 3 runs > Lengths: 1 1 1 > Values : b a b > Levels(2): a b > > Sorted: > > > seqlevels(gr) > [1] "a" "b" > > This sorting of the seqlevels is not good anyway (people with different > LOCALE will get different results). I just fixed this so now the > GRanges() constructor will preserve the seqlevels in the order supplied > by the user: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > seqlevels(gr) > [1] "b" "a" > > More precisely, the seqlevels are obtained by doing unique() on the > seqnames. > > The fix will propagate to the public repos and become available thru > biocLite() in the next 24 hours. > > Then your original code should just work Sonia. Or maybe not :) You might still need to use 'stringsAsFactors=FALSE' when calling read.table(). Please let us know if that still doesn't make it. Thanks, H. > > Cheers, > H. > > >> You can >> see the full list with seqinfo(): >> >> > seqinfo(chrs) >> Seqinfo of length 34 >> seqnames seqlengths isCircular genome >> lm_SuperContig_0_v2 <na> <na> <na> >> lm_SuperContig_1_v2 <na> <na> <na> >> lm_SuperContig_10_v2 <na> <na> <na> >> lm_SuperContig_11_v2 <na> <na> <na> >> lm_SuperContig_12_v2 <na> <na> <na> >> ... ... ... ... >> >> The seqnames in the replacement vector must match the order in the >> 'Seqinfo' object. >> >> To re-order: >> >> sl_new <- sl[match(levels(factor(names(sl))), names(sl))] >> >> Then add the new lengths: >> >>>> seqlengths(chrs) <- sl_new >>>> chrs >>> GRanges with 34 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] lm_SuperContig_0_v2 [1, 4258568] * >>> [2] lm_SuperContig_1_v2 [1, 3378610] * >>> [3] lm_SuperContig_2_v2 [1, 2939989] * >>> [4] lm_SuperContig_3_v2 [1, 2348246] * >>> [5] lm_SuperContig_4_v2 [1, 1918205] * >>> ... ... ... ... >>> [30] lm_SuperContig_29_v2 [1, 200940] * >>> [31] lm_SuperContig_30_v2 [1, 154863] * >>> [32] lm_SuperContig_31_v2 [1, 143268] * >>> [33] lm_SuperContig_32_v2 [1, 87679] * >>> [34] lm_SuperContig_34_v2 [1, 58596] * >>> --- >>> seqlengths: >>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 >>> 4258568 3378610 ... 1772623 >> >> >> Valerie >> >> >> On 05/01/2014 07:20 AM, Maintainer wrote: >>> >>> Hello, >>> >>> I have a problem with supplying GRanges object with seqlengths. >>> I have a files. >>> chrs.txt - contains information about chromosomes >>> >>> chrs.txt >>> chr,start,end,len >>> lm_SuperContig_0_v2,1,4258568,4258568 >>> lm_SuperContig_1_v2,1,3378610,3378610 >>> lm_SuperContig_2_v2,1,2939989,2939989 >>> lm_SuperContig_3_v2,1,2348246,2348246 >>> lm_SuperContig_4_v2,1,1918205,1918205 >>> lm_SuperContig_6_v2,1,1888674,1888674 >>> lm_SuperContig_5_v2,1,1869450,1869450 >>> lm_SuperContig_8_v2,1,1809296,1809296 >>> lm_SuperContig_9_v2,1,1772623,1772623 >>> lm_SuperContig_7_v2,1,1769547,1769547 >>> lm_SuperContig_10_v2,1,1758670,1758670 >>> lm_SuperContig_13_v2,1,1634580,1634580 >>> lm_SuperContig_12_v2,1,1631710,1631710 >>> lm_SuperContig_11_v2,1,1590160,1590160 >>> lm_SuperContig_15_v2,1,1560629,1560629 >>> lm_SuperContig_14_v2,1,1533332,1533332 >>> lm_SuperContig_17_v2,1,1445693,1445693 >>> lm_SuperContig_16_v2,1,1397653,1397653 >>> lm_SuperContig_18_v2,1,1351976,1351976 >>> lm_SuperContig_19_v2,1,1186800,1186800 >>> lm_SuperContig_20_v2,1,1087932,1087932 >>> lm_SuperContig_21_v2,1,1020521,1020521 >>> lm_SuperContig_22_v2,1,731443,731443 >>> lm_SuperContig_23_v2,1,521426,521426 >>> lm_SuperContig_24_v2,1,475869,475869 >>> lm_SuperContig_25_v2,1,318058,318058 >>> lm_SuperContig_26_v2,1,261540,261540 >>> lm_SuperContig_27_v2,1,250629,250629 >>> lm_SuperContig_28_v2,1,236098,236098 >>> lm_SuperContig_29_v2,1,200940,200940 >>> lm_SuperContig_30_v2,1,154863,154863 >>> lm_SuperContig_31_v2,1,143268,143268 >>> lm_SuperContig_32_v2,1,87679,87679 >>> lm_SuperContig_34_v2,1,58596,58596 >>> >>> I try to do the following: >>> # creating GRanges object for chromosomes >>> dataChr <- read.table("chrs.txt",header=T,sep=",") >>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) >>> sl <- setNames(dataChr$len, as.character(dataChr$chr)) >>> seqlengths(chrs) <- sl >>> >>> And I get the following error: >>> >>> Error in .normargSeqlengths(value, seqnames(x)) : >>> when the supplied 'seqlengths' vector is named, the names must match the seqnames >>> >>> Any chance you could help me with what is going on? >>> >>> Best wishes, >>> Agnieszka >>> >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: i386-w64-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 >>> [8] BiocInstaller_1.14.2 >>> >>> loaded via a namespace (and not attached): >>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 >>> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 >>> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 >>> [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 >>> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 >>> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 >>> [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 >>> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 >>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> __________________________________________________________________ ______ >>> devteam-bioc mailing list >>> To unsubscribe from this mailing list send a blank email to >>> devteam-bioc-leave at lists.fhcrc.org >>> You can also unsubscribe or change your personal options at >>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc >>> >> >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hello, Thank you very much for your help. I am facing one more problem with GRanges. I am trying to emulate this object. data("CRC", package = "biovizBase") object: mut.gr (it's SNP annotation for human chromosomes 1-22) When I just do show on mut.gr showmut.gr) GRanges with 60 ranges and 10 metadata columns: seqnames ranges strand | Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Strand Variant_Classification Variant_Type Reference_Allele <rle> <iranges> <rle> | <factor> <integer> <factor> <integer> <factor> <factor> <factor> <factor> [1] 1 [ 11003085, 11003085] + | TARDBP 23435 Broad 36 + Missense SNP G [2] 1 [ 62352395, 62352395] + | INADL 10207 Broad 36 + Missense SNP T [3] 1 [194960885, 194960885] + | CFH 3075 Broad 36 + Missense SNP G [4] 2 [ 10116508, 10116508] - | CYS1 192668 Broad 36 - Missense SNP C [5] 2 [ 33617747, 33617747] + | RASGRP3 25780 Broad 36 + Missense SNP C [6] 2 [ 73894280, 73894280] + | C2orf78 388960 Broad 36 + Missense SNP T [7] 2 [ 96732769, 96732769] + | FER1L5 90342 Broad 36 + Missense SNP T [8] 2 [179160267, 179160267] - | TTN 7273 Broad 36 - Missense SNP C [9] 2 [217251189, 217251189] - | IGFBP5 3488 Broad 36 - Missense SNP C [10] 3 [ 12620699, 12620699] - | RAF1 5894 Broad 36 - Missense SNP G [11] 3 [ 46472880, 46472880] - | LTF 4057 Broad 36 - Missense SNP C [12] 3 [137203130, 137203130] + | PPP2R3A 5523 Broad 36 + Missense SNP A [13] 3 [137457429, 137457429] + | PCCB 5096 Broad 36 + Missense SNP C [14] 3 [184708629, 184708629] - | KLHL6 89857 Broad 36 - Missense SNP G [15] 4 [147434591, 147434591] - | SLC10A7 84068 Broad 36 - Missense SNP G [16] 4 [185915412, 185915412] - | ACSL1 2180 Broad 36 - Missense SNP A [17] 5 [ 79070342, 79070342] + | CMYA5 202333 Broad 36 + Missense SNP C [18] 5 [ 94775579, 94775579] + | FAM81B 153643 Broad 36 + Missense SNP G [19] 5 [140838266, 140838266] + | PCDHGC3 5098 Broad 36 + Missense SNP T [20] 6 [109992724, 109992724] - | AKD1 221264 Broad 36 - Missense SNP C [21] 6 [118993492, 118993492] - | C6orf204 387119 Broad 36 - Missense SNP G [22] 7 [124286401, 124286401] - | POT1 25913 Broad 36 - Missense SNP A [23] 7 [125960456, 125960456] - | GRM8 2918 Broad 36 - Missense SNP A [24] 9 [ 35097658, 35097658] - | KIAA1539 80256 Broad 36 - Missense SNP A [25] 9 [103164773, 103164773] - | BAAT 570 Broad 36 - Missense SNP G [26] 9 [132745783, 132745783] + | ABL1 25 Broad 36 + Missense SNP C [27] 9 [138481305, 138481305] - | SEC16A 9919 Broad 36 - Missense SNP C [28] 10 [ 7661761, 7661761] - | ITIH5 80760 Broad 36 - Missense SNP C [29] 10 [106064683, 106064683] - | ITPRIP 85450 Broad 36 - Missense SNP T [30] 11 [ 603430, 603430] - | IRF7 3665 Broad 36 - Missense SNP G [31] 11 [ 5610148, 5610148] + | TRIM34 53840 Broad 36 + Missense SNP A [32] 11 [ 9420281, 9420281] + | IPO7 10527 Broad 36 + Missense SNP G [33] 11 [ 26495005, 26495005] + | ANO3 63982 Broad 36 + Missense SNP C [34] 11 [ 55629818, 55629818] + | OR8H2 390151 Broad 36 + Missense SNP G [35] 11 [116737834, 116737834] + | CEP164 22897 Broad 36 + Missense SNP C [36] 12 [ 25165980, 25165980] - | CASC1 55259 Broad 36 - Missense SNP G [37] 12 [ 25289548, 25289548] - | KRAS 3845 Broad 36 - Missense SNP C [38] 12 [ 31142142, 31142142] + | DDX11 1663 Broad 36 + Missense SNP G [39] 12 [ 42434771, 42434771] - | PUS7L 83448 Broad 36 - Missense SNP T [40] 12 [ 55006974, 55006974] - | PAN2 9924 Broad 36 - Missense SNP T [41] 13 [102199348, 102199348] - | LOC643677 643677 Broad 36 - Missense SNP G [42] 15 [ 40503502, 40503502] - | ZFP106 64397 Broad 36 - Missense SNP C [43] 15 [ 70816269, 70816269] + | BBS4 585 Broad 36 + Missense SNP A [44] 16 [ 23481033, 23481033] + | UBFD1 56061 Broad 36 + Missense SNP C [45] 17 [ 21149010, 21149010] + | MAP2K3 5606 Broad 36 + Missense SNP G [46] 17 [ 35812691, 35812691] - | TOP2A 7153 Broad 36 - Missense SNP T [47] 18 [ 16788810, 16788810] - | ROCK1 6093 Broad 36 - Missense SNP C [48] 18 [ 75322338, 75322338] + | NFATC1 4772 Broad 36 + Missense SNP G [49] 19 [ 15713495, 15713495] + | OR10H3 26532 Broad 36 + Missense SNP G [50] 19 [ 40730140, 40730140] + | TMEM147 10430 Broad 36 + Missense SNP C [51] 19 [ 52338664, 52338664] + | SAE1 10055 Broad 36 + Missense SNP G [52] 19 [ 57407795, 57407795] + | PPP2R1A 5518 Broad 36 + Missense SNP G [53] 20 [ 23298287, 23298287] + | GZF1 64412 Broad 36 + Missense SNP A [54] 20 [ 31012946, 31012946] + | EFCAB8 388795 Broad 36 + Missense SNP C [55] 20 [ 40223536, 40223536] - | PTPRT 11122 Broad 36 - Missense SNP C [56] 20 [ 54467136, 54467136] + | CASS4 57091 Broad 36 + Missense SNP G [57] 20 [ 60201983, 60201983] + | GTPBP5 26164 Broad 36 + Missense SNP C [58] 21 [ 36688774, 36688774] + | CHAF1B 8208 Broad 36 + Missense SNP C [59] 21 [ 39699770, 39699770] - | LCA5L 150082 Broad 36 - Missense SNP T [60] 22 [ 27437953, 27437953] - | CHEK2 11200 Broad 36 - Missense SNP C In the seqnames column chromosomes 8 and 14 are missing, which makes sense - there is no SNPs on those. However, when do: head(seqlengthsmut.gr), 25) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 135006516 133851895 115169878 107349540 102531392 16 17 18 19 20 21 22 90354753 81195210 78077248 59128983 63025520 48129895 51304566 All 22 chromosomes are there. How is that possible? I believe that for my data, I need to do a similar thing (I'm trying to create a multilayer circular plot in ggbio). Although I have snps for only few chromosomes, I think I need to include lengths for all of them. How do I do that? Best wishes, Agnieszka ________________________________________ From: Hervé Pagès <hpages@fhcrc.org> Sent: 02 May 2014 03:27 To: vobencha at fhcrc.org; guest at bioconductor.org; bioconductor at r-project.org; Miss Agnieszka Aleksandra Golicz Cc: GenomicRanges Maintainer Subject: Re: [devteam-bioc] GenomicRanges seqlengths problem On 05/01/2014 10:21 AM, Maintainer wrote: > Hi Sonia, Val, > > On 05/01/2014 09:52 AM, Maintainer wrote: >> Hi, >> >> The seqnames are in a different orders in the 'chrs' and 'sl' objects. >> Looking at the first 3 names in each, >> >> > head(seqlengths(chrs), 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 >> NA NA NA >> >> > head(sl, 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 >> 4258568 3378610 2939989 >> >> When a GRanges is created, seqnames are sorted in ascii order. > > Just to clarify, the seqlevels are sorted, not the seqnames: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > gr > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] b [1, 10] * > [2] a [2, 10] * > [3] b [3, 10] * > --- > seqlengths: > a b > NA NA > > Not sorted (i.e. user-supplied order is preserved): > > > seqnames(gr) > factor-Rle of length 3 with 3 runs > Lengths: 1 1 1 > Values : b a b > Levels(2): a b > > Sorted: > > > seqlevels(gr) > [1] "a" "b" > > This sorting of the seqlevels is not good anyway (people with different > LOCALE will get different results). I just fixed this so now the > GRanges() constructor will preserve the seqlevels in the order supplied > by the user: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > seqlevels(gr) > [1] "b" "a" > > More precisely, the seqlevels are obtained by doing unique() on the > seqnames. > > The fix will propagate to the public repos and become available thru > biocLite() in the next 24 hours. > > Then your original code should just work Sonia. Or maybe not :) You might still need to use 'stringsAsFactors=FALSE' when calling read.table(). Please let us know if that still doesn't make it. Thanks, H. > > Cheers, > H. > > >> You can >> see the full list with seqinfo(): >> >> > seqinfo(chrs) >> Seqinfo of length 34 >> seqnames seqlengths isCircular genome >> lm_SuperContig_0_v2 <na> <na> <na> >> lm_SuperContig_1_v2 <na> <na> <na> >> lm_SuperContig_10_v2 <na> <na> <na> >> lm_SuperContig_11_v2 <na> <na> <na> >> lm_SuperContig_12_v2 <na> <na> <na> >> ... ... ... ... >> >> The seqnames in the replacement vector must match the order in the >> 'Seqinfo' object. >> >> To re-order: >> >> sl_new <- sl[match(levels(factor(names(sl))), names(sl))] >> >> Then add the new lengths: >> >>>> seqlengths(chrs) <- sl_new >>>> chrs >>> GRanges with 34 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] lm_SuperContig_0_v2 [1, 4258568] * >>> [2] lm_SuperContig_1_v2 [1, 3378610] * >>> [3] lm_SuperContig_2_v2 [1, 2939989] * >>> [4] lm_SuperContig_3_v2 [1, 2348246] * >>> [5] lm_SuperContig_4_v2 [1, 1918205] * >>> ... ... ... ... >>> [30] lm_SuperContig_29_v2 [1, 200940] * >>> [31] lm_SuperContig_30_v2 [1, 154863] * >>> [32] lm_SuperContig_31_v2 [1, 143268] * >>> [33] lm_SuperContig_32_v2 [1, 87679] * >>> [34] lm_SuperContig_34_v2 [1, 58596] * >>> --- >>> seqlengths: >>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 >>> 4258568 3378610 ... 1772623 >> >> >> Valerie >> >> >> On 05/01/2014 07:20 AM, Maintainer wrote: >>> >>> Hello, >>> >>> I have a problem with supplying GRanges object with seqlengths. >>> I have a files. >>> chrs.txt - contains information about chromosomes >>> >>> chrs.txt >>> chr,start,end,len >>> lm_SuperContig_0_v2,1,4258568,4258568 >>> lm_SuperContig_1_v2,1,3378610,3378610 >>> lm_SuperContig_2_v2,1,2939989,2939989 >>> lm_SuperContig_3_v2,1,2348246,2348246 >>> lm_SuperContig_4_v2,1,1918205,1918205 >>> lm_SuperContig_6_v2,1,1888674,1888674 >>> lm_SuperContig_5_v2,1,1869450,1869450 >>> lm_SuperContig_8_v2,1,1809296,1809296 >>> lm_SuperContig_9_v2,1,1772623,1772623 >>> lm_SuperContig_7_v2,1,1769547,1769547 >>> lm_SuperContig_10_v2,1,1758670,1758670 >>> lm_SuperContig_13_v2,1,1634580,1634580 >>> lm_SuperContig_12_v2,1,1631710,1631710 >>> lm_SuperContig_11_v2,1,1590160,1590160 >>> lm_SuperContig_15_v2,1,1560629,1560629 >>> lm_SuperContig_14_v2,1,1533332,1533332 >>> lm_SuperContig_17_v2,1,1445693,1445693 >>> lm_SuperContig_16_v2,1,1397653,1397653 >>> lm_SuperContig_18_v2,1,1351976,1351976 >>> lm_SuperContig_19_v2,1,1186800,1186800 >>> lm_SuperContig_20_v2,1,1087932,1087932 >>> lm_SuperContig_21_v2,1,1020521,1020521 >>> lm_SuperContig_22_v2,1,731443,731443 >>> lm_SuperContig_23_v2,1,521426,521426 >>> lm_SuperContig_24_v2,1,475869,475869 >>> lm_SuperContig_25_v2,1,318058,318058 >>> lm_SuperContig_26_v2,1,261540,261540 >>> lm_SuperContig_27_v2,1,250629,250629 >>> lm_SuperContig_28_v2,1,236098,236098 >>> lm_SuperContig_29_v2,1,200940,200940 >>> lm_SuperContig_30_v2,1,154863,154863 >>> lm_SuperContig_31_v2,1,143268,143268 >>> lm_SuperContig_32_v2,1,87679,87679 >>> lm_SuperContig_34_v2,1,58596,58596 >>> >>> I try to do the following: >>> # creating GRanges object for chromosomes >>> dataChr <- read.table("chrs.txt",header=T,sep=",") >>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) >>> sl <- setNames(dataChr$len, as.character(dataChr$chr)) >>> seqlengths(chrs) <- sl >>> >>> And I get the following error: >>> >>> Error in .normargSeqlengths(value, seqnames(x)) : >>> when the supplied 'seqlengths' vector is named, the names must match the seqnames >>> >>> Any chance you could help me with what is going on? >>> >>> Best wishes, >>> Agnieszka >>> >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: i386-w64-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 >>> [8] BiocInstaller_1.14.2 >>> >>> loaded via a namespace (and not attached): >>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 >>> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 >>> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 >>> [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 >>> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 >>> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 >>> [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 >>> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 >>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> __________________________________________________________________ ______ >>> devteam-bioc mailing list >>> To unsubscribe from this mailing list send a blank email to >>> devteam-bioc-leave at lists.fhcrc.org >>> You can also unsubscribe or change your personal options at >>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc >>> >> >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
0
Entering edit mode
Sorry did not include my chromosome and snp files. It looks like this: chrs.txt - summary of chromosomes chr,start,end,len 0,1,4258568,4258568 1,1,3378610,3378610 2,1,2939989,2939989 3,1,2348246,2348246 4,1,1918205,1918205 6,1,1888674,1888674 5,1,1869450,1869450 8,1,1809296,1809296 9,1,1772623,1772623 7,1,1769547,1769547 10,1,1758670,1758670 13,1,1634580,1634580 12,1,1631710,1631710 11,1,1590160,1590160 15,1,1560629,1560629 14,1,1533332,1533332 17,1,1445693,1445693 16,1,1397653,1397653 18,1,1351976,1351976 19,1,1186800,1186800 20,1,1087932,1087932 21,1,1020521,1020521 22,1,731443,731443 23,1,521426,521426 24,1,475869,475869 25,1,318058,318058 26,1,261540,261540 27,1,250629,250629 28,1,236098,236098 29,1,200940,200940 30,1,154863,154863 31,1,143268,143268 32,1,87679,87679 34,1,58596,58596 snps.txt - SNP file chr,id,start,end 6,egn4_Lema_G046740.1,538805,538805 6,egn4_Lema_G049660.1,1607018,1607018 10,egn4_Lema_G076370.1,1242542,1242542 13,egn4_Lema_G081640.1,1352946,1352946 12,egn4_Lema_G082370.1,109077,109077 12,egn4_Lema_G085610.1,1190615,1190615 12,egn4_Lema_G085620.1,1190933,1190933 15,egn4_Lema_G090230.1,331669,331669 16,egn4_Lema_G104510.1,667778,667778 18,egn4_Lema_G110260.1,1321392,1321392 19,egn4_Lema_G113060.1,958993,958993 19,egn4_Lema_G113080.1,1177185,1177185 21,egn4_Lema_G116710.1,263332,263332 If I only do that: dataChr <- read.table("chrs.txt",header=T,sep=",",stringsAsFactors=FALSE) dataChrS <- dataChr[order(dataChr$chr),] t <- read.table("snps.txt",header=T,sep=",") snps <- with(t, GRanges(chr, IRanges(start, end), id=id)) seqlens <- setNames(dataChrS$len, as.character(dataChrS$chr)) seqlengths(snps) <- seqlens I get the initial error - Error in .normargSeqlengths(value, seqnames(x)) : when the supplied 'seqlengths' vector is named, the names must match the seqnames Makes sense, there are no SNPs on some chromosomes If I do that: lChrs <- dataChrS[dataChrS$chr%in%t$chr,] seqlens <- setNames(lChrs$len, as.character(lChrs$chr)) seqlengths(snps) <- seqlens seqlegths(snps) 6 10 12 13 15 16 1888674 1758670 1631710 1634580 1560629 1397653 I only have chromosomes with SNPs. Best wishes, Agnieszka ________________________________________ From: Miss Agnieszka Aleksandra Golicz Sent: 02 May 2014 10:01 To: Hervé Pagès; vobencha at fhcrc.org; guest at bioconductor.org; bioconductor at r-project.org Cc: GenomicRanges Maintainer Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem Hello, Thank you very much for your help. I am facing one more problem with GRanges. I am trying to emulate this object. data("CRC", package = "biovizBase") object: mut.gr (it's SNP annotation for human chromosomes 1-22) When I just do show on mut.gr showmut.gr) GRanges with 60 ranges and 10 metadata columns: seqnames ranges strand | Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Strand Variant_Classification Variant_Type Reference_Allele <rle> <iranges> <rle> | <factor> <integer> <factor> <integer> <factor> <factor> <factor> <factor> [1] 1 [ 11003085, 11003085] + | TARDBP 23435 Broad 36 + Missense SNP G [2] 1 [ 62352395, 62352395] + | INADL 10207 Broad 36 + Missense SNP T [3] 1 [194960885, 194960885] + | CFH 3075 Broad 36 + Missense SNP G [4] 2 [ 10116508, 10116508] - | CYS1 192668 Broad 36 - Missense SNP C [5] 2 [ 33617747, 33617747] + | RASGRP3 25780 Broad 36 + Missense SNP C [6] 2 [ 73894280, 73894280] + | C2orf78 388960 Broad 36 + Missense SNP T [7] 2 [ 96732769, 96732769] + | FER1L5 90342 Broad 36 + Missense SNP T [8] 2 [179160267, 179160267] - | TTN 7273 Broad 36 - Missense SNP C [9] 2 [217251189, 217251189] - | IGFBP5 3488 Broad 36 - Missense SNP C [10] 3 [ 12620699, 12620699] - | RAF1 5894 Broad 36 - Missense SNP G [11] 3 [ 46472880, 46472880] - | LTF 4057 Broad 36 - Missense SNP C [12] 3 [137203130, 137203130] + | PPP2R3A 5523 Broad 36 + Missense SNP A [13] 3 [137457429, 137457429] + | PCCB 5096 Broad 36 + Missense SNP C [14] 3 [184708629, 184708629] - | KLHL6 89857 Broad 36 - Missense SNP G [15] 4 [147434591, 147434591] - | SLC10A7 84068 Broad 36 - Missense SNP G [16] 4 [185915412, 185915412] - | ACSL1 2180 Broad 36 - Missense SNP A [17] 5 [ 79070342, 79070342] + | CMYA5 202333 Broad 36 + Missense SNP C [18] 5 [ 94775579, 94775579] + | FAM81B 153643 Broad 36 + Missense SNP G [19] 5 [140838266, 140838266] + | PCDHGC3 5098 Broad 36 + Missense SNP T [20] 6 [109992724, 109992724] - | AKD1 221264 Broad 36 - Missense SNP C [21] 6 [118993492, 118993492] - | C6orf204 387119 Broad 36 - Missense SNP G [22] 7 [124286401, 124286401] - | POT1 25913 Broad 36 - Missense SNP A [23] 7 [125960456, 125960456] - | GRM8 2918 Broad 36 - Missense SNP A [24] 9 [ 35097658, 35097658] - | KIAA1539 80256 Broad 36 - Missense SNP A [25] 9 [103164773, 103164773] - | BAAT 570 Broad 36 - Missense SNP G [26] 9 [132745783, 132745783] + | ABL1 25 Broad 36 + Missense SNP C [27] 9 [138481305, 138481305] - | SEC16A 9919 Broad 36 - Missense SNP C [28] 10 [ 7661761, 7661761] - | ITIH5 80760 Broad 36 - Missense SNP C [29] 10 [106064683, 106064683] - | ITPRIP 85450 Broad 36 - Missense SNP T [30] 11 [ 603430, 603430] - | IRF7 3665 Broad 36 - Missense SNP G [31] 11 [ 5610148, 5610148] + | TRIM34 53840 Broad 36 + Missense SNP A [32] 11 [ 9420281, 9420281] + | IPO7 10527 Broad 36 + Missense SNP G [33] 11 [ 26495005, 26495005] + | ANO3 63982 Broad 36 + Missense SNP C [34] 11 [ 55629818, 55629818] + | OR8H2 390151 Broad 36 + Missense SNP G [35] 11 [116737834, 116737834] + | CEP164 22897 Broad 36 + Missense SNP C [36] 12 [ 25165980, 25165980] - | CASC1 55259 Broad 36 - Missense SNP G [37] 12 [ 25289548, 25289548] - | KRAS 3845 Broad 36 - Missense SNP C [38] 12 [ 31142142, 31142142] + | DDX11 1663 Broad 36 + Missense SNP G [39] 12 [ 42434771, 42434771] - | PUS7L 83448 Broad 36 - Missense SNP T [40] 12 [ 55006974, 55006974] - | PAN2 9924 Broad 36 - Missense SNP T [41] 13 [102199348, 102199348] - | LOC643677 643677 Broad 36 - Missense SNP G [42] 15 [ 40503502, 40503502] - | ZFP106 64397 Broad 36 - Missense SNP C [43] 15 [ 70816269, 70816269] + | BBS4 585 Broad 36 + Missense SNP A [44] 16 [ 23481033, 23481033] + | UBFD1 56061 Broad 36 + Missense SNP C [45] 17 [ 21149010, 21149010] + | MAP2K3 5606 Broad 36 + Missense SNP G [46] 17 [ 35812691, 35812691] - | TOP2A 7153 Broad 36 - Missense SNP T [47] 18 [ 16788810, 16788810] - | ROCK1 6093 Broad 36 - Missense SNP C [48] 18 [ 75322338, 75322338] + | NFATC1 4772 Broad 36 + Missense SNP G [49] 19 [ 15713495, 15713495] + | OR10H3 26532 Broad 36 + Missense SNP G [50] 19 [ 40730140, 40730140] + | TMEM147 10430 Broad 36 + Missense SNP C [51] 19 [ 52338664, 52338664] + | SAE1 10055 Broad 36 + Missense SNP G [52] 19 [ 57407795, 57407795] + | PPP2R1A 5518 Broad 36 + Missense SNP G [53] 20 [ 23298287, 23298287] + | GZF1 64412 Broad 36 + Missense SNP A [54] 20 [ 31012946, 31012946] + | EFCAB8 388795 Broad 36 + Missense SNP C [55] 20 [ 40223536, 40223536] - | PTPRT 11122 Broad 36 - Missense SNP C [56] 20 [ 54467136, 54467136] + | CASS4 57091 Broad 36 + Missense SNP G [57] 20 [ 60201983, 60201983] + | GTPBP5 26164 Broad 36 + Missense SNP C [58] 21 [ 36688774, 36688774] + | CHAF1B 8208 Broad 36 + Missense SNP C [59] 21 [ 39699770, 39699770] - | LCA5L 150082 Broad 36 - Missense SNP T [60] 22 [ 27437953, 27437953] - | CHEK2 11200 Broad 36 - Missense SNP C In the seqnames column chromosomes 8 and 14 are missing, which makes sense - there is no SNPs on those. However, when do: head(seqlengthsmut.gr), 25) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 135006516 133851895 115169878 107349540 102531392 16 17 18 19 20 21 22 90354753 81195210 78077248 59128983 63025520 48129895 51304566 All 22 chromosomes are there. How is that possible? I believe that for my data, I need to do a similar thing (I'm trying to create a multilayer circular plot in ggbio). Although I have snps for only few chromosomes, I think I need to include lengths for all of them. How do I do that? Best wishes, Agnieszka ________________________________________ From: Hervé Pagès <hpages@fhcrc.org> Sent: 02 May 2014 03:27 To: vobencha at fhcrc.org; guest at bioconductor.org; bioconductor at r-project.org; Miss Agnieszka Aleksandra Golicz Cc: GenomicRanges Maintainer Subject: Re: [devteam-bioc] GenomicRanges seqlengths problem On 05/01/2014 10:21 AM, Maintainer wrote: > Hi Sonia, Val, > > On 05/01/2014 09:52 AM, Maintainer wrote: >> Hi, >> >> The seqnames are in a different orders in the 'chrs' and 'sl' objects. >> Looking at the first 3 names in each, >> >> > head(seqlengths(chrs), 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 >> NA NA NA >> >> > head(sl, 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 >> 4258568 3378610 2939989 >> >> When a GRanges is created, seqnames are sorted in ascii order. > > Just to clarify, the seqlevels are sorted, not the seqnames: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > gr > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] b [1, 10] * > [2] a [2, 10] * > [3] b [3, 10] * > --- > seqlengths: > a b > NA NA > > Not sorted (i.e. user-supplied order is preserved): > > > seqnames(gr) > factor-Rle of length 3 with 3 runs > Lengths: 1 1 1 > Values : b a b > Levels(2): a b > > Sorted: > > > seqlevels(gr) > [1] "a" "b" > > This sorting of the seqlevels is not good anyway (people with different > LOCALE will get different results). I just fixed this so now the > GRanges() constructor will preserve the seqlevels in the order supplied > by the user: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > seqlevels(gr) > [1] "b" "a" > > More precisely, the seqlevels are obtained by doing unique() on the > seqnames. > > The fix will propagate to the public repos and become available thru > biocLite() in the next 24 hours. > > Then your original code should just work Sonia. Or maybe not :) You might still need to use 'stringsAsFactors=FALSE' when calling read.table(). Please let us know if that still doesn't make it. Thanks, H. > > Cheers, > H. > > >> You can >> see the full list with seqinfo(): >> >> > seqinfo(chrs) >> Seqinfo of length 34 >> seqnames seqlengths isCircular genome >> lm_SuperContig_0_v2 <na> <na> <na> >> lm_SuperContig_1_v2 <na> <na> <na> >> lm_SuperContig_10_v2 <na> <na> <na> >> lm_SuperContig_11_v2 <na> <na> <na> >> lm_SuperContig_12_v2 <na> <na> <na> >> ... ... ... ... >> >> The seqnames in the replacement vector must match the order in the >> 'Seqinfo' object. >> >> To re-order: >> >> sl_new <- sl[match(levels(factor(names(sl))), names(sl))] >> >> Then add the new lengths: >> >>>> seqlengths(chrs) <- sl_new >>>> chrs >>> GRanges with 34 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] lm_SuperContig_0_v2 [1, 4258568] * >>> [2] lm_SuperContig_1_v2 [1, 3378610] * >>> [3] lm_SuperContig_2_v2 [1, 2939989] * >>> [4] lm_SuperContig_3_v2 [1, 2348246] * >>> [5] lm_SuperContig_4_v2 [1, 1918205] * >>> ... ... ... ... >>> [30] lm_SuperContig_29_v2 [1, 200940] * >>> [31] lm_SuperContig_30_v2 [1, 154863] * >>> [32] lm_SuperContig_31_v2 [1, 143268] * >>> [33] lm_SuperContig_32_v2 [1, 87679] * >>> [34] lm_SuperContig_34_v2 [1, 58596] * >>> --- >>> seqlengths: >>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 >>> 4258568 3378610 ... 1772623 >> >> >> Valerie >> >> >> On 05/01/2014 07:20 AM, Maintainer wrote: >>> >>> Hello, >>> >>> I have a problem with supplying GRanges object with seqlengths. >>> I have a files. >>> chrs.txt - contains information about chromosomes >>> >>> chrs.txt >>> chr,start,end,len >>> lm_SuperContig_0_v2,1,4258568,4258568 >>> lm_SuperContig_1_v2,1,3378610,3378610 >>> lm_SuperContig_2_v2,1,2939989,2939989 >>> lm_SuperContig_3_v2,1,2348246,2348246 >>> lm_SuperContig_4_v2,1,1918205,1918205 >>> lm_SuperContig_6_v2,1,1888674,1888674 >>> lm_SuperContig_5_v2,1,1869450,1869450 >>> lm_SuperContig_8_v2,1,1809296,1809296 >>> lm_SuperContig_9_v2,1,1772623,1772623 >>> lm_SuperContig_7_v2,1,1769547,1769547 >>> lm_SuperContig_10_v2,1,1758670,1758670 >>> lm_SuperContig_13_v2,1,1634580,1634580 >>> lm_SuperContig_12_v2,1,1631710,1631710 >>> lm_SuperContig_11_v2,1,1590160,1590160 >>> lm_SuperContig_15_v2,1,1560629,1560629 >>> lm_SuperContig_14_v2,1,1533332,1533332 >>> lm_SuperContig_17_v2,1,1445693,1445693 >>> lm_SuperContig_16_v2,1,1397653,1397653 >>> lm_SuperContig_18_v2,1,1351976,1351976 >>> lm_SuperContig_19_v2,1,1186800,1186800 >>> lm_SuperContig_20_v2,1,1087932,1087932 >>> lm_SuperContig_21_v2,1,1020521,1020521 >>> lm_SuperContig_22_v2,1,731443,731443 >>> lm_SuperContig_23_v2,1,521426,521426 >>> lm_SuperContig_24_v2,1,475869,475869 >>> lm_SuperContig_25_v2,1,318058,318058 >>> lm_SuperContig_26_v2,1,261540,261540 >>> lm_SuperContig_27_v2,1,250629,250629 >>> lm_SuperContig_28_v2,1,236098,236098 >>> lm_SuperContig_29_v2,1,200940,200940 >>> lm_SuperContig_30_v2,1,154863,154863 >>> lm_SuperContig_31_v2,1,143268,143268 >>> lm_SuperContig_32_v2,1,87679,87679 >>> lm_SuperContig_34_v2,1,58596,58596 >>> >>> I try to do the following: >>> # creating GRanges object for chromosomes >>> dataChr <- read.table("chrs.txt",header=T,sep=",") >>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) >>> sl <- setNames(dataChr$len, as.character(dataChr$chr)) >>> seqlengths(chrs) <- sl >>> >>> And I get the following error: >>> >>> Error in .normargSeqlengths(value, seqnames(x)) : >>> when the supplied 'seqlengths' vector is named, the names must match the seqnames >>> >>> Any chance you could help me with what is going on? >>> >>> Best wishes, >>> Agnieszka >>> >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: i386-w64-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 >>> [8] BiocInstaller_1.14.2 >>> >>> loaded via a namespace (and not attached): >>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 >>> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 >>> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 >>> [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 >>> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 >>> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 >>> [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 >>> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 >>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> __________________________________________________________________ ______ >>> devteam-bioc mailing list >>> To unsubscribe from this mailing list send a blank email to >>> devteam-bioc-leave at lists.fhcrc.org >>> You can also unsubscribe or change your personal options at >>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc >>> >> >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
0
Entering edit mode
Should I post it as a separate ggbio related questio? Best wishes, Agnieszka ________________________________________ From: Miss Agnieszka Aleksandra Golicz Sent: 02 May 2014 10:28 To: Hervé Pagès; vobencha at fhcrc.org; guest at bioconductor.org; bioconductor at r-project.org Cc: GenomicRanges Maintainer Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem Sorry did not include my chromosome and snp files. It looks like this: chrs.txt - summary of chromosomes chr,start,end,len 0,1,4258568,4258568 1,1,3378610,3378610 2,1,2939989,2939989 3,1,2348246,2348246 4,1,1918205,1918205 6,1,1888674,1888674 5,1,1869450,1869450 8,1,1809296,1809296 9,1,1772623,1772623 7,1,1769547,1769547 10,1,1758670,1758670 13,1,1634580,1634580 12,1,1631710,1631710 11,1,1590160,1590160 15,1,1560629,1560629 14,1,1533332,1533332 17,1,1445693,1445693 16,1,1397653,1397653 18,1,1351976,1351976 19,1,1186800,1186800 20,1,1087932,1087932 21,1,1020521,1020521 22,1,731443,731443 23,1,521426,521426 24,1,475869,475869 25,1,318058,318058 26,1,261540,261540 27,1,250629,250629 28,1,236098,236098 29,1,200940,200940 30,1,154863,154863 31,1,143268,143268 32,1,87679,87679 34,1,58596,58596 snps.txt - SNP file chr,id,start,end 6,egn4_Lema_G046740.1,538805,538805 6,egn4_Lema_G049660.1,1607018,1607018 10,egn4_Lema_G076370.1,1242542,1242542 13,egn4_Lema_G081640.1,1352946,1352946 12,egn4_Lema_G082370.1,109077,109077 12,egn4_Lema_G085610.1,1190615,1190615 12,egn4_Lema_G085620.1,1190933,1190933 15,egn4_Lema_G090230.1,331669,331669 16,egn4_Lema_G104510.1,667778,667778 18,egn4_Lema_G110260.1,1321392,1321392 19,egn4_Lema_G113060.1,958993,958993 19,egn4_Lema_G113080.1,1177185,1177185 21,egn4_Lema_G116710.1,263332,263332 If I only do that: dataChr <- read.table("chrs.txt",header=T,sep=",",stringsAsFactors=FALSE) dataChrS <- dataChr[order(dataChr$chr),] t <- read.table("snps.txt",header=T,sep=",") snps <- with(t, GRanges(chr, IRanges(start, end), id=id)) seqlens <- setNames(dataChrS$len, as.character(dataChrS$chr)) seqlengths(snps) <- seqlens I get the initial error - Error in .normargSeqlengths(value, seqnames(x)) : when the supplied 'seqlengths' vector is named, the names must match the seqnames Makes sense, there are no SNPs on some chromosomes If I do that: lChrs <- dataChrS[dataChrS$chr%in%t$chr,] seqlens <- setNames(lChrs$len, as.character(lChrs$chr)) seqlengths(snps) <- seqlens seqlegths(snps) 6 10 12 13 15 16 1888674 1758670 1631710 1634580 1560629 1397653 I only have chromosomes with SNPs. Best wishes, Agnieszka ________________________________________ From: Miss Agnieszka Aleksandra Golicz Sent: 02 May 2014 10:01 To: Hervé Pagès; vobencha at fhcrc.org; guest at bioconductor.org; bioconductor at r-project.org Cc: GenomicRanges Maintainer Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem Hello, Thank you very much for your help. I am facing one more problem with GRanges. I am trying to emulate this object. data("CRC", package = "biovizBase") object: mut.gr (it's SNP annotation for human chromosomes 1-22) When I just do show on mut.gr showmut.gr) GRanges with 60 ranges and 10 metadata columns: seqnames ranges strand | Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Strand Variant_Classification Variant_Type Reference_Allele <rle> <iranges> <rle> | <factor> <integer> <factor> <integer> <factor> <factor> <factor> <factor> [1] 1 [ 11003085, 11003085] + | TARDBP 23435 Broad 36 + Missense SNP G [2] 1 [ 62352395, 62352395] + | INADL 10207 Broad 36 + Missense SNP T [3] 1 [194960885, 194960885] + | CFH 3075 Broad 36 + Missense SNP G [4] 2 [ 10116508, 10116508] - | CYS1 192668 Broad 36 - Missense SNP C [5] 2 [ 33617747, 33617747] + | RASGRP3 25780 Broad 36 + Missense SNP C [6] 2 [ 73894280, 73894280] + | C2orf78 388960 Broad 36 + Missense SNP T [7] 2 [ 96732769, 96732769] + | FER1L5 90342 Broad 36 + Missense SNP T [8] 2 [179160267, 179160267] - | TTN 7273 Broad 36 - Missense SNP C [9] 2 [217251189, 217251189] - | IGFBP5 3488 Broad 36 - Missense SNP C [10] 3 [ 12620699, 12620699] - | RAF1 5894 Broad 36 - Missense SNP G [11] 3 [ 46472880, 46472880] - | LTF 4057 Broad 36 - Missense SNP C [12] 3 [137203130, 137203130] + | PPP2R3A 5523 Broad 36 + Missense SNP A [13] 3 [137457429, 137457429] + | PCCB 5096 Broad 36 + Missense SNP C [14] 3 [184708629, 184708629] - | KLHL6 89857 Broad 36 - Missense SNP G [15] 4 [147434591, 147434591] - | SLC10A7 84068 Broad 36 - Missense SNP G [16] 4 [185915412, 185915412] - | ACSL1 2180 Broad 36 - Missense SNP A [17] 5 [ 79070342, 79070342] + | CMYA5 202333 Broad 36 + Missense SNP C [18] 5 [ 94775579, 94775579] + | FAM81B 153643 Broad 36 + Missense SNP G [19] 5 [140838266, 140838266] + | PCDHGC3 5098 Broad 36 + Missense SNP T [20] 6 [109992724, 109992724] - | AKD1 221264 Broad 36 - Missense SNP C [21] 6 [118993492, 118993492] - | C6orf204 387119 Broad 36 - Missense SNP G [22] 7 [124286401, 124286401] - | POT1 25913 Broad 36 - Missense SNP A [23] 7 [125960456, 125960456] - | GRM8 2918 Broad 36 - Missense SNP A [24] 9 [ 35097658, 35097658] - | KIAA1539 80256 Broad 36 - Missense SNP A [25] 9 [103164773, 103164773] - | BAAT 570 Broad 36 - Missense SNP G [26] 9 [132745783, 132745783] + | ABL1 25 Broad 36 + Missense SNP C [27] 9 [138481305, 138481305] - | SEC16A 9919 Broad 36 - Missense SNP C [28] 10 [ 7661761, 7661761] - | ITIH5 80760 Broad 36 - Missense SNP C [29] 10 [106064683, 106064683] - | ITPRIP 85450 Broad 36 - Missense SNP T [30] 11 [ 603430, 603430] - | IRF7 3665 Broad 36 - Missense SNP G [31] 11 [ 5610148, 5610148] + | TRIM34 53840 Broad 36 + Missense SNP A [32] 11 [ 9420281, 9420281] + | IPO7 10527 Broad 36 + Missense SNP G [33] 11 [ 26495005, 26495005] + | ANO3 63982 Broad 36 + Missense SNP C [34] 11 [ 55629818, 55629818] + | OR8H2 390151 Broad 36 + Missense SNP G [35] 11 [116737834, 116737834] + | CEP164 22897 Broad 36 + Missense SNP C [36] 12 [ 25165980, 25165980] - | CASC1 55259 Broad 36 - Missense SNP G [37] 12 [ 25289548, 25289548] - | KRAS 3845 Broad 36 - Missense SNP C [38] 12 [ 31142142, 31142142] + | DDX11 1663 Broad 36 + Missense SNP G [39] 12 [ 42434771, 42434771] - | PUS7L 83448 Broad 36 - Missense SNP T [40] 12 [ 55006974, 55006974] - | PAN2 9924 Broad 36 - Missense SNP T [41] 13 [102199348, 102199348] - | LOC643677 643677 Broad 36 - Missense SNP G [42] 15 [ 40503502, 40503502] - | ZFP106 64397 Broad 36 - Missense SNP C [43] 15 [ 70816269, 70816269] + | BBS4 585 Broad 36 + Missense SNP A [44] 16 [ 23481033, 23481033] + | UBFD1 56061 Broad 36 + Missense SNP C [45] 17 [ 21149010, 21149010] + | MAP2K3 5606 Broad 36 + Missense SNP G [46] 17 [ 35812691, 35812691] - | TOP2A 7153 Broad 36 - Missense SNP T [47] 18 [ 16788810, 16788810] - | ROCK1 6093 Broad 36 - Missense SNP C [48] 18 [ 75322338, 75322338] + | NFATC1 4772 Broad 36 + Missense SNP G [49] 19 [ 15713495, 15713495] + | OR10H3 26532 Broad 36 + Missense SNP G [50] 19 [ 40730140, 40730140] + | TMEM147 10430 Broad 36 + Missense SNP C [51] 19 [ 52338664, 52338664] + | SAE1 10055 Broad 36 + Missense SNP G [52] 19 [ 57407795, 57407795] + | PPP2R1A 5518 Broad 36 + Missense SNP G [53] 20 [ 23298287, 23298287] + | GZF1 64412 Broad 36 + Missense SNP A [54] 20 [ 31012946, 31012946] + | EFCAB8 388795 Broad 36 + Missense SNP C [55] 20 [ 40223536, 40223536] - | PTPRT 11122 Broad 36 - Missense SNP C [56] 20 [ 54467136, 54467136] + | CASS4 57091 Broad 36 + Missense SNP G [57] 20 [ 60201983, 60201983] + | GTPBP5 26164 Broad 36 + Missense SNP C [58] 21 [ 36688774, 36688774] + | CHAF1B 8208 Broad 36 + Missense SNP C [59] 21 [ 39699770, 39699770] - | LCA5L 150082 Broad 36 - Missense SNP T [60] 22 [ 27437953, 27437953] - | CHEK2 11200 Broad 36 - Missense SNP C In the seqnames column chromosomes 8 and 14 are missing, which makes sense - there is no SNPs on those. However, when do: head(seqlengthsmut.gr), 25) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 135006516 133851895 115169878 107349540 102531392 16 17 18 19 20 21 22 90354753 81195210 78077248 59128983 63025520 48129895 51304566 All 22 chromosomes are there. How is that possible? I believe that for my data, I need to do a similar thing (I'm trying to create a multilayer circular plot in ggbio). Although I have snps for only few chromosomes, I think I need to include lengths for all of them. How do I do that? Best wishes, Agnieszka ________________________________________ From: Hervé Pagès <hpages@fhcrc.org> Sent: 02 May 2014 03:27 To: vobencha at fhcrc.org; guest at bioconductor.org; bioconductor at r-project.org; Miss Agnieszka Aleksandra Golicz Cc: GenomicRanges Maintainer Subject: Re: [devteam-bioc] GenomicRanges seqlengths problem On 05/01/2014 10:21 AM, Maintainer wrote: > Hi Sonia, Val, > > On 05/01/2014 09:52 AM, Maintainer wrote: >> Hi, >> >> The seqnames are in a different orders in the 'chrs' and 'sl' objects. >> Looking at the first 3 names in each, >> >> > head(seqlengths(chrs), 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 >> NA NA NA >> >> > head(sl, 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 >> 4258568 3378610 2939989 >> >> When a GRanges is created, seqnames are sorted in ascii order. > > Just to clarify, the seqlevels are sorted, not the seqnames: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > gr > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] b [1, 10] * > [2] a [2, 10] * > [3] b [3, 10] * > --- > seqlengths: > a b > NA NA > > Not sorted (i.e. user-supplied order is preserved): > > > seqnames(gr) > factor-Rle of length 3 with 3 runs > Lengths: 1 1 1 > Values : b a b > Levels(2): a b > > Sorted: > > > seqlevels(gr) > [1] "a" "b" > > This sorting of the seqlevels is not good anyway (people with different > LOCALE will get different results). I just fixed this so now the > GRanges() constructor will preserve the seqlevels in the order supplied > by the user: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > seqlevels(gr) > [1] "b" "a" > > More precisely, the seqlevels are obtained by doing unique() on the > seqnames. > > The fix will propagate to the public repos and become available thru > biocLite() in the next 24 hours. > > Then your original code should just work Sonia. Or maybe not :) You might still need to use 'stringsAsFactors=FALSE' when calling read.table(). Please let us know if that still doesn't make it. Thanks, H. > > Cheers, > H. > > >> You can >> see the full list with seqinfo(): >> >> > seqinfo(chrs) >> Seqinfo of length 34 >> seqnames seqlengths isCircular genome >> lm_SuperContig_0_v2 <na> <na> <na> >> lm_SuperContig_1_v2 <na> <na> <na> >> lm_SuperContig_10_v2 <na> <na> <na> >> lm_SuperContig_11_v2 <na> <na> <na> >> lm_SuperContig_12_v2 <na> <na> <na> >> ... ... ... ... >> >> The seqnames in the replacement vector must match the order in the >> 'Seqinfo' object. >> >> To re-order: >> >> sl_new <- sl[match(levels(factor(names(sl))), names(sl))] >> >> Then add the new lengths: >> >>>> seqlengths(chrs) <- sl_new >>>> chrs >>> GRanges with 34 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] lm_SuperContig_0_v2 [1, 4258568] * >>> [2] lm_SuperContig_1_v2 [1, 3378610] * >>> [3] lm_SuperContig_2_v2 [1, 2939989] * >>> [4] lm_SuperContig_3_v2 [1, 2348246] * >>> [5] lm_SuperContig_4_v2 [1, 1918205] * >>> ... ... ... ... >>> [30] lm_SuperContig_29_v2 [1, 200940] * >>> [31] lm_SuperContig_30_v2 [1, 154863] * >>> [32] lm_SuperContig_31_v2 [1, 143268] * >>> [33] lm_SuperContig_32_v2 [1, 87679] * >>> [34] lm_SuperContig_34_v2 [1, 58596] * >>> --- >>> seqlengths: >>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 >>> 4258568 3378610 ... 1772623 >> >> >> Valerie >> >> >> On 05/01/2014 07:20 AM, Maintainer wrote: >>> >>> Hello, >>> >>> I have a problem with supplying GRanges object with seqlengths. >>> I have a files. >>> chrs.txt - contains information about chromosomes >>> >>> chrs.txt >>> chr,start,end,len >>> lm_SuperContig_0_v2,1,4258568,4258568 >>> lm_SuperContig_1_v2,1,3378610,3378610 >>> lm_SuperContig_2_v2,1,2939989,2939989 >>> lm_SuperContig_3_v2,1,2348246,2348246 >>> lm_SuperContig_4_v2,1,1918205,1918205 >>> lm_SuperContig_6_v2,1,1888674,1888674 >>> lm_SuperContig_5_v2,1,1869450,1869450 >>> lm_SuperContig_8_v2,1,1809296,1809296 >>> lm_SuperContig_9_v2,1,1772623,1772623 >>> lm_SuperContig_7_v2,1,1769547,1769547 >>> lm_SuperContig_10_v2,1,1758670,1758670 >>> lm_SuperContig_13_v2,1,1634580,1634580 >>> lm_SuperContig_12_v2,1,1631710,1631710 >>> lm_SuperContig_11_v2,1,1590160,1590160 >>> lm_SuperContig_15_v2,1,1560629,1560629 >>> lm_SuperContig_14_v2,1,1533332,1533332 >>> lm_SuperContig_17_v2,1,1445693,1445693 >>> lm_SuperContig_16_v2,1,1397653,1397653 >>> lm_SuperContig_18_v2,1,1351976,1351976 >>> lm_SuperContig_19_v2,1,1186800,1186800 >>> lm_SuperContig_20_v2,1,1087932,1087932 >>> lm_SuperContig_21_v2,1,1020521,1020521 >>> lm_SuperContig_22_v2,1,731443,731443 >>> lm_SuperContig_23_v2,1,521426,521426 >>> lm_SuperContig_24_v2,1,475869,475869 >>> lm_SuperContig_25_v2,1,318058,318058 >>> lm_SuperContig_26_v2,1,261540,261540 >>> lm_SuperContig_27_v2,1,250629,250629 >>> lm_SuperContig_28_v2,1,236098,236098 >>> lm_SuperContig_29_v2,1,200940,200940 >>> lm_SuperContig_30_v2,1,154863,154863 >>> lm_SuperContig_31_v2,1,143268,143268 >>> lm_SuperContig_32_v2,1,87679,87679 >>> lm_SuperContig_34_v2,1,58596,58596 >>> >>> I try to do the following: >>> # creating GRanges object for chromosomes >>> dataChr <- read.table("chrs.txt",header=T,sep=",") >>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) >>> sl <- setNames(dataChr$len, as.character(dataChr$chr)) >>> seqlengths(chrs) <- sl >>> >>> And I get the following error: >>> >>> Error in .normargSeqlengths(value, seqnames(x)) : >>> when the supplied 'seqlengths' vector is named, the names must match the seqnames >>> >>> Any chance you could help me with what is going on? >>> >>> Best wishes, >>> Agnieszka >>> >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: i386-w64-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 >>> [8] BiocInstaller_1.14.2 >>> >>> loaded via a namespace (and not attached): >>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 >>> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 >>> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 >>> [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 >>> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 >>> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 >>> [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 >>> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 >>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> __________________________________________________________________ ______ >>> devteam-bioc mailing list >>> To unsubscribe from this mailing list send a blank email to >>> devteam-bioc-leave at lists.fhcrc.org >>> You can also unsubscribe or change your personal options at >>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc >>> >> >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
0
Entering edit mode
Hi Agnieszka, it's not really a ggbio related question, but would you mind try seqlevels(snps) <- names(seqlens) # then seqlengths(snps) <- seqlens to see if that solved your problem. cheers Tengfei On Sat, May 3, 2014 at 6:32 PM, Miss Agnieszka Aleksandra Golicz < agnieszka.golicz@uq.net.au> wrote: > Should I post it as a separate ggbio related questio? > > Best wishes, > Agnieszka > ________________________________________ > From: Miss Agnieszka Aleksandra Golicz > Sent: 02 May 2014 10:28 > To: Hervé Pagès; vobencha@fhcrc.org; guest@bioconductor.org; > bioconductor@r-project.org > Cc: GenomicRanges Maintainer > Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem > > Sorry did not include my chromosome and snp files. > > It looks like this: > chrs.txt - summary of chromosomes > chr,start,end,len > 0,1,4258568,4258568 > 1,1,3378610,3378610 > 2,1,2939989,2939989 > 3,1,2348246,2348246 > 4,1,1918205,1918205 > 6,1,1888674,1888674 > 5,1,1869450,1869450 > 8,1,1809296,1809296 > 9,1,1772623,1772623 > 7,1,1769547,1769547 > 10,1,1758670,1758670 > 13,1,1634580,1634580 > 12,1,1631710,1631710 > 11,1,1590160,1590160 > 15,1,1560629,1560629 > 14,1,1533332,1533332 > 17,1,1445693,1445693 > 16,1,1397653,1397653 > 18,1,1351976,1351976 > 19,1,1186800,1186800 > 20,1,1087932,1087932 > 21,1,1020521,1020521 > 22,1,731443,731443 > 23,1,521426,521426 > 24,1,475869,475869 > 25,1,318058,318058 > 26,1,261540,261540 > 27,1,250629,250629 > 28,1,236098,236098 > 29,1,200940,200940 > 30,1,154863,154863 > 31,1,143268,143268 > 32,1,87679,87679 > 34,1,58596,58596 > > snps.txt - SNP file > chr,id,start,end > 6,egn4_Lema_G046740.1,538805,538805 > 6,egn4_Lema_G049660.1,1607018,1607018 > 10,egn4_Lema_G076370.1,1242542,1242542 > 13,egn4_Lema_G081640.1,1352946,1352946 > 12,egn4_Lema_G082370.1,109077,109077 > 12,egn4_Lema_G085610.1,1190615,1190615 > 12,egn4_Lema_G085620.1,1190933,1190933 > 15,egn4_Lema_G090230.1,331669,331669 > 16,egn4_Lema_G104510.1,667778,667778 > 18,egn4_Lema_G110260.1,1321392,1321392 > 19,egn4_Lema_G113060.1,958993,958993 > 19,egn4_Lema_G113080.1,1177185,1177185 > 21,egn4_Lema_G116710.1,263332,263332 > > If I only do that: > dataChr <- read.table("chrs.txt",header=T,sep=",",stringsAsFactors=FALSE) > dataChrS <- dataChr[order(dataChr$chr),] > t <- read.table("snps.txt",header=T,sep=",") > snps <- with(t, GRanges(chr, IRanges(start, end), id=id)) > seqlens <- setNames(dataChrS$len, as.character(dataChrS$chr)) > seqlengths(snps) <- seqlens > I get the initial error - > Error in .normargSeqlengths(value, seqnames(x)) : > when the supplied 'seqlengths' vector is named, the names must match the > seqnames > Makes sense, there are no SNPs on some chromosomes > > If I do that: > lChrs <- dataChrS[dataChrS$chr%in%t$chr,] > seqlens <- setNames(lChrs$len, as.character(lChrs$chr)) > seqlengths(snps) <- seqlens > seqlegths(snps) > 6 10 12 13 15 16 > 1888674 1758670 1631710 1634580 1560629 1397653 > > I only have chromosomes with SNPs. > > Best wishes, > Agnieszka > > ________________________________________ > From: Miss Agnieszka Aleksandra Golicz > Sent: 02 May 2014 10:01 > To: Hervé Pagès; vobencha@fhcrc.org; guest@bioconductor.org; > bioconductor@r-project.org > Cc: GenomicRanges Maintainer > Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem > > Hello, > > Thank you very much for your help. > > I am facing one more problem with GRanges. > I am trying to emulate this object. > data("CRC", package = "biovizBase") > object: mut.gr (it's SNP annotation for human chromosomes 1-22) > > When I just do show on mut.gr > showmut.gr) > GRanges with 60 ranges and 10 metadata columns: > seqnames ranges strand | Hugo_Symbol Entrez_Gene_Id > Center NCBI_Build Strand Variant_Classification Variant_Type > Reference_Allele > <rle> <iranges> <rle> | <factor> <integer> > <factor> <integer> <factor> <factor> <factor> > <factor> > [1] 1 [ 11003085, 11003085] + | TARDBP 23435 > Broad 36 + Missense SNP > G > [2] 1 [ 62352395, 62352395] + | INADL 10207 > Broad 36 + Missense SNP > T > [3] 1 [194960885, 194960885] + | CFH 3075 > Broad 36 + Missense SNP > G > [4] 2 [ 10116508, 10116508] - | CYS1 192668 > Broad 36 - Missense SNP > C > [5] 2 [ 33617747, 33617747] + | RASGRP3 25780 > Broad 36 + Missense SNP > C > [6] 2 [ 73894280, 73894280] + | C2orf78 388960 > Broad 36 + Missense SNP > T > [7] 2 [ 96732769, 96732769] + | FER1L5 90342 > Broad 36 + Missense SNP > T > [8] 2 [179160267, 179160267] - | TTN 7273 > Broad 36 - Missense SNP > C > [9] 2 [217251189, 217251189] - | IGFBP5 3488 > Broad 36 - Missense SNP > C > [10] 3 [ 12620699, 12620699] - | RAF1 5894 > Broad 36 - Missense SNP > G > [11] 3 [ 46472880, 46472880] - | LTF 4057 > Broad 36 - Missense SNP > C > [12] 3 [137203130, 137203130] + | PPP2R3A 5523 > Broad 36 + Missense SNP > A > [13] 3 [137457429, 137457429] + | PCCB 5096 > Broad 36 + Missense SNP > C > [14] 3 [184708629, 184708629] - | KLHL6 89857 > Broad 36 - Missense SNP > G > [15] 4 [147434591, 147434591] - | SLC10A7 84068 > Broad 36 - Missense SNP > G > [16] 4 [185915412, 185915412] - | ACSL1 2180 > Broad 36 - Missense SNP > A > [17] 5 [ 79070342, 79070342] + | CMYA5 202333 > Broad 36 + Missense SNP > C > [18] 5 [ 94775579, 94775579] + | FAM81B 153643 > Broad 36 + Missense SNP > G > [19] 5 [140838266, 140838266] + | PCDHGC3 5098 > Broad 36 + Missense SNP > T > [20] 6 [109992724, 109992724] - | AKD1 221264 > Broad 36 - Missense SNP > C > [21] 6 [118993492, 118993492] - | C6orf204 387119 > Broad 36 - Missense SNP > G > [22] 7 [124286401, 124286401] - | POT1 25913 > Broad 36 - Missense SNP > A > [23] 7 [125960456, 125960456] - | GRM8 2918 > Broad 36 - Missense SNP > A > [24] 9 [ 35097658, 35097658] - | KIAA1539 80256 > Broad 36 - Missense SNP > A > [25] 9 [103164773, 103164773] - | BAAT 570 > Broad 36 - Missense SNP > G > [26] 9 [132745783, 132745783] + | ABL1 25 > Broad 36 + Missense SNP > C > [27] 9 [138481305, 138481305] - | SEC16A 9919 > Broad 36 - Missense SNP > C > [28] 10 [ 7661761, 7661761] - | ITIH5 80760 > Broad 36 - Missense SNP > C > [29] 10 [106064683, 106064683] - | ITPRIP 85450 > Broad 36 - Missense SNP > T > [30] 11 [ 603430, 603430] - | IRF7 3665 > Broad 36 - Missense SNP > G > [31] 11 [ 5610148, 5610148] + | TRIM34 53840 > Broad 36 + Missense SNP > A > [32] 11 [ 9420281, 9420281] + | IPO7 10527 > Broad 36 + Missense SNP > G > [33] 11 [ 26495005, 26495005] + | ANO3 63982 > Broad 36 + Missense SNP > C > [34] 11 [ 55629818, 55629818] + | OR8H2 390151 > Broad 36 + Missense SNP > G > [35] 11 [116737834, 116737834] + | CEP164 22897 > Broad 36 + Missense SNP > C > [36] 12 [ 25165980, 25165980] - | CASC1 55259 > Broad 36 - Missense SNP > G > [37] 12 [ 25289548, 25289548] - | KRAS 3845 > Broad 36 - Missense SNP > C > [38] 12 [ 31142142, 31142142] + | DDX11 1663 > Broad 36 + Missense SNP > G > [39] 12 [ 42434771, 42434771] - | PUS7L 83448 > Broad 36 - Missense SNP > T > [40] 12 [ 55006974, 55006974] - | PAN2 9924 > Broad 36 - Missense SNP > T > [41] 13 [102199348, 102199348] - | LOC643677 643677 > Broad 36 - Missense SNP > G > [42] 15 [ 40503502, 40503502] - | ZFP106 64397 > Broad 36 - Missense SNP > C > [43] 15 [ 70816269, 70816269] + | BBS4 585 > Broad 36 + Missense SNP > A > [44] 16 [ 23481033, 23481033] + | UBFD1 56061 > Broad 36 + Missense SNP > C > [45] 17 [ 21149010, 21149010] + | MAP2K3 5606 > Broad 36 + Missense SNP > G > [46] 17 [ 35812691, 35812691] - | TOP2A 7153 > Broad 36 - Missense SNP > T > [47] 18 [ 16788810, 16788810] - | ROCK1 6093 > Broad 36 - Missense SNP > C > [48] 18 [ 75322338, 75322338] + | NFATC1 4772 > Broad 36 + Missense SNP > G > [49] 19 [ 15713495, 15713495] + | OR10H3 26532 > Broad 36 + Missense SNP > G > [50] 19 [ 40730140, 40730140] + | TMEM147 10430 > Broad 36 + Missense SNP > C > [51] 19 [ 52338664, 52338664] + | SAE1 10055 > Broad 36 + Missense SNP > G > [52] 19 [ 57407795, 57407795] + | PPP2R1A 5518 > Broad 36 + Missense SNP > G > [53] 20 [ 23298287, 23298287] + | GZF1 64412 > Broad 36 + Missense SNP > A > [54] 20 [ 31012946, 31012946] + | EFCAB8 388795 > Broad 36 + Missense SNP > C > [55] 20 [ 40223536, 40223536] - | PTPRT 11122 > Broad 36 - Missense SNP > C > [56] 20 [ 54467136, 54467136] + | CASS4 57091 > Broad 36 + Missense SNP > G > [57] 20 [ 60201983, 60201983] + | GTPBP5 26164 > Broad 36 + Missense SNP > C > [58] 21 [ 36688774, 36688774] + | CHAF1B 8208 > Broad 36 + Missense SNP > C > [59] 21 [ 39699770, 39699770] - | LCA5L 150082 > Broad 36 - Missense SNP > T > [60] 22 [ 27437953, 27437953] - | CHEK2 11200 > Broad 36 - Missense SNP > C > > In the seqnames column chromosomes 8 and 14 are missing, which makes sense > - there is no SNPs on those. > > However, when do: > head(seqlengthsmut.gr), 25) > > 1 2 3 4 5 6 7 > 8 9 10 11 12 13 14 15 > 249250621 243199373 198022430 191154276 180915260 171115067 159138663 > 146364022 141213431 135534747 135006516 133851895 115169878 107349540 > 102531392 > 16 17 18 19 20 21 22 > 90354753 81195210 78077248 59128983 63025520 48129895 51304566 > > All 22 chromosomes are there. > > How is that possible? > > I believe that for my data, I need to do a similar thing (I'm trying to > create a multilayer circular plot in ggbio). Although I have snps for only > few chromosomes, I think I need to include lengths for all of them. > How do I do that? > > Best wishes, > Agnieszka > ________________________________________ > From: Hervé Pagès <hpages@fhcrc.org> > Sent: 02 May 2014 03:27 > To: vobencha@fhcrc.org; guest@bioconductor.org; bioconductor@r-project.org; > Miss Agnieszka Aleksandra Golicz > Cc: GenomicRanges Maintainer > Subject: Re: [devteam-bioc] GenomicRanges seqlengths problem > > On 05/01/2014 10:21 AM, Maintainer wrote: > > Hi Sonia, Val, > > > > On 05/01/2014 09:52 AM, Maintainer wrote: > >> Hi, > >> > >> The seqnames are in a different orders in the 'chrs' and 'sl' objects. > >> Looking at the first 3 names in each, > >> > >> > head(seqlengths(chrs), 3) > >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 > >> NA NA NA > >> > >> > head(sl, 3) > >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 > >> 4258568 3378610 2939989 > >> > >> When a GRanges is created, seqnames are sorted in ascii order. > > > > Just to clarify, the seqlevels are sorted, not the seqnames: > > > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > > gr > > GRanges with 3 ranges and 0 metadata columns: > > seqnames ranges strand > > <rle> <iranges> <rle> > > [1] b [1, 10] * > > [2] a [2, 10] * > > [3] b [3, 10] * > > --- > > seqlengths: > > a b > > NA NA > > > > Not sorted (i.e. user-supplied order is preserved): > > > > > seqnames(gr) > > factor-Rle of length 3 with 3 runs > > Lengths: 1 1 1 > > Values : b a b > > Levels(2): a b > > > > Sorted: > > > > > seqlevels(gr) > > [1] "a" "b" > > > > This sorting of the seqlevels is not good anyway (people with different > > LOCALE will get different results). I just fixed this so now the > > GRanges() constructor will preserve the seqlevels in the order supplied > > by the user: > > > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > > seqlevels(gr) > > [1] "b" "a" > > > > More precisely, the seqlevels are obtained by doing unique() on the > > seqnames. > > > > The fix will propagate to the public repos and become available thru > > biocLite() in the next 24 hours. > > > > Then your original code should just work Sonia. > > Or maybe not :) You might still need to use 'stringsAsFactors=FALSE' > when calling read.table(). Please let us know if that still doesn't > make it. > > Thanks, > H. > > > > > Cheers, > > H. > > > > > >> You can > >> see the full list with seqinfo(): > >> > >> > seqinfo(chrs) > >> Seqinfo of length 34 > >> seqnames seqlengths isCircular genome > >> lm_SuperContig_0_v2 <na> <na> <na> > >> lm_SuperContig_1_v2 <na> <na> <na> > >> lm_SuperContig_10_v2 <na> <na> <na> > >> lm_SuperContig_11_v2 <na> <na> <na> > >> lm_SuperContig_12_v2 <na> <na> <na> > >> ... ... ... ... > >> > >> The seqnames in the replacement vector must match the order in the > >> 'Seqinfo' object. > >> > >> To re-order: > >> > >> sl_new <- sl[match(levels(factor(names(sl))), names(sl))] > >> > >> Then add the new lengths: > >> > >>>> seqlengths(chrs) <- sl_new > >>>> chrs > >>> GRanges with 34 ranges and 0 metadata columns: > >>> seqnames ranges strand > >>> <rle> <iranges> <rle> > >>> [1] lm_SuperContig_0_v2 [1, 4258568] * > >>> [2] lm_SuperContig_1_v2 [1, 3378610] * > >>> [3] lm_SuperContig_2_v2 [1, 2939989] * > >>> [4] lm_SuperContig_3_v2 [1, 2348246] * > >>> [5] lm_SuperContig_4_v2 [1, 1918205] * > >>> ... ... ... ... > >>> [30] lm_SuperContig_29_v2 [1, 200940] * > >>> [31] lm_SuperContig_30_v2 [1, 154863] * > >>> [32] lm_SuperContig_31_v2 [1, 143268] * > >>> [33] lm_SuperContig_32_v2 [1, 87679] * > >>> [34] lm_SuperContig_34_v2 [1, 58596] * > >>> --- > >>> seqlengths: > >>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 > >>> 4258568 3378610 ... 1772623 > >> > >> > >> Valerie > >> > >> > >> On 05/01/2014 07:20 AM, Maintainer wrote: > >>> > >>> Hello, > >>> > >>> I have a problem with supplying GRanges object with seqlengths. > >>> I have a files. > >>> chrs.txt - contains information about chromosomes > >>> > >>> chrs.txt > >>> chr,start,end,len > >>> lm_SuperContig_0_v2,1,4258568,4258568 > >>> lm_SuperContig_1_v2,1,3378610,3378610 > >>> lm_SuperContig_2_v2,1,2939989,2939989 > >>> lm_SuperContig_3_v2,1,2348246,2348246 > >>> lm_SuperContig_4_v2,1,1918205,1918205 > >>> lm_SuperContig_6_v2,1,1888674,1888674 > >>> lm_SuperContig_5_v2,1,1869450,1869450 > >>> lm_SuperContig_8_v2,1,1809296,1809296 > >>> lm_SuperContig_9_v2,1,1772623,1772623 > >>> lm_SuperContig_7_v2,1,1769547,1769547 > >>> lm_SuperContig_10_v2,1,1758670,1758670 > >>> lm_SuperContig_13_v2,1,1634580,1634580 > >>> lm_SuperContig_12_v2,1,1631710,1631710 > >>> lm_SuperContig_11_v2,1,1590160,1590160 > >>> lm_SuperContig_15_v2,1,1560629,1560629 > >>> lm_SuperContig_14_v2,1,1533332,1533332 > >>> lm_SuperContig_17_v2,1,1445693,1445693 > >>> lm_SuperContig_16_v2,1,1397653,1397653 > >>> lm_SuperContig_18_v2,1,1351976,1351976 > >>> lm_SuperContig_19_v2,1,1186800,1186800 > >>> lm_SuperContig_20_v2,1,1087932,1087932 > >>> lm_SuperContig_21_v2,1,1020521,1020521 > >>> lm_SuperContig_22_v2,1,731443,731443 > >>> lm_SuperContig_23_v2,1,521426,521426 > >>> lm_SuperContig_24_v2,1,475869,475869 > >>> lm_SuperContig_25_v2,1,318058,318058 > >>> lm_SuperContig_26_v2,1,261540,261540 > >>> lm_SuperContig_27_v2,1,250629,250629 > >>> lm_SuperContig_28_v2,1,236098,236098 > >>> lm_SuperContig_29_v2,1,200940,200940 > >>> lm_SuperContig_30_v2,1,154863,154863 > >>> lm_SuperContig_31_v2,1,143268,143268 > >>> lm_SuperContig_32_v2,1,87679,87679 > >>> lm_SuperContig_34_v2,1,58596,58596 > >>> > >>> I try to do the following: > >>> # creating GRanges object for chromosomes > >>> dataChr <- read.table("chrs.txt",header=T,sep=",") > >>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) > >>> sl <- setNames(dataChr$len, as.character(dataChr$chr)) > >>> seqlengths(chrs) <- sl > >>> > >>> And I get the following error: > >>> > >>> Error in .normargSeqlengths(value, seqnames(x)) : > >>> when the supplied 'seqlengths' vector is named, the names must > match the seqnames > >>> > >>> Any chance you could help me with what is going on? > >>> > >>> Best wishes, > >>> Agnieszka > >>> > >>> > >>> -- output of sessionInfo(): > >>> > >>> R version 3.1.0 (2014-04-10) > >>> Platform: i386-w64-mingw32/i386 (32-bit) > >>> > >>> locale: > >>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United > Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > >>> [4] LC_NUMERIC=C LC_TIME=English_United > Kingdom.1252 > >>> > >>> attached base packages: > >>> [1] parallel stats graphics grDevices utils datasets > methods base > >>> > >>> other attached packages: > >>> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 > GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 > BiocGenerics_0.10.0 > >>> [8] BiocInstaller_1.14.2 > >>> > >>> loaded via a namespace (and not attached): > >>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 > Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 > >>> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 > brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 > >>> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 > dichromat_2.0-0 digest_0.6.4 fail_1.2 > >>> [19] foreach_1.4.2 Formula_1.1-1 > GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 > gridExtra_0.9.1 > >>> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 > labeling_0.2 lattice_0.20-29 > latticeExtra_0.6-26 > >>> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 > proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 > >>> [37] RCurl_1.95-4.1 reshape2_1.4 > Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 > scales_0.2.4 > >>> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 > stringr_0.6.2 survival_2.37-7 tools_3.1.0 > >>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 > >>> > >>> > >>> -- > >>> Sent via the guest posting facility at bioconductor.org. > >>> > >>> > ____________________________________________________________________ ____ > >>> devteam-bioc mailing list > >>> To unsubscribe from this mailing list send a blank email to > >>> devteam-bioc-leave@lists.fhcrc.org > >>> You can also unsubscribe or change your personal options at > >>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > >>> > >> > >> > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Tengfei Yin, PhD Seven Bridges Genomics sbgenomics.com 625 Mt. Auburn St. Suite #208 Cambridge, MA 02138 (617) 866-0446 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
It worked. Thanks very much. I've never used GenomicRanges before trying ggbio this week and did not realize one can add seqlevels like that. Thanks again, your answer probably saved me lots of time! Best wishes, Agnieszka ________________________________ From: Tengfei Yin <tengfei.yin@sbgenomics.com> Sent: 04 May 2014 11:03 To: Miss Agnieszka Aleksandra Golicz Cc: Herv� Pag�s; vobencha@fhcrc.org; guest@bioconductor.org; bioconductor@r-project.org; GenomicRanges Maintainer Subject: Re: [BioC] [devteam-bioc] GenomicRanges seqlengths problem Hi Agnieszka, it's not really a ggbio related question, but would you mind try seqlevels(snps) <- names(seqlens) # then seqlengths(snps) <- seqlens to see if that solved your problem. cheers Tengfei On Sat, May 3, 2014 at 6:32 PM, Miss Agnieszka Aleksandra Golicz <agnieszka.golicz@uq.net.au<mailto:agnieszka.golicz@uq.net.au>> wrote: Should I post it as a separate ggbio related questio? Best wishes, Agnieszka ________________________________________ From: Miss Agnieszka Aleksandra Golicz Sent: 02 May 2014 10:28 To: Herv� Pag�s; vobencha@fhcrc.org<mailto:vobencha@fhcrc.org>; guest@bioconductor.org<mailto:guest@bioconductor.org>; bioconductor@r-project.org<mailto:bioconductor@r-project.org> Cc: GenomicRanges Maintainer Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem Sorry did not include my chromosome and snp files. It looks like this: chrs.txt - summary of chromosomes chr,start,end,len 0,1,4258568,4258568 1,1,3378610,3378610 2,1,2939989,2939989 3,1,2348246,2348246 4,1,1918205,1918205 6,1,1888674,1888674 5,1,1869450,1869450 8,1,1809296,1809296 9,1,1772623,1772623 7,1,1769547,1769547 10,1,1758670,1758670 13,1,1634580,1634580 12,1,1631710,1631710 11,1,1590160,1590160 15,1,1560629,1560629 14,1,1533332,1533332 17,1,1445693,1445693 16,1,1397653,1397653 18,1,1351976,1351976 19,1,1186800,1186800 20,1,1087932,1087932 21,1,1020521,1020521 22,1,731443,731443 23,1,521426,521426 24,1,475869,475869 25,1,318058,318058 26,1,261540,261540 27,1,250629,250629 28,1,236098,236098 29,1,200940,200940 30,1,154863,154863 31,1,143268,143268 32,1,87679,87679 34,1,58596,58596 snps.txt - SNP file chr,id,start,end 6,egn4_Lema_G046740.1,538805,538805 6,egn4_Lema_G049660.1,1607018,1607018 10,egn4_Lema_G076370.1,1242542,1242542 13,egn4_Lema_G081640.1,1352946,1352946 12,egn4_Lema_G082370.1,109077,109077 12,egn4_Lema_G085610.1,1190615,1190615 12,egn4_Lema_G085620.1,1190933,1190933 15,egn4_Lema_G090230.1,331669,331669 16,egn4_Lema_G104510.1,667778,667778 18,egn4_Lema_G110260.1,1321392,1321392 19,egn4_Lema_G113060.1,958993,958993 19,egn4_Lema_G113080.1,1177185,1177185 21,egn4_Lema_G116710.1,263332,263332 If I only do that: dataChr <- read.table("chrs.txt",header=T,sep=",",stringsAsFactors=FALSE) dataChrS <- dataChr[order(dataChr$chr),] t <- read.table("snps.txt",header=T,sep=",") snps <- with(t, GRanges(chr, IRanges(start, end), id=id)) seqlens <- setNames(dataChrS$len, as.character(dataChrS$chr)) seqlengths(snps) <- seqlens I get the initial error - Error in .normargSeqlengths(value, seqnames(x)) : when the supplied 'seqlengths' vector is named, the names must match the seqnames Makes sense, there are no SNPs on some chromosomes If I do that: lChrs <- dataChrS[dataChrS$chr%in%t$chr,] seqlens <- setNames(lChrs$len, as.character(lChrs$chr)) seqlengths(snps) <- seqlens seqlegths(snps) 6 10 12 13 15 16 1888674 1758670 1631710 1634580 1560629 1397653 I only have chromosomes with SNPs. Best wishes, Agnieszka ________________________________________ From: Miss Agnieszka Aleksandra Golicz Sent: 02 May 2014 10:01 To: Herv� Pag�s; vobencha@fhcrc.org<mailto:vobencha@fhcrc.org>; guest@bioconductor.org<mailto:guest@bioconductor.org>; bioconductor@r-project.org<mailto:bioconductor@r-project.org> Cc: GenomicRanges Maintainer Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem Hello, Thank you very much for your help. I am facing one more problem with GRanges. I am trying to emulate this object. data("CRC", package = "biovizBase") object: mut.gr<http: mut.gr=""> (it's SNP annotation for human chromosomes 1-22) When I just do show on mut.gr<http: mut.gr=""> showmut.gr<http: mut.gr="">) GRanges with 60 ranges and 10 metadata columns: seqnames ranges strand | Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Strand Variant_Classification Variant_Type Reference_Allele <rle> <iranges> <rle> | <factor> <integer> <factor> <integer> <factor> <factor> <factor> <factor> [1] 1 [ 11003085, 11003085] + | TARDBP 23435 Broad 36 + Missense SNP G [2] 1 [ 62352395, 62352395] + | INADL 10207 Broad 36 + Missense SNP T [3] 1 [194960885, 194960885] + | CFH 3075 Broad 36 + Missense SNP G [4] 2 [ 10116508, 10116508] - | CYS1 192668 Broad 36 - Missense SNP C [5] 2 [ 33617747, 33617747] + | RASGRP3 25780 Broad 36 + Missense SNP C [6] 2 [ 73894280, 73894280] + | C2orf78 388960 Broad 36 + Missense SNP T [7] 2 [ 96732769, 96732769] + | FER1L5 90342 Broad 36 + Missense SNP T [8] 2 [179160267, 179160267] - | TTN 7273 Broad 36 - Missense SNP C [9] 2 [217251189, 217251189] - | IGFBP5 3488 Broad 36 - Missense SNP C [10] 3 [ 12620699, 12620699] - | RAF1 5894 Broad 36 - Missense SNP G [11] 3 [ 46472880, 46472880] - | LTF 4057 Broad 36 - Missense SNP C [12] 3 [137203130, 137203130] + | PPP2R3A 5523 Broad 36 + Missense SNP A [13] 3 [137457429, 137457429] + | PCCB 5096 Broad 36 + Missense SNP C [14] 3 [184708629, 184708629] - | KLHL6 89857 Broad 36 - Missense SNP G [15] 4 [147434591, 147434591] - | SLC10A7 84068 Broad 36 - Missense SNP G [16] 4 [185915412, 185915412] - | ACSL1 2180 Broad 36 - Missense SNP A [17] 5 [ 79070342, 79070342] + | CMYA5 202333 Broad 36 + Missense SNP C [18] 5 [ 94775579, 94775579] + | FAM81B 153643 Broad 36 + Missense SNP G [19] 5 [140838266, 140838266] + | PCDHGC3 5098 Broad 36 + Missense SNP T [20] 6 [109992724, 109992724] - | AKD1 221264 Broad 36 - Missense SNP C [21] 6 [118993492, 118993492] - | C6orf204 387119 Broad 36 - Missense SNP G [22] 7 [124286401, 124286401] - | POT1 25913 Broad 36 - Missense SNP A [23] 7 [125960456, 125960456] - | GRM8 2918 Broad 36 - Missense SNP A [24] 9 [ 35097658, 35097658] - | KIAA1539 80256 Broad 36 - Missense SNP A [25] 9 [103164773, 103164773] - | BAAT 570 Broad 36 - Missense SNP G [26] 9 [132745783, 132745783] + | ABL1 25 Broad 36 + Missense SNP C [27] 9 [138481305, 138481305] - | SEC16A 9919 Broad 36 - Missense SNP C [28] 10 [ 7661761, 7661761] - | ITIH5 80760 Broad 36 - Missense SNP C [29] 10 [106064683, 106064683] - | ITPRIP 85450 Broad 36 - Missense SNP T [30] 11 [ 603430, 603430] - | IRF7 3665 Broad 36 - Missense SNP G [31] 11 [ 5610148, 5610148] + | TRIM34 53840 Broad 36 + Missense SNP A [32] 11 [ 9420281, 9420281] + | IPO7 10527 Broad 36 + Missense SNP G [33] 11 [ 26495005, 26495005] + | ANO3 63982 Broad 36 + Missense SNP C [34] 11 [ 55629818, 55629818] + | OR8H2 390151 Broad 36 + Missense SNP G [35] 11 [116737834, 116737834] + | CEP164 22897 Broad 36 + Missense SNP C [36] 12 [ 25165980, 25165980] - | CASC1 55259 Broad 36 - Missense SNP G [37] 12 [ 25289548, 25289548] - | KRAS 3845 Broad 36 - Missense SNP C [38] 12 [ 31142142, 31142142] + | DDX11 1663 Broad 36 + Missense SNP G [39] 12 [ 42434771, 42434771] - | PUS7L 83448 Broad 36 - Missense SNP T [40] 12 [ 55006974, 55006974] - | PAN2 9924 Broad 36 - Missense SNP T [41] 13 [102199348, 102199348] - | LOC643677 643677 Broad 36 - Missense SNP G [42] 15 [ 40503502, 40503502] - | ZFP106 64397 Broad 36 - Missense SNP C [43] 15 [ 70816269, 70816269] + | BBS4 585 Broad 36 + Missense SNP A [44] 16 [ 23481033, 23481033] + | UBFD1 56061 Broad 36 + Missense SNP C [45] 17 [ 21149010, 21149010] + | MAP2K3 5606 Broad 36 + Missense SNP G [46] 17 [ 35812691, 35812691] - | TOP2A 7153 Broad 36 - Missense SNP T [47] 18 [ 16788810, 16788810] - | ROCK1 6093 Broad 36 - Missense SNP C [48] 18 [ 75322338, 75322338] + | NFATC1 4772 Broad 36 + Missense SNP G [49] 19 [ 15713495, 15713495] + | OR10H3 26532 Broad 36 + Missense SNP G [50] 19 [ 40730140, 40730140] + | TMEM147 10430 Broad 36 + Missense SNP C [51] 19 [ 52338664, 52338664] + | SAE1 10055 Broad 36 + Missense SNP G [52] 19 [ 57407795, 57407795] + | PPP2R1A 5518 Broad 36 + Missense SNP G [53] 20 [ 23298287, 23298287] + | GZF1 64412 Broad 36 + Missense SNP A [54] 20 [ 31012946, 31012946] + | EFCAB8 388795 Broad 36 + Missense SNP C [55] 20 [ 40223536, 40223536] - | PTPRT 11122 Broad 36 - Missense SNP C [56] 20 [ 54467136, 54467136] + | CASS4 57091 Broad 36 + Missense SNP G [57] 20 [ 60201983, 60201983] + | GTPBP5 26164 Broad 36 + Missense SNP C [58] 21 [ 36688774, 36688774] + | CHAF1B 8208 Broad 36 + Missense SNP C [59] 21 [ 39699770, 39699770] - | LCA5L 150082 Broad 36 - Missense SNP T [60] 22 [ 27437953, 27437953] - | CHEK2 11200 Broad 36 - Missense SNP C In the seqnames column chromosomes 8 and 14 are missing, which makes sense - there is no SNPs on those. However, when do: head(seqlengthsmut.gr<http: mut.gr="">), 25) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 135006516 133851895 115169878 107349540 102531392 16 17 18 19 20 21 22 90354753 81195210 78077248 59128983 63025520 48129895 51304566 All 22 chromosomes are there. How is that possible? I believe that for my data, I need to do a similar thing (I'm trying to create a multilayer circular plot in ggbio). Although I have snps for only few chromosomes, I think I need to include lengths for all of them. How do I do that? Best wishes, Agnieszka ________________________________________ From: Herv� Pag�s <hpages@fhcrc.org<mailto:hpages@fhcrc.org>> Sent: 02 May 2014 03:27 To: vobencha@fhcrc.org<mailto:vobencha@fhcrc.org>; guest@bioconductor.org<mailto:guest@bioconductor.org>; bioconductor@r-project.org<mailto:bioconductor@r-project.org>; Miss Agnieszka Aleksandra Golicz Cc: GenomicRanges Maintainer Subject: Re: [devteam-bioc] GenomicRanges seqlengths problem On 05/01/2014 10:21 AM, Maintainer wrote: > Hi Sonia, Val, > > On 05/01/2014 09:52 AM, Maintainer wrote: >> Hi, >> >> The seqnames are in a different orders in the 'chrs' and 'sl' objects. >> Looking at the first 3 names in each, >> >> > head(seqlengths(chrs), 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 >> NA NA NA >> >> > head(sl, 3) >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 >> 4258568 3378610 2939989 >> >> When a GRanges is created, seqnames are sorted in ascii order. > > Just to clarify, the seqlevels are sorted, not the seqnames: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > gr > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] b [1, 10] * > [2] a [2, 10] * > [3] b [3, 10] * > --- > seqlengths: > a b > NA NA > > Not sorted (i.e. user-supplied order is preserved): > > > seqnames(gr) > factor-Rle of length 3 with 3 runs > Lengths: 1 1 1 > Values : b a b > Levels(2): a b > > Sorted: > > > seqlevels(gr) > [1] "a" "b" > > This sorting of the seqlevels is not good anyway (people with different > LOCALE will get different results). I just fixed this so now the > GRanges() constructor will preserve the seqlevels in the order supplied > by the user: > > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) > > seqlevels(gr) > [1] "b" "a" > > More precisely, the seqlevels are obtained by doing unique() on the > seqnames. > > The fix will propagate to the public repos and become available thru > biocLite() in the next 24 hours. > > Then your original code should just work Sonia. Or maybe not :) You might still need to use 'stringsAsFactors=FALSE' when calling read.table(). Please let us know if that still doesn't make it. Thanks, H. > > Cheers, > H. > > >> You can >> see the full list with seqinfo(): >> >> > seqinfo(chrs) >> Seqinfo of length 34 >> seqnames seqlengths isCircular genome >> lm_SuperContig_0_v2 <na> <na> <na> >> lm_SuperContig_1_v2 <na> <na> <na> >> lm_SuperContig_10_v2 <na> <na> <na> >> lm_SuperContig_11_v2 <na> <na> <na> >> lm_SuperContig_12_v2 <na> <na> <na> >> ... ... ... ... >> >> The seqnames in the replacement vector must match the order in the >> 'Seqinfo' object. >> >> To re-order: >> >> sl_new <- sl[match(levels(factor(names(sl))), names(sl))] >> >> Then add the new lengths: >> >>>> seqlengths(chrs) <- sl_new >>>> chrs >>> GRanges with 34 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] lm_SuperContig_0_v2 [1, 4258568] * >>> [2] lm_SuperContig_1_v2 [1, 3378610] * >>> [3] lm_SuperContig_2_v2 [1, 2939989] * >>> [4] lm_SuperContig_3_v2 [1, 2348246] * >>> [5] lm_SuperContig_4_v2 [1, 1918205] * >>> ... ... ... ... >>> [30] lm_SuperContig_29_v2 [1, 200940] * >>> [31] lm_SuperContig_30_v2 [1, 154863] * >>> [32] lm_SuperContig_31_v2 [1, 143268] * >>> [33] lm_SuperContig_32_v2 [1, 87679] * >>> [34] lm_SuperContig_34_v2 [1, 58596] * >>> --- >>> seqlengths: >>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... lm_SuperContig_9_v2 >>> 4258568 3378610 ... 1772623 >> >> >> Valerie >> >> >> On 05/01/2014 07:20 AM, Maintainer wrote: >>> >>> Hello, >>> >>> I have a problem with supplying GRanges object with seqlengths. >>> I have a files. >>> chrs.txt - contains information about chromosomes >>> >>> chrs.txt >>> chr,start,end,len >>> lm_SuperContig_0_v2,1,4258568,4258568 >>> lm_SuperContig_1_v2,1,3378610,3378610 >>> lm_SuperContig_2_v2,1,2939989,2939989 >>> lm_SuperContig_3_v2,1,2348246,2348246 >>> lm_SuperContig_4_v2,1,1918205,1918205 >>> lm_SuperContig_6_v2,1,1888674,1888674 >>> lm_SuperContig_5_v2,1,1869450,1869450 >>> lm_SuperContig_8_v2,1,1809296,1809296 >>> lm_SuperContig_9_v2,1,1772623,1772623 >>> lm_SuperContig_7_v2,1,1769547,1769547 >>> lm_SuperContig_10_v2,1,1758670,1758670 >>> lm_SuperContig_13_v2,1,1634580,1634580 >>> lm_SuperContig_12_v2,1,1631710,1631710 >>> lm_SuperContig_11_v2,1,1590160,1590160 >>> lm_SuperContig_15_v2,1,1560629,1560629 >>> lm_SuperContig_14_v2,1,1533332,1533332 >>> lm_SuperContig_17_v2,1,1445693,1445693 >>> lm_SuperContig_16_v2,1,1397653,1397653 >>> lm_SuperContig_18_v2,1,1351976,1351976 >>> lm_SuperContig_19_v2,1,1186800,1186800 >>> lm_SuperContig_20_v2,1,1087932,1087932 >>> lm_SuperContig_21_v2,1,1020521,1020521 >>> lm_SuperContig_22_v2,1,731443,731443 >>> lm_SuperContig_23_v2,1,521426,521426 >>> lm_SuperContig_24_v2,1,475869,475869 >>> lm_SuperContig_25_v2,1,318058,318058 >>> lm_SuperContig_26_v2,1,261540,261540 >>> lm_SuperContig_27_v2,1,250629,250629 >>> lm_SuperContig_28_v2,1,236098,236098 >>> lm_SuperContig_29_v2,1,200940,200940 >>> lm_SuperContig_30_v2,1,154863,154863 >>> lm_SuperContig_31_v2,1,143268,143268 >>> lm_SuperContig_32_v2,1,87679,87679 >>> lm_SuperContig_34_v2,1,58596,58596 >>> >>> I try to do the following: >>> # creating GRanges object for chromosomes >>> dataChr <- read.table("chrs.txt",header=T,sep=",") >>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) >>> sl <- setNames(dataChr$len, as.character(dataChr$chr)) >>> seqlengths(chrs) <- sl >>> >>> And I get the following error: >>> >>> Error in .normargSeqlengths(value, seqnames(x)) : >>> when the supplied 'seqlengths' vector is named, the names must match the seqnames >>> >>> Any chance you could help me with what is going on? >>> >>> Best wishes, >>> Agnieszka >>> >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: i386-w64-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 BiocGenerics_0.10.0 >>> [8] BiocInstaller_1.14.2 >>> >>> loaded via a namespace (and not attached): >>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 BiocParallel_0.6.0 biomaRt_2.20.0 >>> [7] Biostrings_2.32.0 biovizBase_1.12.1 bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 cluster_1.15.2 >>> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.4 fail_1.2 >>> [19] foreach_1.4.2 Formula_1.1-1 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 gridExtra_0.9.1 >>> [25] gtable_0.1.2 Hmisc_3.14-4 iterators_1.0.7 labeling_0.2 lattice_0.20-29 latticeExtra_0.6-26 >>> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 >>> [37] RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 scales_0.2.4 >>> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 >>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org<http: bioconductor.org="">. >>> >>> __________________________________________________________________ ______ >>> devteam-bioc mailing list >>> To unsubscribe from this mailing list send a blank email to >>> devteam-bioc-leave@lists.fhcrc.org<mailto:devteam-bioc- leave@lists.fhcrc.org=""> >>> You can also unsubscribe or change your personal options at >>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc >>> >> >> > -- Herv� Pag�s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages@fhcrc.org<mailto:hpages@fhcrc.org> Phone: (206) 667-5791<tel:%28206%29%20667-5791> Fax: (206) 667-1319<tel:%28206%29%20667-1319> _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Tengfei Yin, PhD Seven Bridges Genomics sbgenomics.com<http: sbgenomics.com=""/> 625 Mt. Auburn St. Suite #208 Cambridge, MA 02138 (617) 866-0446 [[alternative HTML version deleted]]
0
Entering edit mode
Glad that it works :) btw, if you use seqlevels to control how many chromosome shown in the circular view, make sure every track data, have exactly the same seqlevels and seqlengths, order matters too, I don't have any validation now for different tracks ... I will add one later. cheers On Sat, May 3, 2014 at 9:15 PM, Miss Agnieszka Aleksandra Golicz < agnieszka.golicz@uq.net.au> wrote: > It worked. > > Thanks very much. > > I've never used GenomicRanges before trying ggbio this week and did not > realize one can add seqlevels like that. > > > Thanks again, your answer probably saved me lots of time! > > > Best wishes, > > Agnieszka > > > ------------------------------ > *From:* Tengfei Yin <tengfei.yin@sbgenomics.com> > *Sent:* 04 May 2014 11:03 > *To:* Miss Agnieszka Aleksandra Golicz > *Cc:* Hervé Pagès; vobencha@fhcrc.org; guest@bioconductor.org; > bioconductor@r-project.org; GenomicRanges Maintainer > *Subject:* Re: [BioC] [devteam-bioc] GenomicRanges seqlengths problem > > Hi Agnieszka, > > it's not really a ggbio related question, but would you mind try > > seqlevels(snps) <- names(seqlens) # then > seqlengths(snps) <- seqlens > > to see if that solved your problem. > > cheers > > Tengfei > > > > On Sat, May 3, 2014 at 6:32 PM, Miss Agnieszka Aleksandra Golicz < > agnieszka.golicz@uq.net.au> wrote: > >> Should I post it as a separate ggbio related questio? >> >> Best wishes, >> Agnieszka >> ________________________________________ >> From: Miss Agnieszka Aleksandra Golicz >> Sent: 02 May 2014 10:28 >> To: Hervé Pagès; vobencha@fhcrc.org; guest@bioconductor.org; >> bioconductor@r-project.org >> Cc: GenomicRanges Maintainer >> Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem >> >> Sorry did not include my chromosome and snp files. >> >> It looks like this: >> chrs.txt - summary of chromosomes >> chr,start,end,len >> 0,1,4258568,4258568 >> 1,1,3378610,3378610 >> 2,1,2939989,2939989 >> 3,1,2348246,2348246 >> 4,1,1918205,1918205 >> 6,1,1888674,1888674 >> 5,1,1869450,1869450 >> 8,1,1809296,1809296 >> 9,1,1772623,1772623 >> 7,1,1769547,1769547 >> 10,1,1758670,1758670 >> 13,1,1634580,1634580 >> 12,1,1631710,1631710 >> 11,1,1590160,1590160 >> 15,1,1560629,1560629 >> 14,1,1533332,1533332 >> 17,1,1445693,1445693 >> 16,1,1397653,1397653 >> 18,1,1351976,1351976 >> 19,1,1186800,1186800 >> 20,1,1087932,1087932 >> 21,1,1020521,1020521 >> 22,1,731443,731443 >> 23,1,521426,521426 >> 24,1,475869,475869 >> 25,1,318058,318058 >> 26,1,261540,261540 >> 27,1,250629,250629 >> 28,1,236098,236098 >> 29,1,200940,200940 >> 30,1,154863,154863 >> 31,1,143268,143268 >> 32,1,87679,87679 >> 34,1,58596,58596 >> >> snps.txt - SNP file >> chr,id,start,end >> 6,egn4_Lema_G046740.1,538805,538805 >> 6,egn4_Lema_G049660.1,1607018,1607018 >> 10,egn4_Lema_G076370.1,1242542,1242542 >> 13,egn4_Lema_G081640.1,1352946,1352946 >> 12,egn4_Lema_G082370.1,109077,109077 >> 12,egn4_Lema_G085610.1,1190615,1190615 >> 12,egn4_Lema_G085620.1,1190933,1190933 >> 15,egn4_Lema_G090230.1,331669,331669 >> 16,egn4_Lema_G104510.1,667778,667778 >> 18,egn4_Lema_G110260.1,1321392,1321392 >> 19,egn4_Lema_G113060.1,958993,958993 >> 19,egn4_Lema_G113080.1,1177185,1177185 >> 21,egn4_Lema_G116710.1,263332,263332 >> >> If I only do that: >> dataChr <- read.table("chrs.txt",header=T,sep=",",stringsAsFactors=FALSE) >> dataChrS <- dataChr[order(dataChr$chr),] >> t <- read.table("snps.txt",header=T,sep=",") >> snps <- with(t, GRanges(chr, IRanges(start, end), id=id)) >> seqlens <- setNames(dataChrS$len, as.character(dataChrS$chr)) >> seqlengths(snps) <- seqlens >> I get the initial error - >> Error in .normargSeqlengths(value, seqnames(x)) : >> when the supplied 'seqlengths' vector is named, the names must match the >> seqnames >> Makes sense, there are no SNPs on some chromosomes >> >> If I do that: >> lChrs <- dataChrS[dataChrS$chr%in%t$chr,] >> seqlens <- setNames(lChrs$len, as.character(lChrs$chr)) >> seqlengths(snps) <- seqlens >> seqlegths(snps) >> 6 10 12 13 15 16 >> 1888674 1758670 1631710 1634580 1560629 1397653 >> >> I only have chromosomes with SNPs. >> >> Best wishes, >> Agnieszka >> >> ________________________________________ >> From: Miss Agnieszka Aleksandra Golicz >> Sent: 02 May 2014 10:01 >> To: Hervé Pagès; vobencha@fhcrc.org; guest@bioconductor.org; >> bioconductor@r-project.org >> Cc: GenomicRanges Maintainer >> Subject: RE: [devteam-bioc] GenomicRanges seqlengths problem >> >> Hello, >> >> Thank you very much for your help. >> >> I am facing one more problem with GRanges. >> I am trying to emulate this object. >> data("CRC", package = "biovizBase") >> object: mut.gr (it's SNP annotation for human chromosomes 1-22) >> >> When I just do show on mut.gr >> showmut.gr) >> GRanges with 60 ranges and 10 metadata columns: >> seqnames ranges strand | Hugo_Symbol >> Entrez_Gene_Id Center NCBI_Build Strand Variant_Classification >> Variant_Type Reference_Allele >> <rle> <iranges> <rle> | <factor> >> <integer> <factor> <integer> <factor> <factor> <factor> >> <factor> >> [1] 1 [ 11003085, 11003085] + | TARDBP >> 23435 Broad 36 + Missense SNP >> G >> [2] 1 [ 62352395, 62352395] + | INADL >> 10207 Broad 36 + Missense SNP >> T >> [3] 1 [194960885, 194960885] + | CFH >> 3075 Broad 36 + Missense SNP >> G >> [4] 2 [ 10116508, 10116508] - | CYS1 >> 192668 Broad 36 - Missense SNP >> C >> [5] 2 [ 33617747, 33617747] + | RASGRP3 >> 25780 Broad 36 + Missense SNP >> C >> [6] 2 [ 73894280, 73894280] + | C2orf78 >> 388960 Broad 36 + Missense SNP >> T >> [7] 2 [ 96732769, 96732769] + | FER1L5 >> 90342 Broad 36 + Missense SNP >> T >> [8] 2 [179160267, 179160267] - | TTN >> 7273 Broad 36 - Missense SNP >> C >> [9] 2 [217251189, 217251189] - | IGFBP5 >> 3488 Broad 36 - Missense SNP >> C >> [10] 3 [ 12620699, 12620699] - | RAF1 >> 5894 Broad 36 - Missense SNP >> G >> [11] 3 [ 46472880, 46472880] - | LTF >> 4057 Broad 36 - Missense SNP >> C >> [12] 3 [137203130, 137203130] + | PPP2R3A >> 5523 Broad 36 + Missense SNP >> A >> [13] 3 [137457429, 137457429] + | PCCB >> 5096 Broad 36 + Missense SNP >> C >> [14] 3 [184708629, 184708629] - | KLHL6 >> 89857 Broad 36 - Missense SNP >> G >> [15] 4 [147434591, 147434591] - | SLC10A7 >> 84068 Broad 36 - Missense SNP >> G >> [16] 4 [185915412, 185915412] - | ACSL1 >> 2180 Broad 36 - Missense SNP >> A >> [17] 5 [ 79070342, 79070342] + | CMYA5 >> 202333 Broad 36 + Missense SNP >> C >> [18] 5 [ 94775579, 94775579] + | FAM81B >> 153643 Broad 36 + Missense SNP >> G >> [19] 5 [140838266, 140838266] + | PCDHGC3 >> 5098 Broad 36 + Missense SNP >> T >> [20] 6 [109992724, 109992724] - | AKD1 >> 221264 Broad 36 - Missense SNP >> C >> [21] 6 [118993492, 118993492] - | C6orf204 >> 387119 Broad 36 - Missense SNP >> G >> [22] 7 [124286401, 124286401] - | POT1 >> 25913 Broad 36 - Missense SNP >> A >> [23] 7 [125960456, 125960456] - | GRM8 >> 2918 Broad 36 - Missense SNP >> A >> [24] 9 [ 35097658, 35097658] - | KIAA1539 >> 80256 Broad 36 - Missense SNP >> A >> [25] 9 [103164773, 103164773] - | BAAT >> 570 Broad 36 - Missense SNP >> G >> [26] 9 [132745783, 132745783] + | ABL1 >> 25 Broad 36 + Missense SNP >> C >> [27] 9 [138481305, 138481305] - | SEC16A >> 9919 Broad 36 - Missense SNP >> C >> [28] 10 [ 7661761, 7661761] - | ITIH5 >> 80760 Broad 36 - Missense SNP >> C >> [29] 10 [106064683, 106064683] - | ITPRIP >> 85450 Broad 36 - Missense SNP >> T >> [30] 11 [ 603430, 603430] - | IRF7 >> 3665 Broad 36 - Missense SNP >> G >> [31] 11 [ 5610148, 5610148] + | TRIM34 >> 53840 Broad 36 + Missense SNP >> A >> [32] 11 [ 9420281, 9420281] + | IPO7 >> 10527 Broad 36 + Missense SNP >> G >> [33] 11 [ 26495005, 26495005] + | ANO3 >> 63982 Broad 36 + Missense SNP >> C >> [34] 11 [ 55629818, 55629818] + | OR8H2 >> 390151 Broad 36 + Missense SNP >> G >> [35] 11 [116737834, 116737834] + | CEP164 >> 22897 Broad 36 + Missense SNP >> C >> [36] 12 [ 25165980, 25165980] - | CASC1 >> 55259 Broad 36 - Missense SNP >> G >> [37] 12 [ 25289548, 25289548] - | KRAS >> 3845 Broad 36 - Missense SNP >> C >> [38] 12 [ 31142142, 31142142] + | DDX11 >> 1663 Broad 36 + Missense SNP >> G >> [39] 12 [ 42434771, 42434771] - | PUS7L >> 83448 Broad 36 - Missense SNP >> T >> [40] 12 [ 55006974, 55006974] - | PAN2 >> 9924 Broad 36 - Missense SNP >> T >> [41] 13 [102199348, 102199348] - | LOC643677 >> 643677 Broad 36 - Missense SNP >> G >> [42] 15 [ 40503502, 40503502] - | ZFP106 >> 64397 Broad 36 - Missense SNP >> C >> [43] 15 [ 70816269, 70816269] + | BBS4 >> 585 Broad 36 + Missense SNP >> A >> [44] 16 [ 23481033, 23481033] + | UBFD1 >> 56061 Broad 36 + Missense SNP >> C >> [45] 17 [ 21149010, 21149010] + | MAP2K3 >> 5606 Broad 36 + Missense SNP >> G >> [46] 17 [ 35812691, 35812691] - | TOP2A >> 7153 Broad 36 - Missense SNP >> T >> [47] 18 [ 16788810, 16788810] - | ROCK1 >> 6093 Broad 36 - Missense SNP >> C >> [48] 18 [ 75322338, 75322338] + | NFATC1 >> 4772 Broad 36 + Missense SNP >> G >> [49] 19 [ 15713495, 15713495] + | OR10H3 >> 26532 Broad 36 + Missense SNP >> G >> [50] 19 [ 40730140, 40730140] + | TMEM147 >> 10430 Broad 36 + Missense SNP >> C >> [51] 19 [ 52338664, 52338664] + | SAE1 >> 10055 Broad 36 + Missense SNP >> G >> [52] 19 [ 57407795, 57407795] + | PPP2R1A >> 5518 Broad 36 + Missense SNP >> G >> [53] 20 [ 23298287, 23298287] + | GZF1 >> 64412 Broad 36 + Missense SNP >> A >> [54] 20 [ 31012946, 31012946] + | EFCAB8 >> 388795 Broad 36 + Missense SNP >> C >> [55] 20 [ 40223536, 40223536] - | PTPRT >> 11122 Broad 36 - Missense SNP >> C >> [56] 20 [ 54467136, 54467136] + | CASS4 >> 57091 Broad 36 + Missense SNP >> G >> [57] 20 [ 60201983, 60201983] + | GTPBP5 >> 26164 Broad 36 + Missense SNP >> C >> [58] 21 [ 36688774, 36688774] + | CHAF1B >> 8208 Broad 36 + Missense SNP >> C >> [59] 21 [ 39699770, 39699770] - | LCA5L >> 150082 Broad 36 - Missense SNP >> T >> [60] 22 [ 27437953, 27437953] - | CHEK2 >> 11200 Broad 36 - Missense SNP >> C >> >> In the seqnames column chromosomes 8 and 14 are missing, which makes >> sense - there is no SNPs on those. >> >> However, when do: >> head(seqlengthsmut.gr), 25) >> >> 1 2 3 4 5 6 7 >> 8 9 10 11 12 13 14 15 >> 249250621 243199373 198022430 191154276 180915260 171115067 159138663 >> 146364022 141213431 135534747 135006516 133851895 115169878 107349540 >> 102531392 >> 16 17 18 19 20 21 22 >> 90354753 81195210 78077248 59128983 63025520 48129895 51304566 >> >> All 22 chromosomes are there. >> >> How is that possible? >> >> I believe that for my data, I need to do a similar thing (I'm trying to >> create a multilayer circular plot in ggbio). Although I have snps for only >> few chromosomes, I think I need to include lengths for all of them. >> How do I do that? >> >> Best wishes, >> Agnieszka >> ________________________________________ >> From: Hervé Pagès <hpages@fhcrc.org> >> Sent: 02 May 2014 03:27 >> To: vobencha@fhcrc.org; guest@bioconductor.org; >> bioconductor@r-project.org; Miss Agnieszka Aleksandra Golicz >> Cc: GenomicRanges Maintainer >> Subject: Re: [devteam-bioc] GenomicRanges seqlengths problem >> >> On 05/01/2014 10:21 AM, Maintainer wrote: >> > Hi Sonia, Val, >> > >> > On 05/01/2014 09:52 AM, Maintainer wrote: >> >> Hi, >> >> >> >> The seqnames are in a different orders in the 'chrs' and 'sl' objects. >> >> Looking at the first 3 names in each, >> >> >> >> > head(seqlengths(chrs), 3) >> >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_10_v2 >> >> NA NA NA >> >> >> >> > head(sl, 3) >> >> lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2 >> >> 4258568 3378610 2939989 >> >> >> >> When a GRanges is created, seqnames are sorted in ascii order. >> > >> > Just to clarify, the seqlevels are sorted, not the seqnames: >> > >> > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) >> > > gr >> > GRanges with 3 ranges and 0 metadata columns: >> > seqnames ranges strand >> > <rle> <iranges> <rle> >> > [1] b [1, 10] * >> > [2] a [2, 10] * >> > [3] b [3, 10] * >> > --- >> > seqlengths: >> > a b >> > NA NA >> > >> > Not sorted (i.e. user-supplied order is preserved): >> > >> > > seqnames(gr) >> > factor-Rle of length 3 with 3 runs >> > Lengths: 1 1 1 >> > Values : b a b >> > Levels(2): a b >> > >> > Sorted: >> > >> > > seqlevels(gr) >> > [1] "a" "b" >> > >> > This sorting of the seqlevels is not good anyway (people with different >> > LOCALE will get different results). I just fixed this so now the >> > GRanges() constructor will preserve the seqlevels in the order supplied >> > by the user: >> > >> > > gr <- GRanges(c("b", "a", "b"), IRanges(1:3, 10)) >> > > seqlevels(gr) >> > [1] "b" "a" >> > >> > More precisely, the seqlevels are obtained by doing unique() on the >> > seqnames. >> > >> > The fix will propagate to the public repos and become available thru >> > biocLite() in the next 24 hours. >> > >> > Then your original code should just work Sonia. >> >> Or maybe not :) You might still need to use 'stringsAsFactors=FALSE' >> when calling read.table(). Please let us know if that still doesn't >> make it. >> >> Thanks, >> H. >> >> > >> > Cheers, >> > H. >> > >> > >> >> You can >> >> see the full list with seqinfo(): >> >> >> >> > seqinfo(chrs) >> >> Seqinfo of length 34 >> >> seqnames seqlengths isCircular genome >> >> lm_SuperContig_0_v2 <na> <na> <na> >> >> lm_SuperContig_1_v2 <na> <na> <na> >> >> lm_SuperContig_10_v2 <na> <na> <na> >> >> lm_SuperContig_11_v2 <na> <na> <na> >> >> lm_SuperContig_12_v2 <na> <na> <na> >> >> ... ... ... ... >> >> >> >> The seqnames in the replacement vector must match the order in the >> >> 'Seqinfo' object. >> >> >> >> To re-order: >> >> >> >> sl_new <- sl[match(levels(factor(names(sl))), names(sl))] >> >> >> >> Then add the new lengths: >> >> >> >>>> seqlengths(chrs) <- sl_new >> >>>> chrs >> >>> GRanges with 34 ranges and 0 metadata columns: >> >>> seqnames ranges strand >> >>> <rle> <iranges> <rle> >> >>> [1] lm_SuperContig_0_v2 [1, 4258568] * >> >>> [2] lm_SuperContig_1_v2 [1, 3378610] * >> >>> [3] lm_SuperContig_2_v2 [1, 2939989] * >> >>> [4] lm_SuperContig_3_v2 [1, 2348246] * >> >>> [5] lm_SuperContig_4_v2 [1, 1918205] * >> >>> ... ... ... ... >> >>> [30] lm_SuperContig_29_v2 [1, 200940] * >> >>> [31] lm_SuperContig_30_v2 [1, 154863] * >> >>> [32] lm_SuperContig_31_v2 [1, 143268] * >> >>> [33] lm_SuperContig_32_v2 [1, 87679] * >> >>> [34] lm_SuperContig_34_v2 [1, 58596] * >> >>> --- >> >>> seqlengths: >> >>> lm_SuperContig_0_v2 lm_SuperContig_1_v2 ... >> lm_SuperContig_9_v2 >> >>> 4258568 3378610 ... >> 1772623 >> >> >> >> >> >> Valerie >> >> >> >> >> >> On 05/01/2014 07:20 AM, Maintainer wrote: >> >>> >> >>> Hello, >> >>> >> >>> I have a problem with supplying GRanges object with seqlengths. >> >>> I have a files. >> >>> chrs.txt - contains information about chromosomes >> >>> >> >>> chrs.txt >> >>> chr,start,end,len >> >>> lm_SuperContig_0_v2,1,4258568,4258568 >> >>> lm_SuperContig_1_v2,1,3378610,3378610 >> >>> lm_SuperContig_2_v2,1,2939989,2939989 >> >>> lm_SuperContig_3_v2,1,2348246,2348246 >> >>> lm_SuperContig_4_v2,1,1918205,1918205 >> >>> lm_SuperContig_6_v2,1,1888674,1888674 >> >>> lm_SuperContig_5_v2,1,1869450,1869450 >> >>> lm_SuperContig_8_v2,1,1809296,1809296 >> >>> lm_SuperContig_9_v2,1,1772623,1772623 >> >>> lm_SuperContig_7_v2,1,1769547,1769547 >> >>> lm_SuperContig_10_v2,1,1758670,1758670 >> >>> lm_SuperContig_13_v2,1,1634580,1634580 >> >>> lm_SuperContig_12_v2,1,1631710,1631710 >> >>> lm_SuperContig_11_v2,1,1590160,1590160 >> >>> lm_SuperContig_15_v2,1,1560629,1560629 >> >>> lm_SuperContig_14_v2,1,1533332,1533332 >> >>> lm_SuperContig_17_v2,1,1445693,1445693 >> >>> lm_SuperContig_16_v2,1,1397653,1397653 >> >>> lm_SuperContig_18_v2,1,1351976,1351976 >> >>> lm_SuperContig_19_v2,1,1186800,1186800 >> >>> lm_SuperContig_20_v2,1,1087932,1087932 >> >>> lm_SuperContig_21_v2,1,1020521,1020521 >> >>> lm_SuperContig_22_v2,1,731443,731443 >> >>> lm_SuperContig_23_v2,1,521426,521426 >> >>> lm_SuperContig_24_v2,1,475869,475869 >> >>> lm_SuperContig_25_v2,1,318058,318058 >> >>> lm_SuperContig_26_v2,1,261540,261540 >> >>> lm_SuperContig_27_v2,1,250629,250629 >> >>> lm_SuperContig_28_v2,1,236098,236098 >> >>> lm_SuperContig_29_v2,1,200940,200940 >> >>> lm_SuperContig_30_v2,1,154863,154863 >> >>> lm_SuperContig_31_v2,1,143268,143268 >> >>> lm_SuperContig_32_v2,1,87679,87679 >> >>> lm_SuperContig_34_v2,1,58596,58596 >> >>> >> >>> I try to do the following: >> >>> # creating GRanges object for chromosomes >> >>> dataChr <- read.table("chrs.txt",header=T,sep=",") >> >>> chrs <- with(dataChr, GRanges(chr, IRanges(start, end))) >> >>> sl <- setNames(dataChr$len, as.character(dataChr$chr)) >> >>> seqlengths(chrs) <- sl >> >>> >> >>> And I get the following error: >> >>> >> >>> Error in .normargSeqlengths(value, seqnames(x)) : >> >>> when the supplied 'seqlengths' vector is named, the names must >> match the seqnames >> >>> >> >>> Any chance you could help me with what is going on? >> >>> >> >>> Best wishes, >> >>> Agnieszka >> >>> >> >>> >> >>> -- output of sessionInfo(): >> >>> >> >>> R version 3.1.0 (2014-04-10) >> >>> Platform: i386-w64-mingw32/i386 (32-bit) >> >>> >> >>> locale: >> >>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United >> Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 >> >>> [4] LC_NUMERIC=C LC_TIME=English_United >> Kingdom.1252 >> >>> >> >>> attached base packages: >> >>> [1] parallel stats graphics grDevices utils datasets >> methods base >> >>> >> >>> other attached packages: >> >>> [1] XVector_0.4.0 ggbio_1.12.3 ggplot2_0.9.3.1 >> GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.22.4 >> BiocGenerics_0.10.0 >> >>> [8] BiocInstaller_1.14.2 >> >>> >> >>> loaded via a namespace (and not attached): >> >>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 >> Biobase_2.24.0 BiocParallel_0.6.0 >> biomaRt_2.20.0 >> >>> [7] Biostrings_2.32.0 biovizBase_1.12.1 >> bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 >> cluster_1.15.2 >> >>> [13] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 >> dichromat_2.0-0 digest_0.6.4 fail_1.2 >> >>> [19] foreach_1.4.2 Formula_1.1-1 >> GenomicAlignments_1.0.0 GenomicFeatures_1.16.0 grid_3.1.0 >> gridExtra_0.9.1 >> >>> [25] gtable_0.1.2 Hmisc_3.14-4 >> iterators_1.0.7 labeling_0.2 lattice_0.20-29 >> latticeExtra_0.6-26 >> >>> [31] MASS_7.3-31 munsell_0.4.2 plyr_1.8.1 >> proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1 >> >>> [37] RCurl_1.95-4.1 reshape2_1.4 >> Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0 >> scales_0.2.4 >> >>> [43] sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 >> stringr_0.6.2 survival_2.37-7 tools_3.1.0 >> >>> [49] VariantAnnotation_1.10.0 XML_3.98-1.1 zlibbioc_1.10.0 >> >>> >> >>> >> >>> -- >> >>> Sent via the guest posting facility at bioconductor.org. >> >>> >> >>> >> ___________________________________________________________________ _____ >> >>> devteam-bioc mailing list >> >>> To unsubscribe from this mailing list send a blank email to >> >>> devteam-bioc-leave@lists.fhcrc.org >> >>> You can also unsubscribe or change your personal options at >> >>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc >> >>> >> >> >> >> >> > >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages@fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > -- > Tengfei Yin, PhD > Seven Bridges Genomics > sbgenomics.com > 625 Mt. Auburn St. Suite #208 > Cambridge, MA 02138 > (617) 866-0446 > -- Tengfei Yin, PhD Seven Bridges Genomics sbgenomics.com 625 Mt. Auburn St. Suite #208 Cambridge, MA 02138 (617) 866-0446 [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 659 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6