Problem loading BED file with BED2RangedData


José Luis Lavín ▴ 280


Dear list,

I'm interested in using the Bioconductor package ChIPpeakAnno to analyze some ChIP-seq experiments. The package vignette describes a series of capabilities that fit exactly what I intend to do in my analysis. But there is a major problem: I cannot load the BED files I obtained from peak-calling programs such as MACS or SISSRS. I used BED2RangedData to load my files, but I keep getting the same error, e.g.

    > BED2RangedData("macs_peaks.BED",header=FALSE)
    Error in BED2RangedData("macs_peaks.BED", header = FALSE) :
      No valid data.BED passed in, which is a data frame as BED format file with
      at least 3 fields in the order of: chromosome, start and end. Optional
      fields are name, score and strand etc. Please refer to
      http://genome.ucsc.edu/FAQ/FAQformat#format1 for details.

I reviewed the UCSC format page and checked my BED files. To my surprise, each so-called BED file has a different format, but (at least to my understanding) both should comply with the field requirements of BED2RangedData. Here are the first lines of my BED files.

1) head macs_peaks.BED

    chr1   78570032   78571272   MACS_peak_1   64.10
    chr1   172993763  172996043  MACS_peak_2   79.98
    chr1   173009505  173012053  MACS_peak_3   113.45
    chr1   189432746  189433727  MACS_peak_4   55.42
    chr10  129172524  129172778  MACS_peak_5   70.83
    chr11  16562386   16562670   MACS_peak_6   53.89
    chr11  33101311   33101810   MACS_peak_7   53.01
    chr11  33654680   33655228   MACS_peak_8   51.42
    chr11  95817226   95817470   MACS_peak_9   53.89
    chr11  108872843  108873542  MACS_peak_10  69.06

2) head Chip_K27.BED

    chr5   42860311   42860361   HSCAN:310:C1B3LACXX:7:1101:1598:1930  255  +
    chr7   104626327  104626377  HSCAN:310:C1B3LACXX:7:1101:1180:1956  255  +
    chr7   121488942  121488992  HSCAN:310:C1B3LACXX:7:1101:1294:1940  255  +
    chr2   98507295   98507345   HSCAN:310:C1B3LACXX:7:1101:1044:1946  255  +
    chr18  18078895   18078945   HSCAN:310:C1B3LACXX:7:1101:1908:1990  255  -
    chr8   91855040   91855090   HSCAN:310:C1B3LACXX:7:1101:2369:1954  255  +
    chr11  40921952   40922002   HSCAN:310:C1B3LACXX:7:1101:1988:1997  255  +
    chr17  44823557   44823607   HSCAN:310:C1B3LACXX:7:1101:2980:1948  255  +
    chr2   162682448  162682498  HSCAN:310:C1B3LACXX:7:1101:2818:1989  255  +
    chr9   3003254    3003304    HSCAN:310:C1B3LACXX:7:1101:1557:1996  255  +

Could anyone help me with this issue? I really don't understand what's going on here.

Thanks in advance,
JL

    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-redhat-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid      stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] ChIPpeakAnno_2.6.0    limma_3.14.3
     [3] org.Hs.eg.db_2.8.0    GO.db_2.8.0
     [5] RSQLite_0.11.2        DBI_0.2-5
     [7] AnnotationDbi_1.20.3  BSgenome.Ecoli.NCBI.20080805_1.3.17
     [9] BSgenome_1.26.1       GenomicRanges_1.10.2
    [11] Biostrings_2.26.2     IRanges_1.16.2
    [13] multtest_2.14.0       Biobase_2.18.0
    [15] biomaRt_2.14.0        BiocGenerics_0.4.0
    [17] VennDiagram_1.5.1     BiocInstaller_1.8.3

    loaded via a namespace (and not attached):
    [1] MASS_7.3-22      parallel_2.15.1  RCurl_1.95-1.1  splines_2.15.1
    [5] stats4_2.15.1    survival_2.36-14 tools_2.15.1    XML_3.95-0.1

Dr. José Luis Lavín Trueba
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


Michael Lawrence ★ 11k


The rtracklayer package is another option for parsing BED files into RangedData objects. I don't know whether ChIPpeakAnno has specific expectations about the form of the RangedData.

Michael
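For readers who want to try the rtracklayer route, here is a minimal sketch. It assumes rtracklayer is installed, and it writes a temporary file holding three of the MACS rows from the question (tab-separated, which is an assumption about the original file). As far as I know, current rtracklayer releases return a GRanges object from import() rather than the RangedData mentioned above, so treat the class of the result as version-dependent.

```r
# Temporary stand-in for macs_peaks.BED: three rows copied from the question,
# written tab-separated (an assumption about the original file layout).
bed_lines <- c(
  "chr1\t78570032\t78571272\tMACS_peak_1\t64.10",
  "chr1\t172993763\t172996043\tMACS_peak_2\t79.98",
  "chr1\t173009505\t173012053\tMACS_peak_3\t113.45"
)
bed_path <- tempfile(fileext = ".bed")
writeLines(bed_lines, bed_path)

if (requireNamespace("rtracklayer", quietly = TRUE)) {
  # format = "BED" is passed explicitly so the file extension does not matter.
  peaks <- rtracklayer::import(bed_path, format = "BED")
  print(peaks)
} else {
  message("rtracklayer is not installed; skipping the import")
}
```

Note that BED start coordinates are 0-based, so the start values printed by rtracklayer will be one higher than the numbers in the file.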


Thank you very much for the insight, Michael.

Dr. José Luis Lavín Trueba


Ou, Jianhong ★ 1.3k


Hi Dr. Jose,

The data.BED parameter should be a data.frame object, not a file name. You could try:

    macs.peaks.bed <- read.delim("macs_peaks.BED", header=FALSE)
    BED2RangedData(macs.peaks.bed)

Yours sincerely,
Jianhong Ou
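The difference between a file name and a data frame can be checked in base R before calling BED2RangedData. The sketch below is only an illustration: looks_like_bed is a hypothetical helper written for this example (it is not part of ChIPpeakAnno and is not claimed to match the package's internal check), and the temporary file stands in for macs_peaks.BED, assumed to be tab-separated.

```r
# Temporary stand-in for macs_peaks.BED, using rows from the question.
bed_lines <- c(
  "chr1\t78570032\t78571272\tMACS_peak_1\t64.10",
  "chr1\t172993763\t172996043\tMACS_peak_2\t79.98",
  "chr10\t129172524\t129172778\tMACS_peak_5\t70.83"
)
bed_path <- tempfile(fileext = ".BED")
writeLines(bed_lines, bed_path)

# Hypothetical helper: roughly the condition the error message describes,
# i.e. a data frame with at least chromosome, start and end columns.
looks_like_bed <- function(x) {
  is.data.frame(x) && ncol(x) >= 3 &&
    is.numeric(x[[2]]) && is.numeric(x[[3]])
}

# A file name is just a character string, so the check fails.
looks_like_bed(bed_path)        # FALSE

# Reading the file first gives a data frame with columns V1..V5.
macs.peaks.bed <- read.delim(bed_path, header = FALSE,
                             stringsAsFactors = FALSE)
looks_like_bed(macs.peaks.bed)  # TRUE
str(macs.peaks.bed)

# With a ChIPpeakAnno version that still provides BED2RangedData loaded:
# peaks <- BED2RangedData(macs.peaks.bed, header = FALSE)
```

The same pattern applies to the second file, Chip_K27.BED, which simply has one more column (strand).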


Hello Jianhong Ou,

Thank you very much for your advice. I'm quite new to R and still make such basic mistakes...

With best wishes,
JL


Hi,

We are trying to add several lines of code to make this clear to users. Communication is always good and then we can improve the user experience. Thanks and good luck.

Yours sincerely,
Jianhong Ou


Ou, Jianhong ★ 1.3k


Hi Jose,

You can get the basic idea of what the space of a RangedData object is from ?RangedData. Usually it should be the chromosome name. To get the sequences, you can try:

    source("http://bioconductor.org/biocLite.R")
    biocLite("BSgenome.Mmusculus.UCSC.mm10")
    library(BSgenome.Mmusculus.UCSC.mm10)
    peaksWithSequences = getAllPeakSequence(peaks, upstream = 20, downstream = 20, genome = Mmusculus)
    write2FASTA(peaksWithSequences, "mouse_test.fa")

Yours sincerely,
Jianhong Ou

On Dec 3, 2012, at 11:42 AM, José Luis Lavín wrote:

> Hi,
>
> Your package is awesome, but I get stuck at steps 2.3 (sequences surrounding the peaks) and 2.4 (GO terms). You suddenly shift to the E. coli genome and I lose track of the peaks objects I was using up to that step. What exactly does this part of the code mean?
>
>     peaks = RangedData(IRanges(start=c(100, 500), end=c(300, 600), names=c("myPeakList", "myPeakList2")), space=c("NC_008253", "NC_010468"))
>
> Are those the names of the features whose surrounding sequences I want to get? Can I just replace those feature IDs with my Mus musculus ones in order to perform the operation? Would these changes be correct (actually they don't work)?
>
>     library(TSS.mouse.NCBIM37)
>     peaksWithSequences = getAllPeakSequence(peaks, upstream = 20, downstream = 20, genome = mouse)
>     write2FASTA(peaksWithSequences, "mouse_test.fa")
>
> Please forgive me if I'm asking very basic questions, but I couldn't manage to use the package any further from this point and I'd like to use it for some of my analysis.
>
> Best wishes and thanks again.
>
> JL
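Since the space is usually the chromosome name, one quick sanity check is whether the names in the peak list match the sequence names of the genome being queried. The base R sketch below illustrates the idea only. The genome_seqnames vector is typed in by hand as an assumption (the main UCSC mouse chromosomes); with the BSgenome package loaded one would use seqnames(Mmusculus) instead.

```r
# Chromosome names taken from the MACS peaks in the original question.
peak_space <- c("chr1", "chr1", "chr10", "chr11")

# Assumption for illustration: the main UCSC-style mouse chromosome names.
# With BSgenome.Mmusculus.UCSC.mm10 loaded, seqnames(Mmusculus) would give
# the real list (including extra scaffolds not shown here).
genome_seqnames <- c(paste0("chr", 1:19), "chrX", "chrY", "chrM")

# Names used by the peaks but missing from the genome; empty means all match.
missing_mouse <- setdiff(unique(peak_space), genome_seqnames)
print(missing_mouse)

# The vignette example uses E. coli accession names as the space, which do
# not exist in a mouse genome, so both come back as unmatched.
missing_ecoli <- setdiff(c("NC_008253", "NC_010468"), genome_seqnames)
print(missing_ecoli)
```

If the first check comes back non-empty for your own peaks, the space values need to be renamed to the genome's convention before asking for sequences.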


Thank you very much for your kind explanations, Jianhong Ou. ;)

Dr. José Luis Lavín Trueba
