Question

Problem converting a bed file to granges using togranges chippeakanno

0

Entering edit mode

Bakhtiyar • 0

@6f97db95

Last seen 3.8 years ago

Canada

I apologize if this is very easy question but here I go. When I try to convert a bed file to granges using togranges I am getting this error. My bed file looks like this, it is a bedfile from MACS2 in Galaxy:

chr1 858727 858893 split_on_avi_rep1_peak_1 54 . 5.63956 8.06901 5.42788 64


> library(ChIPpeakAnno)
> bed <- system.file("extdata", "overlap_flag_coordinates_with_r_duplicates_removed.bed", package="ChIPpeakAnno")
> gr1 <- toGRanges(bed, format="BED", header=FALSE)
Error in .new_IRanges_from_start_end(start, end) : 
  'start' or 'end' cannot contain NAs
In addition: Warning message:
In toGRanges(bed, format = "BED", header = FALSE) :
  NAs introduced by coercion
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rtracklayer_1.52.1   regioneR_1.24.0      ChIPpeakAnno_3.26.4  GenomicRanges_1.44.0 GenomeInfoDb_1.28.4 
[6] IRanges_2.26.0       S4Vectors_0.30.2     BiocGenerics_0.38.0  BiocManager_1.30.16 

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.24.0         bitops_1.0-7                matrixStats_0.61.0          bit64_4.0.5                
 [5] filelock_1.0.2              progress_1.2.2              httr_1.4.2                  InteractionSet_1.20.0      
 [9] tools_4.1.1                 utf8_1.2.2                  R6_2.5.1                    colorspace_2.0-2           
[13] DBI_1.1.2                   lazyeval_0.2.2              tidyselect_1.1.1            prettyunits_1.1.1          
[17] bit_4.0.4                   curl_4.3.2                  compiler_4.1.1              VennDiagram_1.7.1          
[21] graph_1.70.0                cli_3.1.1                   Biobase_2.52.0              formatR_1.11               
[25] xml2_1.3.3                  DelayedArray_0.18.0         scales_1.1.1                RBGL_1.68.0                
[29] rappdirs_0.3.3              stringr_1.4.0               digest_0.6.29               Rsamtools_2.8.0            
[33] XVector_0.32.0              pkgconfig_2.0.3             MatrixGenerics_1.4.3        BSgenome_1.60.0            
[37] dbplyr_2.1.1                fastmap_1.1.0               ensembldb_2.16.4            rlang_1.0.0                
[41] rstudioapi_0.13             RSQLite_2.2.9               BiocIO_1.2.0                generics_0.1.2             
[45] BiocParallel_1.26.2         dplyr_1.0.7                 RCurl_1.98-1.5              magrittr_2.0.2             
[49] GenomeInfoDbData_1.2.6      futile.logger_1.4.3         Matrix_1.3-4                munsell_0.5.0              
[53] Rcpp_1.0.8                  fansi_1.0.2                 lifecycle_1.0.1             stringi_1.7.6              
[57] yaml_2.2.2                  MASS_7.3-54                 SummarizedExperiment_1.22.0 zlibbioc_1.38.0            
[61] BiocFileCache_2.0.0         grid_4.1.1                  blob_1.2.2                  crayon_1.4.2               
[65] lattice_0.20-44             splines_4.1.1               Biostrings_2.60.2           multtest_2.48.0            
[69] GenomicFeatures_1.44.2      hms_1.1.1                   KEGGREST_1.32.0             pillar_1.7.0               
[73] rjson_0.2.21                biomaRt_2.48.3              futile.options_1.0.1        XML_3.99-0.8               
[77] glue_1.6.1                  lambda.r_1.2.4              png_0.1-7                   vctrs_0.3.8                
[81] gtable_0.3.0                purrr_0.3.4                 assertthat_0.2.1            cachem_1.0.6               
[85] ggplot2_3.3.5               restfulr_0.0.13             AnnotationFilter_1.16.0     survival_3.2-11            
[89] tibble_3.1.6                GenomicAlignments_1.28.0    AnnotationDbi_1.54.1        memoise_2.0.1

ChIPpeakAnno • 3.8k views

ADD COMMENT • link updated 3.9 years ago by Kai Hu ▴ 70 • written 3.9 years ago by Bakhtiyar • 0

0

Entering edit mode

Kai Hu ▴ 70

@kai

Last seen 19 months ago

Worcester

According to the error message, some records in your bed file contains "NA" in either "start (column2)" or "end (column3)". You can choose to filter your bed file first with grep command or like. It will help if you can post your bed file.

ADD COMMENT • link 3.9 years ago Kai Hu ▴ 70

0

Entering edit mode

Hello,

Thanks for your response. Here is a link that should take you to my bed file. Let me know if it is accessible from your end.

http://18.217.237.111/overlap_flag_coordinates_with_r_duplicates_removed.bed

ADD REPLY • link 3.9 years ago Bakhtiyar • 0

score 1 · Accepted Answer · 2022-02-04

1

Entering edit mode

Bakhtiyar • 0

@6f97db95

Last seen 3.8 years ago

Canada

When I want to look at the content of the bed file using cat() this is the output:


> cat(bed)
C:/Users/brodr/OneDrive - York University/Bakhtiyar/Project outlines, ideas, creative efforts/H2A.Z Chip-seq files data managment/chip_project_R/chippeakanno/overlap_flag_coordinates_with_r_duplicates_removed.bed

Could this be the problem of how I am trying to read the bed file?

ADD COMMENT • link 3.9 years ago Bakhtiyar • 0

0

Entering edit mode

I think I solved the issue by using a different way to read the bed file then torganges has no problem.


> bed <- as.data.frame(read.table("overlap_flag_coordinates_with_r_duplicates_removed.bed",header = FALSE, sep="\t",stringsAsFactors=FALSE, quote=""))
> cat(bed)
Error in cat(bed) : argument 1 (type 'list') cannot be handled by 'cat'
> print(bed)
      V1      V2      V3                           V4  V5 V6      V7       V8       V9  V10
1   chr1  903734  904216   split_on_flag_rep2_peak_15  42  . 4.59529  6.68330  4.24517   64
2   chr1  912653  912938   split_on_flag_rep2_peak_22  27  . 3.29425  4.79185  2.77313  189
3   chr1  913579  913927   split_on_flag_rep2_peak_23  38  . 4.80347  6.12447  3.80320  244
4   chr1  916913  917442   split_on_flag_rep2_peak_26  50  . 4.35974  7.66459  5.03850   93
5   chr1  917590  918117   split_on_flag_rep2_peak_27  83  . 5.57745 11.79668  8.33518  433
6   chr1  922128  922420   split_on_flag_rep2_peak_32  38  . 4.53120  6.12447  3.80320  204
7   chr1  938767  939240   split_on_flag_rep2_peak_44  58  . 5.63120  8.68049  5.85552  398
8   chr1  940569  941178   split_on_flag_rep2_peak_47  61  . 5.89254  9.09404  6.17741  226
9   chr1  967494  968383   split_on_flag_rep2_peak_51  27  . 3.29425  4.79185  2.77313  785
10  chr1  983793  984073   split_on_flag_rep2_peak_56  58  . 5.63120  8.68049  5.85552   55
11  chr1 1064104 1064725   split_on_flag_rep2_peak_72  42  . 3.73134  6.60524  4.21172   54
12  chr1 1068405 1068758   split_on_flag_rep2_peak_74  49  . 5.28030  7.55213  4.94117   74
13  chr1 1069660 1070580   split_on_flag_rep2_peak_76 102  . 7.23364 14.21410 10.25889  367
14  chr1 1117487 1117897   split_on_flag_rep2_peak_86  61  . 6.16481  9.09404  6.17741  236
15  chr1 1121720 1122220   split_on_flag_rep2_peak_90 123  . 9.01154 16.81533 12.32594  397
16  chr1 1130274 1130554  split_on_flag_rep2_peak_100 112  . 8.64179 15.45685 11.23009  178
17  chr1 1136461 1136703  split_on_flag_rep2_peak_108  35  . 4.59043  5.81492  3.59353  151
18  chr1 1140759 1140964  split_on_flag_rep2_peak_110  45  . 5.07574  7.07820  4.53636   82
19  chr1 1148471 1148827  split_on_flag_rep2_peak_118  69  . 5.12081 10.08017  6.98827   59
20  chr1 1156950 1157148  split_on_flag_rep2_peak_126  53  . 5.62027  8.06901  5.36720   90

 [ reached 'max' / getOption("max.print") -- omitted 19800 rows ]
> gr1 <- toGRanges(bed, format="BED", header=FALSE)
> gr1
GRanges object with 19900 ranges and 9 metadata columns:
        seqnames              ranges strand |                     V4        V5          V6        V7        V8        V9
           <Rle>           <IRanges>  <Rle> |            <character> <integer> <character> <numeric> <numeric> <numeric>
      1     chr1       903734-904216      * | split_on_flag_rep2_p..        42           .   4.59529   6.68330   4.24517
      2     chr1       912653-912938      * | split_on_flag_rep2_p..        27           .   3.29425   4.79185   2.77313
      3     chr1       913579-913927      * | split_on_flag_rep2_p..        38           .   4.80347   6.12447   3.80320
      4     chr1       916913-917442      * | split_on_flag_rep2_p..        50           .   4.35974   7.66459   5.03850
      5     chr1       917590-918117      * | split_on_flag_rep2_p..        83           .   5.57745  11.79668   8.33518
    ...      ...                 ...    ... .                    ...       ...         ...       ...       ...       ...
  19896     chrX 154603229-154603822      * | split_on_flag_rep2_p..        39           .   4.19116   6.34820   3.98126
  19897     chrX 154658962-154659266      * | split_on_flag_rep2_p..        66           .   5.95576   9.62623   6.60090
  19898     chrX 154670979-154671373      * | split_on_flag_rep2_p..        62           .   5.47176   9.20763   6.27269
  19899     chrX 154750498-154750801      * | split_on_flag_rep2_p..       136           .   8.45135  18.49883  13.60219
  19900     chrX 154939097-154939231      * | split_on_flag_rep2_p..        29           .   3.98644   5.08049   2.98889
              V10      format    header
        <integer> <character> <logical>
      1        64         BED     FALSE
      2       189         BED     FALSE
      3       244         BED     FALSE
      4        93         BED     FALSE
      5       433         BED     FALSE
    ...       ...         ...       ...
  19896       488         BED     FALSE
  19897       232         BED     FALSE
  19898       131         BED     FALSE
  19899       118         BED     FALSE
  19900        38         BED     FALSE
  -------
  seqinfo: 46 sequences from an unspecified genome; no seqlengths

Please let me know if this looks fine.

Thanks!

ADD REPLY • link 3.9 years ago Bakhtiyar • 0

1

Entering edit mode

According to the documentation, toGRanges() is able to read in filename directly, so you should NOT need to manually convert bed into df. I informed the function developer on GitHub (https://github.com/jianhong/ChIPpeakAnno/issues/17#issue-1124441134).

BTW, the following is misleading, it basically tries to read the bed file from the extdata folder that comes with ChIPpeakAnno package and you are not supposed to put your custom data into that folder.

bed <- system.file("extdata", "overlap_flag_coordinates_with_r_duplicates_removed.bed", package="ChIPpeakAnno")

ADD REPLY • link 3.9 years ago Kai Hu ▴ 70

0

Entering edit mode

Please add one more line before gr1 <- toGRanges(bed, format="BED", header=FALSE)

bed[, 2] <- bed[, 2] +1

ADD REPLY • link 3.9 years ago Ou, Jianhong ★ 1.3k