Another problem when converting BED to GRanges using toGRanges
Bakhtiyar • 0

I have run these exact commands on the same BED file before without any problem, but now, when I simply copy the same commands, a different error appears. I am not sure what the error message means. Any ideas? Thanks.


> bed <- as.data.frame(read.table("overlap_flag_coordinates_with_r_duplicates_removed.bed",header = FALSE, sep="\t",stringsAsFactors=FALSE, quote=""))
> print(bed)
      V1      V2      V3                           V4  V5 V6      V7       V8       V9  V10
1   chr1  903734  904216   split_on_flag_rep2_peak_15  42  . 4.59529  6.68330  4.24517   64
2   chr1  912653  912938   split_on_flag_rep2_peak_22  27  . 3.29425  4.79185  2.77313  189
3   chr1  913579  913927   split_on_flag_rep2_peak_23  38  . 4.80347  6.12447  3.80320  244
4   chr1  916913  917442   split_on_flag_rep2_peak_26  50  . 4.35974  7.66459  5.03850   93
5   chr1  917590  918117   split_on_flag_rep2_peak_27  83  . 5.57745 11.79668  8.33518  433
6   chr1  922128  922420   split_on_flag_rep2_peak_32  38  . 4.53120  6.12447  3.80320  204
7   chr1  938767  939240   split_on_flag_rep2_peak_44  58  . 5.63120  8.68049  5.85552  398
8   chr1  940569  941178   split_on_flag_rep2_peak_47  61  . 5.89254  9.09404  6.17741  226
9   chr1  967494  968383   split_on_flag_rep2_peak_51  27  . 3.29425  4.79185  2.77313  785
10  chr1  983793  984073   split_on_flag_rep2_peak_56  58  . 5.63120  8.68049  5.85552   55
11  chr1 1064104 1064725   split_on_flag_rep2_peak_72  42  . 3.73134  6.60524  4.21172   54
12  chr1 1068405 1068758   split_on_flag_rep2_peak_74  49  . 5.28030  7.55213  4.94117   74
13  chr1 1069660 1070580   split_on_flag_rep2_peak_76 102  . 7.23364 14.21410 10.25889  367
14  chr1 1117487 1117897   split_on_flag_rep2_peak_86  61  . 6.16481  9.09404  6.17741  236
15  chr1 1121720 1122220   split_on_flag_rep2_peak_90 123  . 9.01154 16.81533 12.32594  397
16  chr1 1130274 1130554  split_on_flag_rep2_peak_100 112  . 8.64179 15.45685 11.23009  178
17  chr1 1136461 1136703  split_on_flag_rep2_peak_108  35  . 4.59043  5.81492  3.59353  151
18  chr1 1140759 1140964  split_on_flag_rep2_peak_110  45  . 5.07574  7.07820  4.53636   82
19  chr1 1148471 1148827  split_on_flag_rep2_peak_118  69  . 5.12081 10.08017  6.98827   59
20  chr1 1156950 1157148  split_on_flag_rep2_peak_126  53  . 5.62027  8.06901  5.36720   90
21  chr1 1210192 1210606  split_on_flag_rep2_peak_167  62  . 5.69381  9.20763  6.27269  282
22  chr1 1291617 1291923  split_on_flag_rep2_peak_176  53  . 5.62027  8.06901  5.36720  235
23  chr1 1304443 1304781  split_on_flag_rep2_peak_177 144  . 9.97656 19.53761 14.42307  226
24  chr1 1348122 1348313  split_on_flag_rep2_peak_187  37  . 4.58563  5.96864  3.70799  151
25  chr1 1430370 1430949  split_on_flag_rep2_peak_207  94  . 7.62726 13.21828  9.47143  470
26  chr1 1441128 1442007  split_on_flag_rep2_peak_217  56  . 5.20414  8.41215  5.63397  265
27  chr1 1444032 1444313  split_on_flag_rep2_peak_221  38  . 4.53120  6.12447  3.80320   93
28  chr1 1455282 1455981  split_on_flag_rep2_peak_228  42  . 4.59529  6.68330  4.24517  606
29  chr1 1459234 1459499  split_on_flag_rep2_peak_231  30  . 4.25893  5.21123  3.08267   48
30  chr1 1578655 1578898  split_on_flag_rep2_peak_246  52  . 5.52933  7.93085  5.26537  133
31  chr1 1600442 1601124  split_on_flag_rep2_peak_263 110  . 8.49332 15.22205 11.06264  281
32  chr1 1907388 1907713  split_on_flag_rep2_peak_295  38  . 4.53120  6.12447  3.80320   30
33  chr1 1919784 1920559  split_on_flag_rep2_peak_302  78  . 6.01594 11.20838  7.84588   66
34  chr1 1937596 1937877  split_on_flag_rep2_peak_318  57  . 4.94853  8.52716  5.72150   49
35  chr1 1947476 1947772  split_on_flag_rep2_peak_328  44  . 4.05532  6.90630  4.42478  215
36  chr1 1951867 1952046  split_on_flag_rep2_peak_330  53  . 5.24104  8.06021  5.36720   88
37  chr1 1964284 1964419  split_on_flag_rep2_peak_342  31  . 4.27238  5.35210  3.19946   86
38  chr1 1965506 1966608  split_on_flag_rep2_peak_344  65  . 6.08193  9.59291  6.56973  982
39  chr1 1991183 1991354  split_on_flag_rep2_peak_362  17  . 2.83762  3.53412  1.78672   82
40  chr1 2015482 2015822  split_on_flag_rep2_peak_379  41  . 4.58358  6.48253  4.10451   63
41  chr1 2020501 2020839  split_on_flag_rep2_peak_387  42  . 4.58002  6.65913  4.24517  242
42  chr1 2045114 2045369  split_on_flag_rep2_peak_417  53  . 4.09665  7.99138  5.31981  125
43  chr1 2049192 2050077  split_on_flag_rep2_peak_421 147  . 9.99723 19.95232 14.76748  597
44  chr1 2209463 2210427  split_on_flag_rep2_peak_449  66  . 5.98636  9.67487  6.63819   74
45  chr1 2335696 2336115  split_on_flag_rep2_peak_482  35  . 4.59043  5.81492  3.59353  335
46  chr1 2342957 2343856  split_on_flag_rep2_peak_488  42  . 4.59529  6.68330  4.24517  825
47  chr1 2435344 2435598  split_on_flag_rep2_peak_517  32  . 4.34180  5.45328  3.28697  131
48  chr1 2437876 2438479  split_on_flag_rep2_peak_519  61  . 6.16481  9.09404  6.17741  528
49  chr1 2803485 2804203  split_on_flag_rep2_peak_537  59  . 5.21682  8.77902  5.93386  251
50  chr1 2810331 2810656  split_on_flag_rep2_peak_545  53  . 5.62027  8.06901  5.36720   80
51  chr1 2814511 2815043  split_on_flag_rep2_peak_551  93  . 7.53890 13.07700  9.37274   96
52  chr1 2815551 2815946  split_on_flag_rep2_peak_553  84  . 7.23378 11.89780  8.41582  296
53  chr1 2840559 2840783  split_on_flag_rep2_peak_575  37  . 4.69805  5.97127  3.70898  148
54  chr1 2860563 2861058  split_on_flag_rep2_peak_586 105  . 8.34295 14.65639 10.57929  382
55  chr1 2868563 2868845  split_on_flag_rep2_peak_596  70  . 6.03183 10.13394  7.03485  212
56  chr1 2887599 2888280  split_on_flag_rep2_peak_614  91  . 7.41012 12.87080  9.18740  255
57  chr1 2933899 2934034  split_on_flag_rep2_peak_658  39  . 4.58790  6.33154  3.97100   64
58  chr1 2942642 2943313  split_on_flag_rep2_peak_662  42  . 4.39234  6.68330  4.24517  498
59  chr1 2947108 2947275  split_on_flag_rep2_peak_666  42  . 4.59529  6.68330  4.24517  102
60  chr1 2950936 2951195  split_on_flag_rep2_peak_671  78  . 6.70934 11.23720  7.84588  100
61  chr1 2969892 2970328  split_on_flag_rep2_peak_681  45  . 5.07574  7.07820  4.53636  318
62  chr1 2980213 2980765  split_on_flag_rep2_peak_693  96  . 7.79842 13.49155  9.68812  129
63  chr1 2983442 2983609  split_on_flag_rep2_peak_696  45  . 5.07574  7.07820  4.53636   97
64  chr1 2988290 2988600  split_on_flag_rep2_peak_700  52  . 5.52933  7.93085  5.26537  192
65  chr1 3002861 3005047  split_on_flag_rep2_peak_706 104  . 7.78720 14.49067 10.49436 1955
66  chr1 3021472 3021628  split_on_flag_rep2_peak_716  65  . 5.08963  9.54463  6.52471   85
67  chr1 3026682 3026842  split_on_flag_rep2_peak_720  42  . 4.59529  6.68330  4.24517   50
68  chr1 3033561 3033699  split_on_flag_rep2_peak_728  47  . 4.80562  7.36005  4.78056   78
69  chr1 3349134 3349821  split_on_flag_rep2_peak_768  98  . 6.70868 13.63944  9.80151  580
70  chr1 3402496 3403030  split_on_flag_rep2_peak_782  83  . 5.57745 11.79668  8.33518  424
71  chr1 3423905 3424175  split_on_flag_rep2_peak_792  29  . 3.98644  5.08049  2.98889  203
72  chr1 3424303 3424785  split_on_flag_rep2_peak_793  42  . 5.11082  6.73404  4.28459  402
73  chr1 3442262 3442759  split_on_flag_rep2_peak_797  44  . 4.20753  6.90630  4.42478  215
74  chr1 3452088 3453048  split_on_flag_rep2_peak_803 110  . 8.23313 15.22205 11.06264  884
75  chr1 3459086 3459343  split_on_flag_rep2_peak_808  29  . 3.98644  5.08049  2.98889  171
76  chr1 3484119 3484433  split_on_flag_rep2_peak_824  54  . 5.32620  8.19604  5.47686  254
77  chr1 3535806 3536141  split_on_flag_rep2_peak_844  23  . 3.98667  4.34251  2.38194   65
78  chr1 3565488 3565973  split_on_flag_rep2_peak_862  42  . 4.59529  6.68330  4.24517   53
79  chr1 3652091 3652233  split_on_flag_rep2_peak_883  51  . 5.28599  7.75839  5.10916   61
80  chr1 3672004 3672154  split_on_flag_rep2_peak_891  42  . 4.59529  6.68330  4.24517   36
81  chr1 3702654 3703353  split_on_flag_rep2_peak_903  56  . 5.00119  8.41215  5.63397  393
82  chr1 3717594 3718024  split_on_flag_rep2_peak_908  45  . 5.34801  7.07820  4.53636  307
83  chr1 3746410 3746893  split_on_flag_rep2_peak_912  45  . 4.79342  6.99517  4.50186  382
84  chr1 3752410 3752776  split_on_flag_rep2_peak_914  53  . 4.77213  8.01736  5.33764   76
85  chr1 3755456 3755699  split_on_flag_rep2_peak_918  23  . 3.32663  4.34703  2.38354  197
86  chr1 3799892 3800224  split_on_flag_rep2_peak_929  84  . 7.01248 11.96289  8.47530  102
87  chr1 3906088 3906367  split_on_flag_rep2_peak_943  49  . 4.78230  7.50674  4.91519   87
88  chr1 3907607 3908194  split_on_flag_rep2_peak_945  77  . 6.82839 10.99833  7.73167  476
89  chr1 3910048 3910293  split_on_flag_rep2_peak_947  57  . 5.13427  8.52716  5.72150   79
90  chr1 3928694 3928878  split_on_flag_rep2_peak_956  74  . 6.45327 10.67549  7.46693  100
91  chr1 3939829 3940330  split_on_flag_rep2_peak_961  42  . 4.58002  6.65913  4.24517  184
92  chr1 3943899 3944176  split_on_flag_rep2_peak_965  94  . 6.80506 13.15166  9.43232   89
93  chr1 3957979 3958476  split_on_flag_rep2_peak_973  49  . 4.79824  7.53305  4.92334   99
94  chr1 3981391 3981675  split_on_flag_rep2_peak_984  45  . 4.94615  7.04170  4.53636  198
95  chr1 3982956 3983116  split_on_flag_rep2_peak_986  78  . 6.21889 11.20838  7.84588   82
96  chr1 4000019 4000219  split_on_flag_rep2_peak_994  47  . 3.85311  7.28745  4.72162  112
97  chr1 4002877 4003015  split_on_flag_rep2_peak_997  41  . 4.94105  6.48738  4.10793   57
98  chr1 4004015 4004383  split_on_flag_rep2_peak_999  77  . 6.69403 11.05501  7.78133   86
99  chr1 4004654 4005310 split_on_flag_rep2_peak_1000  38  . 3.90311  6.17367  3.83231   63
100 chr1 4010883 4011262 split_on_flag_rep2_peak_1004  49  . 4.79824  7.53305  4.92334  270
 [ reached 'max' / getOption("max.print") -- omitted 19800 rows ]
> gr1 <- toGRanges(bed, format="BED", header=FALSE)
Error in (function (data, colNames = NULL, format = "", ...)  : 
  colname must contain space/seqnames, start and end.

sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ChIPpeakAnno_3.26.4  GenomicRanges_1.44.0 GenomeInfoDb_1.28.4  IRanges_2.26.0      
[5] S4Vectors_0.30.2     BiocGenerics_0.38.0 

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.24.0         bitops_1.0-7                matrixStats_0.61.0         
 [4] bit64_4.0.5                 filelock_1.0.2              progress_1.2.2             
 [7] httr_1.4.2                  InteractionSet_1.20.0       tools_4.1.1                
[10] utf8_1.2.2                  R6_2.5.1                    colorspace_2.0-2           
[13] DBI_1.1.2                   lazyeval_0.2.2              tidyselect_1.1.1           
[16] prettyunits_1.1.1           bit_4.0.4                   curl_4.3.2                 
[19] compiler_4.1.1              VennDiagram_1.7.1           graph_1.70.0               
[22] cli_3.1.1                   Biobase_2.52.0              formatR_1.11               
[25] xml2_1.3.3                  DelayedArray_0.18.0         rtracklayer_1.52.1         
[28] scales_1.1.1                RBGL_1.68.0                 rappdirs_0.3.3             
[31] stringr_1.4.0               digest_0.6.29               Rsamtools_2.8.0            
[34] XVector_0.32.0              pkgconfig_2.0.3             MatrixGenerics_1.4.3       
[37] BSgenome_1.60.0             regioneR_1.24.0             dbplyr_2.1.1               
[40] fastmap_1.1.0               ensembldb_2.16.4            rlang_1.0.0                
[43] rstudioapi_0.13             RSQLite_2.2.9               BiocIO_1.2.0               
[46] generics_0.1.2              BiocParallel_1.26.2         dplyr_1.0.7                
[49] RCurl_1.98-1.5              magrittr_2.0.2              GenomeInfoDbData_1.2.6     
[52] futile.logger_1.4.3         Matrix_1.3-4                munsell_0.5.0              
[55] Rcpp_1.0.8                  fansi_1.0.2                 lifecycle_1.0.1            
[58] stringi_1.7.6               yaml_2.2.2                  MASS_7.3-54                
[61] SummarizedExperiment_1.22.0 zlibbioc_1.38.0             BiocFileCache_2.0.0        
[64] grid_4.1.1                  blob_1.2.2                  crayon_1.4.2               
[67] lattice_0.20-44             splines_4.1.1               Biostrings_2.60.2          
[70] multtest_2.48.0             GenomicFeatures_1.44.2      hms_1.1.1                  
[73] KEGGREST_1.32.0             pillar_1.7.0                rjson_0.2.21               
[76] biomaRt_2.48.3              futile.options_1.0.1        XML_3.99-0.8               
[79] glue_1.6.1                  lambda.r_1.2.4              png_0.1-7                  
[82] vctrs_0.3.8                 gtable_0.3.0                purrr_0.3.4                
[85] assertthat_0.2.1            cachem_1.0.6                ggplot2_3.3.5              
[88] restfulr_0.0.13             AnnotationFilter_1.16.0     survival_3.2-11            
[91] tibble_3.1.6                GenomicAlignments_1.28.0    AnnotationDbi_1.54.1       
[94] memoise_2.0.1               ellipsis_0.3.2
ChIPpeakAnno • 2.0k views
Kai Hu ▴ 70

I believe the ChIPpeakAnno developer will fix this issue soon. Basically, there is a glitch in toGRanges() when format = "BED".

A possible quick workaround is to set it to narrowPeak:

gr1 <- toGRanges("overlap_flag_coordinates_with_r_duplicates_removed.bed", format="narrowPeak", header=FALSE)

Again, you don't need to read your .bed file into a data frame first, since toGRanges() can read files directly.

Another suggestion is to post follow-up questions under your original post so that others can understand the context better: "Problem converting a bed file to granges using togranges chippeakanno"


Thanks for your help again. Setting the format to narrowPeak worked, but only when I read the BED file via toGRanges() directly, not when I pass the data frame. Your suggestion about follow-up questions is also noted.

> gr1 <- toGRanges(bed, format="narrowPeak", header=FALSE)
Error in (function (data, colNames = NULL, format = "", ...)  : 
  colname must contain space/seqnames, start and end.
> gr1 <- toGRanges("overlap_flag_coordinates_with_r_duplicates_removed.bed", format="narrowPeak", header=FALSE)
Warning message:
In formatStrand(strand) : All the characters for strand, 
            other than '1', '-1', '+', '-' and '*', 
            will be converted into '*'.
> print(gr1)
GRanges object with 19900 ranges and 5 metadata columns:
                                 seqnames              ranges strand |     score signalValue
                                    <Rle>           <IRanges>  <Rle> | <integer>   <numeric>
      split_on_flag_rep2_peak_15     chr1       903734-904216      * |        42     4.59529
      split_on_flag_rep2_peak_22     chr1       912653-912938      * |        27     3.29425
      split_on_flag_rep2_peak_23     chr1       913579-913927      * |        38     4.80347
      split_on_flag_rep2_peak_26     chr1       916913-917442      * |        50     4.35974
      split_on_flag_rep2_peak_27     chr1       917590-918117      * |        83     5.57745
                             ...      ...                 ...    ... .       ...         ...
  split_on_flag_rep2_peak_240995     chrX 154603229-154603822      * |        39     4.19116
  split_on_flag_rep2_peak_241002     chrX 154658962-154659266      * |        66     5.95576
  split_on_flag_rep2_peak_241005     chrX 154670979-154671373      * |        62     5.47176
  split_on_flag_rep2_peak_241011     chrX 154750498-154750801      * |       136     8.45135
  split_on_flag_rep2_peak_241023     chrX 154939097-154939231      * |        29     3.98644
                                    pValue    qValue      peak
                                 <numeric> <numeric> <integer>
      split_on_flag_rep2_peak_15   6.68330   4.24517        64
      split_on_flag_rep2_peak_22   4.79185   2.77313       189
      split_on_flag_rep2_peak_23   6.12447   3.80320       244
      split_on_flag_rep2_peak_26   7.66459   5.03850        93
      split_on_flag_rep2_peak_27  11.79668   8.33518       433
                             ...       ...       ...       ...
  split_on_flag_rep2_peak_240995   6.34820   3.98126       488
  split_on_flag_rep2_peak_241002   9.62623   6.60090       232
  split_on_flag_rep2_peak_241005   9.20763   6.27269       131
  split_on_flag_rep2_peak_241011  18.49883  13.60219       118
  split_on_flag_rep2_peak_241023   5.08049   2.98889        38
  -------
  seqinfo: 46 sequences from an unspecified genome; no seqlengths
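
Judging from the error message, the data-frame call seems to fail because the columns are named V1 to V10 rather than seqnames, start and end. Below is a minimal sketch of one way to get from the data frame to a GRanges without going through toGRanges(): rename the columns, then call GenomicRanges::makeGRangesFromDataFrame(). The two toy rows, the "name" column label and the decision to turn "." strands into "*" are assumptions for illustration; this is not a description of what toGRanges() does internally.

```r
# Toy data frame mimicking the first two rows of the file as read by read.table()
bed <- data.frame(
  V1  = c("chr1", "chr1"),
  V2  = c(903734L, 912653L),
  V3  = c(904216L, 912938L),
  V4  = c("split_on_flag_rep2_peak_15", "split_on_flag_rep2_peak_22"),
  V5  = c(42L, 27L),
  V6  = c(".", "."),
  V7  = c(4.59529, 3.29425),
  V8  = c(6.68330, 4.79185),
  V9  = c(4.24517, 2.77313),
  V10 = c(64L, 189L),
  stringsAsFactors = FALSE
)

# Assumed narrowPeak-style labels; the metadata names follow the GRanges
# printout in this thread, "name" is a label picked for this sketch
colnames(bed) <- c("seqnames", "start", "end", "name", "score", "strand",
                   "signalValue", "pValue", "qValue", "peak")

# "." is not a valid strand value for GRanges, so mark those rows as unstranded
bed$strand[!bed$strand %in% c("+", "-", "*")] <- "*"

# Build the GRanges only if GenomicRanges is installed
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  gr1 <- GenomicRanges::makeGRangesFromDataFrame(bed, keep.extra.columns = TRUE)
  print(gr1)
}
```

BED-style start coordinates are 0-based, so if 1-based starts are wanted, makeGRangesFromDataFrame() also accepts starts.in.df.are.0based = TRUE.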