rtracklayer export bedGraph cannot be loaded into UCSC
1
1
Entering edit mode
@nicolas-servant-1466
Last seen 22 months ago
France

Hi all,

I have a simple question regarding the export function of rtracklayer.

I have a GRanges object, and I would like to export a BedGraph file which can be loaded into the UCSC genome browser.

  myTrackLine <- new("GraphTrackLine", type="bedGraph", name="mysample", description="mysample")
  export(x, con = "out.bedgraph", trackLine=myTrackLine, format="bedGraph")

Now, looking at the bedGraph file ;

track name=mysample description="mysample"
chr1    3073252    3074322    .    0    +
chr1    3102015    3102125    .    0    +
chr1    3252756    3253236    .    0    +

However, as there is no "type=bedGraph", the file is loaded as a 'bed' file, and is not displayed as bar.

Did I miss something ? how can I add this values ?

Best

Nicolas

rtracklayer • 2.0k views
ADD COMMENT
0
Entering edit mode

Looks like it is outputting BED instead of bedGraph. But I can't reproduce it. Maybe paste your sessionInfo()?

ADD REPLY
0
Entering edit mode

sure. Thanks


R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] DT_0.4                     edgeR_3.20.6              
 [3] limma_3.34.9               ggrepel_0.7.0             
 [5] genefilter_1.60.0          RColorBrewer_1.1-2        
 [7] pheatmap_1.0.10            gridExtra_2.3             
 [9] DESeq2_1.18.1              SummarizedExperiment_1.8.1
[11] DelayedArray_0.4.1         matrixStats_0.53.1        
[13] Biobase_2.38.0             reshape2_1.4.3            
[15] ggplot2_2.2.1              rtracklayer_1.38.2        
[17] GenomicRanges_1.30.1       GenomeInfoDb_1.14.0       
[19] IRanges_2.12.0             S4Vectors_0.16.0          
[21] BiocGenerics_0.24.0      

I don't know if it can help, here ismy GRanges object to export. I double checked that it has a 'score' field for bedGraph export.

GRanges object with 3 ranges and 4 metadata columns:
      seqnames             ranges strand |              gene_id     gene_name
         <Rle>          <IRanges>  <Rle> |          <character>   <character>
  [1]     chr1 [3073253, 3074322]      + | ENSMUSG00000102693.1 4933401J01Rik
  [2]     chr1 [3102016, 3102125]      + | ENSMUSG00000064842.1       Gm26206
  [3]     chr1 [3252757, 3253236]      + | ENSMUSG00000102851.1       Gm18956
                 gene_type     score
               <character> <numeric>
  [1]                  TEC         0
  [2]                snRNA         0
  [3] processed_pseudogene         0
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths

 

 

ADD REPLY
0
Entering edit mode

Please update your R/Bioconductor and see if it helps.

ADD REPLY
0
Entering edit mode

I did, but it does not help.

N

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /bioinfo/local/build/Centos/R/R-3.5.0/lib64/R/lib/libRblas.so
LAPACK: /bioinfo/local/build/Centos/R/R-3.5.0/lib64/R/lib/libRlapack.so

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DT_0.4                      edgeR_3.22.3                limma_3.36.3                ggrepel_0.8.0              
 [5] genefilter_1.62.0           RColorBrewer_1.1-2          pheatmap_1.0.10             gridExtra_2.3              
 [9] DESeq2_1.20.0               SummarizedExperiment_1.10.1 DelayedArray_0.6.5          BiocParallel_1.14.2        
[13] matrixStats_0.54.0          Biobase_2.40.0              reshape2_1.4.3              ggplot2_3.0.0              
[17] rtracklayer_1.40.6          GenomicRanges_1.32.6        GenomeInfoDb_1.16.0         IRanges_2.14.11            
[21] S4Vectors_0.18.3            BiocGenerics_0.26.0        

 

ADD REPLY
0
Entering edit mode

Alright, please send a reproducible example, i.e., include the input data.

ADD REPLY
0
Entering edit mode

I've got it. It is linked to the Granges object.

 

x1<- GRanges(c("chr1","chr2"), IRanges(c(1,10), c(100,1000)), score=c(12,15))
x2 <- GRanges(c("chr1","chr1"), IRanges(c(1,10), c(15,20)), score=c(12,15))

myTrackLine <- new("GraphTrackLine", type="bedGraph", name="test", description="test")


export(x1, format="bedGraph", con="test.bedGraph", trackLine=myTrackLine) ## Works

export(x2, format="bedGraph", con="test.bedGraph", trackLine=myTrackLine) ## Failed

The difference is that the 'x2' has overlapping intervals ...

Thanks !

ADD REPLY
0
Entering edit mode

Do you think you can fix it ? or is there any reason that explain the results ?

Thanks

ADD REPLY
0
Entering edit mode

My overlapping intervals were caused by 0-to-1-based conversion.

For me, my ranges in R are 0-based exclusive, whereas the rtracklayer manual states they should be 1-based, and therefore assumes that it must subtract all chromStart values by 1. Therefore, rtracklayer::export.bedGraph creates several overlapping intervals in situations such as below:

My GRanges object in R:

> FDR_LE_L1_05[c("4848","4849")]
GRanges object with 2 ranges and 1 metadata column:
       seqnames          ranges strand |             score
          <Rle>       <IRanges>  <Rle> |         <numeric>
  4848    chrII 5081136-5081896      * | -11.6373555433484
  4849    chrII 5081896-5082599      * | -5.01917162511854
  -------
  seqinfo: 6 sequences from an unspecified genome; no seqlengths
> rtracklayer::export.bedGraph(FDR_LE_L1_05, "FDR_LE_L1_05.bedGraph")

Notice that the end point of the first range is the same as the start point of the following one: 5081896.

bedGraphToBigWig: After sorting, bedGraphToBigWig halts on this interval:

$ bedSort FDR_LE_L1_05.bedGraph FDR_LE_L1_05.bedGraph
$ bedGraphToBigWig FDR_LE_L1_05.bedGraph chrom.sizes FDR_LE_L1_05.bigWig
Overlapping regions in bedGraph line 3412 of FDR_LE_L1_05.bedGraph

The offending line of the file (+/- one line):

$ sed '3411,3413!d' FDR_LE_L1_05.bedGraph
chrII   5081135 5081896 -11.6373555433484
chrII   5081895 5082599 -5.01917162511854
chrII   5082685 5083978 -25.8326171039818

The second interval now starts before the end of the first one 5081895 < 5081896. After correcting the file manually, the subsequent cases continue to occur, and they appear to be the same problem.

The following operation fixed the problem for me:

> start(FDR_LE_L1_05) = start(FDR_LE_L1_05) + 1

followed by the rest of the operations (export, bedSort, bedGraphToBigWig). There was no error after adjusting the start points.

Idk if the OP's ranges were overlapping in both coordinate spaces, or just contiguous like mine.

ADD REPLY
0
Entering edit mode

Please note that the GRanges (and IRanges) convention is that ranges are 1-based, inclusive. rtracklayer just follows that convention.

ADD REPLY
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States

bedGraph (at least as strictly defined by what UCSC supports) does not support overlapping intervals. The bug is that rtracklayer fails  silently to produce bedGraph and instead produces BED.

For your work, you will need to aggregate the data so that they can be displayed in UCSC correctly.

ADD COMMENT

Login before adding your answer.

Traffic: 813 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6