Where do the filenames come from in the LOLA datasets?
@lluis-revilla-sancho
Last seen 5 days ago
European Union

I am using the dataset provided for LOLA to analyze my data (hg38). In the files generated when saving the results with writeCombinedEnrichment I see new columns:

dataSource            filename
ENCODE segmentation   Helas3_T.bed

What are these .bed files and where can I find them?

On the website there is a link to https://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/ but neither that file nor any similar file is present there. Even accounting for narrowPeak being a .bed format, there are just three files for Helas3, with no mention of T: wgEncodeAwgTfbsBroadHelas3CtcfUniPk.narrowPeak.gz, wgEncodeAwgTfbsBroadHelas3Ezh239875UniPk.narrowPeak.gz and wgEncodeAwgTfbsBroadHelas3Pol2bUniPk.narrowPeak.gz

LOLA • 484 views
@nathan-sheffield-7613
Last seen 7 months ago
University of Virginia

The default LOLA database has files from multiple sources. The ones you're referring to are from segmentations. I assembled that database many years ago, but I believe those files came from the Segmentation tracks at UCSC. It was probably the ChromhmmHelas3 segmentation, or maybe the CombinedHelas3 one, and T would have been the state called for those regions by that segmentation.

You can find the original files here: https://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeAwgSegmentation
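If it helps to see how a per-state file such as Helas3_T.bed could be derived from one of those segmentation files, here is a minimal Python sketch. It is not the script that built the LOLA database. It assumes the segmentation file is a tab-separated BED whose fourth column holds the state label (for example T), and the function names `split_bed_by_state` and `write_state_beds` are hypothetical.

```python
from collections import defaultdict


def split_bed_by_state(lines):
    """Group BED records by the label in column 4.

    Assumption: column 4 (the BED 'name' field) holds the segmentation
    state, e.g. "T". Header lines and records with fewer than four
    columns are skipped.
    """
    by_state = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        chrom, start, end, state = fields[:4]
        by_state[state].append((chrom, int(start), int(end)))
    return dict(by_state)


def write_state_beds(by_state, prefix):
    """Write one three-column BED file per state, named <prefix>_<state>.bed."""
    for state, regions in by_state.items():
        with open(f"{prefix}_{state}.bed", "w") as out:
            for chrom, start, end in regions:
                out.write(f"{chrom}\t{start}\t{end}\n")
```

To try it on a downloaded file, open it with `gzip.open(path, "rt")`, pass the file handle to `split_bed_by_state`, and then call `write_state_beds(result, "Helas3")`. Under the assumptions above, the regions labelled T would end up in Helas3_T.bed.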

Let me know if this doesn't answer your question, and I can dig a bit deeper.
