RefNet: source/data preparation used for Gerstein et al (2012) human TF data?
Keith Hughitt ▴ 170
@keith-hughitt-6740
Last seen 8 months ago
United States

Hello,

Does anyone happen to know what source / processing was used to construct the "gerstein-2012" annotations in the RefNet package?

It appears that the network includes roughly 6,900 edges:

    library('RefNet')
    refnet = RefNet()
    ixns = interactions(refnet, species="9606", provider=c("gerstein-2012"))
    nrow(ixns) # 6895

I would have expected the data used to construct the annotations to have come from http://encodenets.gersteinlab.org/, but none of the corresponding files there are of a similar size:

    wc -l *
    26070 enets2.Proximal_filtered.txt
    19258 enets3.Distal.txt

Is there another source that I've missed? Or was some additional processing done on the dataset, resulting in a subset of the original edges?
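Note that `wc -l` counts raw lines, so repeated pairs or header lines would inflate the totals above. A minimal sketch for counting distinct edges instead, assuming the files are tab-delimited with the regulator and target in columns 1 and 2 (an assumption; the actual column layout is not confirmed here, so pass a different column list if needed). `count_unique_edges` is a hypothetical helper name:

```shell
# Count distinct edges in a tab-delimited edge list.
# $1: file path
# $2: cut-style column list identifying an edge (default "1,2";
#     this default is an assumption about the file layout)
count_unique_edges() {
    cut -f"${2:-1,2}" "$1" | sort -u | wc -l
}

# Report distinct-edge counts for the two downloaded files, if present.
for f in enets2.Proximal_filtered.txt enets3.Distal.txt; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_unique_edges "$f")"
    fi
done
```

If the distinct counts are still far from the RefNet edge count, the difference is more likely a different source file than simple de-duplication.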

Keith

refnet annotationhub • 958 views
pshannon ▴ 90
@pshannon-6931
Last seen 6.9 years ago
United States

Hi Keith,

The RefNet gerstein-2012 data interactions are from

http://archive.gersteinlab.org/proj/Hierarchy_Rewiring/PNAS_hier/Hs_Tr.txt

If other (possibly more recent) interaction data sets are of compelling interest, let us know. The four 'native' providers now built into RefNet (they live in the AnnotationHub rather than coming from PSICQUIC) were chosen for their contrasting nature and origin, not for their comprehensiveness.

    R> show(refnet)
    RefNet object with 25 providers in 2 classes
    | provider class 'native':
    |     gerstein-2012
    |     hypoxiaSignaling-2006
    |     stamlabTFs-2012
    |     recon202
    | provider class 'PSICQUIC':
    |     BioGrid
    |     bhf-ucl
    | ...

- Paul

Hi Paul,

Thanks for the response and clarification. I think that the dataset you are using is actually from an earlier paper out of the Gerstein lab -- "Rewiring of Transcriptional Regulatory Networks: Hierarchy, Rather Than Connectivity, Better Reflects the Importance of Regulators" (2010).

The datasets associated with the ENCODE paper are on http://encodenets.gersteinlab.org/.

I could see both being useful for some people, so it probably wouldn't hurt to include both datasets. It might also be worth documenting the source of each dataset. To be even more explicit, you could include the scripts you used to generate each of the external RData-based providers.

Thanks for your work putting this useful package together!

All the best,

Keith

pshannon ▴ 90
@pshannon-6931
Last seen 6.9 years ago
United States

Thanks, Keith: good catch.  I'll update the pmid in RefNet.

- Paul