Hi,
I am using data sets from the ENCODE consortium to develop my package, but the actual peak files are too large to ship with it. Because the tarball produced by R CMD build for a Bioconductor package must be less than 4 MB on disk, I have to use much smaller peak files as example data. In the ENCODE data sets, each peak file contains around 100,000 peaks. How can I obtain small example ChIP-seq peak files? Is there a recommended way to derive suitable example data from the actual peak files for use in a package? Thanks in advance :)
Here is a quick look at the actual peak files in my package:
myPkg
- inst
     - extdata
             - wgEncodeOpenChromChipK562CmycAlnRep1.bed
             - wgEncodeOpenChromChipK562CmycAlnRep2.bed
             - wgEncodeOpenChromChipK562CmycAlnRep3.bed
             - wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep1.bed
             - wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep2.bed
             - wgEncodeSydhTfbsK562CmycIfna30StdAlnRep1.bed
             - wgEncodeSydhTfbsK562CmycIfna30StdAlnRep2.bed
             - wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep1.bed
             - wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep2.bed
             - wgEncodeSydhTfbsK562CmycIggrabAlnRep1.bed
             - wgEncodeSydhTfbsK562CmycIggrabAlnRep2.bed
             - wgEncodeSydhTfbsK562CmycStdAlnRep1.bed
             - wgEncodeSydhTfbsK562CmycStdAlnRep2.bed
- R
Edit:
I am looking at ChIP-seq data for transcription factors. How can I get small peak files, for example with around 1,000 peaks in each BED file?
Should I sample from each chromosome in each peak file? If so, how should I draw the sample? Is there any approach for getting small but representative example data from ChIP-seq peak files? Thanks a lot.
Best regards,
Jurat

Hi Federico,
Thanks for the post from Biostars. I am not familiar with using the BEDOPS tools on Windows. How can I simplify the process of sampling from each BED file? I think it would take me some time to get familiar with BEDOPS and try that solution. I need a fast, general approach instead. Any ideas?
Best regards,
Jurat
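
To make the question concrete, here is a minimal sketch of the kind of fast, tool-independent subsampling asked about above: it keeps roughly 1,000 peaks per file, drawn from each chromosome in proportion to its peak count. It is only one possible approach, not a recommended Bioconductor workflow. The function names (subsample_bed_lines, subsample_bed_file), the default of 1,000 peaks, the fixed seed, and the rule of keeping at least one peak per chromosome are all assumptions made for illustration. It uses only the Python standard library, so it should also run on Windows.

```python
import random
from collections import defaultdict


def subsample_bed_lines(lines, n_target=1000, seed=1):
    """Return about n_target peak lines, sampled proportionally per chromosome.

    Hypothetical helper for illustration; header and blank lines are dropped.
    """
    rng = random.Random(seed)  # fixed seed so the example data are reproducible
    by_chrom = defaultdict(list)
    for idx, line in enumerate(lines):
        # Skip blank lines and BED header lines (track/browser/comments).
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom = line.split(None, 1)[0]  # first column is the chromosome
        by_chrom[chrom].append(idx)

    total = sum(len(idxs) for idxs in by_chrom.values())
    if total <= n_target:
        # Already small enough: keep every peak.
        keep = [i for idxs in by_chrom.values() for i in idxs]
    else:
        keep = []
        for idxs in by_chrom.values():
            # Proportional share, but at least one peak per chromosome
            # (an assumption, so that no chromosome disappears entirely).
            k = max(1, round(n_target * len(idxs) / total))
            keep.extend(rng.sample(idxs, min(k, len(idxs))))
    keep.sort()  # preserve the original order of the peaks
    return [lines[i] for i in keep]


def subsample_bed_file(in_path, out_path, n_target=1000, seed=1):
    """Read one BED file and write the subsampled peaks to out_path."""
    with open(in_path) as fh:
        lines = fh.readlines()
    with open(out_path, "w") as fh:
        fh.writelines(subsample_bed_lines(lines, n_target, seed))
```

Calling subsample_bed_file once per file listed under inst/extdata, with an output path of your choosing, would produce the reduced files. Because of rounding and the one-peak minimum, the number of peaks kept can differ slightly from n_target, and whether a proportional random sample is representative enough for the package examples is a judgment call that depends on what the examples need to demonstrate.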