Hi,
I am using data sets from the ENCODE consortium to develop my package. The actual peak files are rather large, so I can't ship them with the package: a Bioconductor package produced by R CMD build must be less than 4 MB on disk, so I need much smaller peak files as example data. In the ENCODE data sets, each peak file contains around 100,000 peaks. How can I obtain small example ChIP-seq peak files? Is there a recommended approach for deriving suitable example data from the actual peak files for package use? Thanks in advance :)
Here is a quick overview of the actual peak files in my package:
myPkg
- inst
  - extdata
    - wgEncodeOpenChromChipK562CmycAlnRep1.bed
    - wgEncodeOpenChromChipK562CmycAlnRep2.bed
    - wgEncodeOpenChromChipK562CmycAlnRep3.bed
    - wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep1.bed
    - wgEncodeSydhTfbsK562CmycIfna6hStdAlnRep2.bed
    - wgEncodeSydhTfbsK562CmycIfna30StdAlnRep1.bed
    - wgEncodeSydhTfbsK562CmycIfna30StdAlnRep2.bed
    - wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep1.bed
    - wgEncodeSydhTfbsK562CmycIfng6hStdAlnRep2.bed
    - wgEncodeSydhTfbsK562CmycIggrabAlnRep1.bed
    - wgEncodeSydhTfbsK562CmycIggrabAlnRep2.bed
    - wgEncodeSydhTfbsK562CmycStdAlnRep1.bed
    - wgEncodeSydhTfbsK562CmycStdAlnRep2.bed
- R
Edit:
I am looking at ChIP-seq data for transcription factors. How can I get small peak files, such that each BED file has around 1,000 peaks?
Should I sample peaks from each chromosome in each peak file? If so, how should I draw the sample? Is there any approach for getting suitable, small example data from ChIP-seq peak files? Thanks a lot.
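To make the per-chromosome idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions, not a recommended method: the function name `sample_peaks_by_chrom`, the `n_total` and `seed` parameters, and the choice of Python are all hypothetical, and it assumes the chromosome name is the first whitespace-separated field of each line, with optional `track`, `browser`, or `#` header lines.

```python
import random
from collections import defaultdict


def sample_peaks_by_chrom(lines, n_total=1000, seed=42):
    """Return roughly n_total BED lines, sampled proportionally per chromosome.

    Hypothetical helper for illustration only. Every chromosome that has
    peaks keeps at least one, so the result can slightly exceed n_total.
    """
    rng = random.Random(seed)  # fixed seed so the example data is reproducible
    by_chrom = defaultdict(list)
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        by_chrom[line.split()[0]].append(line)

    n_peaks = sum(len(peaks) for peaks in by_chrom.values())
    if n_peaks == 0:
        return []

    fraction = min(1.0, n_total / n_peaks)
    sampled = []
    for chrom, peaks in by_chrom.items():
        k = min(len(peaks), max(1, round(len(peaks) * fraction)))
        # Sample indices rather than lines so the original order is kept.
        keep = sorted(rng.sample(range(len(peaks)), k))
        sampled.extend(peaks[i] for i in keep)
    return sampled
```

With one of my files this would be used along the lines of `sample_peaks_by_chrom(open("wgEncodeOpenChromChipK562CmycAlnRep1.bed"), n_total=1000)`, and the returned lines would then be written to a new, smaller BED file.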
Best regards,
Jurat
Hi Federico,
Thanks for the post from Biostar. I am not familiar with using the BEDOPS tools on Windows. How can I simplify the process of sampling from each BED file? I think it will take me some time to get familiar with BEDOPS and try that solution, and I need a fast, general approach instead. Any ideas?
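By a fast, general approach I mean something like the sketch below: a single pass over each file that keeps a uniform random sample of a fixed number of peaks (reservoir sampling), using nothing beyond a standard Python installation, so it should also run on Windows. This is only my assumption of what such a solution could look like; `reservoir_sample`, `subsample_bed`, `k`, and `seed` are hypothetical names, and it does not balance the sample across chromosomes.

```python
import random


def reservoir_sample(lines, k=1000, seed=0):
    """Keep a uniform random sample of up to k peak lines in one pass."""
    rng = random.Random(seed)
    reservoir = []  # (original position, line) pairs
    seen = 0
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        if seen < k:
            reservoir.append((seen, line))
        else:
            # Replace a random slot with probability k / (seen + 1).
            j = rng.randint(0, seen)
            if j < k:
                reservoir[j] = (seen, line)
        seen += 1
    reservoir.sort()  # restore the original file order
    return [line for _, line in reservoir]


def subsample_bed(src_path, dst_path, k=1000, seed=0):
    """Write a k-peak subsample of src_path to dst_path; return peaks kept."""
    with open(src_path) as src:
        kept = reservoir_sample(src, k, seed)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)
```

Looping `subsample_bed` over the files in `inst/extdata` would give one small BED file of about 1,000 peaks per sample.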
Best regards,
Jurat