Question: DiffBind: reading in peaksets generated by Homer2
0
gravatar for iliya.lefterov
4.5 years ago by
United States
iliya.lefterov0 wrote:

Hi Rory,

Thanks for advising me to post the question to the main Bioconductor  forum.

When trying to read in the peaksets with a command like this:

factorname =dba(sampleSheet="filename.csv")

I get this error message:

Error in Summary.factor(c(59L, 15L, 56L, 12L, 21L, 23L, 43L, 52L, 28L,  :
  ‘max’ not meaningful for factors

I don’t have a PeakFormat column in the filename .csv file, and I am using BED files generated by Homer2. The BED files are plain tab delimited txt files with 4 columns; the 4th column  has the peak scores in a format chr#-value.

I found that the deletion of the 4th column solves the problem and the peaksets are read in.

The question is if the peak scores are considered by DiffBind at all, and if the problem that arises later on (no way to get beyond factorname = dba.analyze(factorname) is a result of the reading in the peaksets?

Thanks in advance,

Iliya Lefterov   

diffbind • 2.1k views
ADD COMMENTlink modified 4.2 years ago by sergio.espeso-gil0 • written 4.5 years ago by iliya.lefterov0
Answer: DiffBind: reading in peaksets generated by Homer2
1
gravatar for Gord Brown
4.5 years ago by
Gord Brown590
United Kingdom
Gord Brown590 wrote:

Hi,

Assuming you're using "pos2bed.pl" to generate the bed file, you should be aware that the 4th column "chrN-NN" is not the score.  It's just the name of the peak (the NN'th peak on chromosome chrN).  The 5th column is the score, but for some reason they're all set to 1.  We can add support for HOMER format, but in the meantime you could use something like:

awk '{print $2,$3,$4,$1,$8,$5}' peaks.txt > peaks.bed

to generate a bed file with scores. (You'll need to remove the comment lines at the start of the file.)

Cheers,

 - Gord

 

 

ADD COMMENTlink written 4.5 years ago by Gord Brown590

The reason for the fifth column are all set to 1 is that I think there is a bug in the Perl script of pos2bed.pl? Not matter you use "-5" option or not (-5 : Set 5th column to the value 1 instead of value in 6th column of pos file), the fifth column will be 1 always.

ADD REPLYlink modified 16 months ago • written 16 months ago by xiaofeiwang1982660

Hi Gord, I think it should be $3-1 for the start position. 

My understanding is that BED format is 0-based for the second column (chromStart), while it is 1-based for the peak txt file from HOMER. I double checked the script of "pos2bed.pl", which is "my $start = $line[2]-1;" to get start position. If I am not right, please correct me because I used $3-1 currently. Thanks a lot! 

Also, could I ask a question about HOMER? Do you know how how HOMER calculates the score for a given peak? To be honest, I don't know the meaning of the "peak score". Here what I got from the HOMER manual as below. Actually, I used 6th column added to transfered BED file. What the difference between them to Diffbind? Thank you so... much!

  • Column 6: Normalized Tag Counts - number of tags found at the peak, normalized to 10 million total mapped tags (or defined by the user)
  • Column 8: Peak score (position adjusted reads from initial peak region - reads per position may be limited)

 

ADD REPLYlink written 16 months ago by xiaofeiwang1982660
Answer: DiffBind: reading in peaksets generated by Homer2
0
gravatar for Rory Stark
4.5 years ago by
Rory Stark2.8k
CRUK, Cambridge, UK
Rory Stark2.8k wrote:

Hello Ilya-

As you discovered, the score column, if present, needs to be interpretable as a numeric value. In the default case, DiffBind looks in the fourth columns for a score.

If you remove the score column, then all the scores are set to 1.  This results in binary score vectors for each sample, with a score of 1 for every peak that was called for that sample, and a score of -1 for peaks that were called in other samples but not that one. The scores are only used  for clustering (correlation heatmaps and PCA plots) prior to the call to dba.count(). Scores based on actual read counts in every interval for every sample are used after that, particularly for dba.analyze(). So you lose very little by not having the more granular scores from the peak caller. 

I'll add "support Homer2 format natively" to the list of possible features to add -- I can just strip off the "chr#-"part.

Cheers-

Rory

 

 

ADD COMMENTlink written 4.5 years ago by Rory Stark2.8k

Hi Rory and Gord

Thank you so much... Working now with the peak.txt files.

Cheers,

Iliya

ADD REPLYlink written 4.4 years ago by iliya.lefterov0
Answer: DiffBind: reading in peaksets generated by Homer2
0
gravatar for sergio.espeso-gil
4.2 years ago by
New York
sergio.espeso-gil0 wrote:

Hi all,

Another possibility if you don't want to play much with the "peaks.txt" file is to go to "pos2bed.pl" and change $line[0] by $line[7] in the command line (almost at the end of the script):

 print $filePtr "$chr\t$start\t$end\t$line[0]\t$v\t$dir\n";

by

 print $filePtr "$chr\t$start\t$end\t$line[7]\t$v\t$dir\n"; 

That will put the findPeaks score in the forth column. Hope this can help! 

 

 

ADD COMMENTlink written 4.2 years ago by sergio.espeso-gil0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 163 users visited in the last hour