Hi everyone, I have actually a blast file format such as:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore strand
scaffold_1 YP_009352567.1 27.0 278 163 9 20 50 222 487 1.000000e-16 94.0 +
scaffold_1 YP_009352567.1 35.5 166 93 2 130 300 63 227 2.000000e-15 89.7 +
scaffold_1 NP_056889.1 42.5 106 56 4 906 1223 910 1010 2.600000e-12 79.3 -
scaffold_1 NP_955606.1 42.5 106 56 4 906 1223 601 701 2.600000e-12 79.3 -
scaffold_3 NP_077672.1 44.4 99 51 1 14000 15000 219 317 4.400000e-12 78.6 -
scaffold_3 NP_077672.1 32.2 115 70 1 16000 20300 119 225 1.200000e-09 70.5 -
scaffold_4 YP_009173615.1 29.9 174 84 4 800 1000 209 344 4.500000e-15 88.2 -
scaffold_8 YP_009482546.1 29.1 148 98 4 42822 43265 1531 1671 6.300000e-13 80.5 -
scaffold_8 YP_009481279.1 31.1 148 94 4 3 500 1488 1627 1.100000e-12 79.7 -
scaffold_8 YP_008319462.1 29.3 140 91 4 2 500 1466 1597 5.900000e-11 73.9 -
scaffold_8 YP_008437707.2 30.0 140 90 4 3 500 1473 1604 5.900000e-11 73.9 -
and I wanted to reduce all the HSP together and get something like:
qseqid qstart qend
scaffold_1 20 300
scaffold_1 906 1223
scaffold_1 906 1223
scaffold_3 14000 20300
scaffold_4 800 1000
scaffold_8 42822 43265
scaffold_8 3 500
scaffold_8 2 500
scaffold_8 3 500
As you can see the coordinates of start and end changed for some scaffold because of the HSP merging.
I know that GRanges can be helpful and I tried in R :
blast<-read.table("dataframe.txt",header=T,sep="")
genes <- GRanges(seqnames=qseqid,ranges=IRanges(start=qstart,end = qend),strand=strand)
gr1 = reduce(genes)
But it does not work..
Thank you for your help.
When I tried your above code I end up with something like the following:
In what way do you feel it doesn't work?
Because the coordinates are wrong, we should get:
I do not understand why there is for the scaffold1 a range from 20 to 50 since there are two matches with the
sseqid YP_009352567.1
One from20 to 50
and one from130 to 300
so those two matches are 2 HSP and should be gathered together to form a new range from20
(the minimum of the first HSP) to300
(the max of the second HSP)...