Hi, I have this output from geno():
head(geno(v)$GQ)
1 2 3 4 5 6 7
chr1:10443_C/T 24 3 3 0 0 0 0
chr1:12783_G/A 3 6 7 0 3 0 5
chr1:12839_G/C 5 10 87 21 3 12 57
chr1:12882_C/G 63 24 39 29 18 39 21
chr1:13012_G/A 99 99 99 99 99 69 99
chr1:13079_C/G 99 99 99 99 99 99 98
I tried to convert it to a data frame:
t = as.data.frame(geno(v)$GQ)
It seems OK, but I would like to pass one row at a time to my function, so I used this command:
apply(t[1,], 1, function(x)x)
$`chr1:10443_C/T`
$`chr1:10443_C/T`$1
[1] 24
$`chr1:10443_C/T`$2
[1] 3
$`chr1:10443_C/T`$3
[1] 3
$`chr1:10443_C/T`$4
[1] 0
$`chr1:10443_C/T`$5
[1] 0
$`chr1:10443_C/T`$6
[1] 0
$`chr1:10443_C/T`$7
[1] 0
whereas I would like to have:
24 3 3 0 0 0 0
that is, the first row. I think the problem is the data frame created from the output of geno(). Could you help me?
Thank you.
Riccardo
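The nested-list output above is what R returns when each cell of a matrix is a list element rather than a plain integer. Below is a minimal base-R sketch of that situation, assuming GQ came back as such a list-matrix with exactly one value per cell. The matrix gq is built by hand from the first row shown above; it is not read from a VCF.

```r
# Hand-built stand-in for geno(v)$GQ: a matrix whose cells are list elements.
gq <- matrix(
  list(24L, 3L, 3L, 0L, 0L, 0L, 0L),
  nrow = 1,
  dimnames = list("chr1:10443_C/T", as.character(1:7))
)

is.list(gq)  # TRUE: the matrix is stored as a list, not as integers

# Taking one row gives a list; unlist() flattens it to a plain named vector.
first_row <- unlist(gq[1, ])

# If every cell holds exactly one value, the whole matrix can be flattened
# into an ordinary integer matrix with the same row and column names.
stopifnot(all(lengths(gq) == 1))
gq_int <- matrix(unlist(gq), nrow = nrow(gq), dimnames = dimnames(gq))
```

Once the data is an ordinary integer matrix, gq_int[1, ] returns the row as a vector directly and no data frame conversion is needed.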
What are you trying to do to each row of GQ?
I would like to filter out some variants, e.g. on GQ >= 10 in at least 3 samples.
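A filter of this kind does not need a row-by-row function if GQ is an ordinary integer matrix. The sketch below assumes the intent is to keep variants with GQ >= 10 in at least 3 samples, and uses the six rows printed at the top of the thread as hand-typed data.

```r
# GQ values copied from the head() output above (rows = variants, cols = samples).
gq <- matrix(
  c(24L,  3L,  3L,  0L,  0L,  0L,  0L,
     3L,  6L,  7L,  0L,  3L,  0L,  5L,
     5L, 10L, 87L, 21L,  3L, 12L, 57L,
    63L, 24L, 39L, 29L, 18L, 39L, 21L,
    99L, 99L, 99L, 99L, 99L, 69L, 99L,
    99L, 99L, 99L, 99L, 99L, 99L, 98L),
  ncol = 7, byrow = TRUE,
  dimnames = list(
    c("chr1:10443_C/T", "chr1:12783_G/A", "chr1:12839_G/C",
      "chr1:12882_C/G", "chr1:13012_G/A", "chr1:13079_C/G"),
    as.character(1:7)
  )
)

# For each variant (row), count the samples with GQ >= 10.
n_pass <- rowSums(gq >= 10)

# Keep variants where at least 3 samples pass the threshold.
keep <- n_pass >= 3
filtered <- gq[keep, , drop = FALSE]
```

The logical vector keep has one entry per variant, so with a VCF object it could presumably be used to subset the variants as well, e.g. v[keep, ].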
This is the row:
How about the complete header of the file?
It is too big; I cannot paste it here.
How about using pastebin or something?
http://pastie.org/private/
I see, the problem is that the file is using the ancient 4.0 version of the spec, and has no way to convey that the GQ has one value per genotype. You can assert that at runtime, though, by modifying the VCF header to use "G" as the Number for GQ:
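A header change of this kind might look roughly like the following. This is a sketch under assumptions, not the code from the original reply: set_format_number() is a hypothetical helper, and it assumes that the installed VariantAnnotation provides header() and header<- for VCF objects as well as geno() and geno<- for VCFHeader objects.

```r
# Sketch only: the function body needs the Bioconductor package
# VariantAnnotation to be attached when it is called.
# set_format_number() is a hypothetical helper, not part of the package.
set_format_number <- function(vcf, field, number) {
  hdr <- header(vcf)               # VCFHeader object of the VCF
  fmt <- geno(hdr)                 # DataFrame describing the FORMAT fields
  stopifnot(field %in% rownames(fmt))
  fmt[field, "Number"] <- number   # e.g. "G" (per genotype) or "A" (per ALT)
  geno(hdr) <- fmt                 # assumes a geno<- setter for VCFHeader
  header(vcf) <- hdr               # assumes a header<- setter for VCF
  vcf
}

# Intended use (not run here):
# library(VariantAnnotation)
# v <- set_format_number(v, "GQ", "G")
```

Writing it as a function that returns the modified object keeps the original v untouched until the result is assigned back.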
Thank you.
Riccardo
Hi, I am sorry, I tried with NV, because with GQ I do not have multiple values on the same line, but it does not work. After loading the VCF I use the expand function and after that this:
but it does not solve the problem.
You should call expand() after changing the header.
I did this:
Is it right? It did not work.
Would you please explain how the result does not match your expectations?
Maybe I did not understand what your solution does. For example, if I do this:
In the table t I have some duplicated rows because I used expand, which is good, but they are in this format (in the case of two mutations at the same position):
I would like to have this in two different rows:
Is it possible with my data and your solution?
I see, NV has a value per ALT, not per genotype, so you need it to be "A" in the header, not "G".
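The difference between the two Number codes can be made concrete with a small base-R sketch. It relies only on the VCF convention that "A" means one value per ALT allele and "G" means one value per possible genotype. expected_values() is a hypothetical helper and the NV values are made up for illustration.

```r
# Number of values a field carries per record, by Number code.
#   "A": one value per ALT allele
#   "G": one value per possible genotype
expected_values <- function(number, n_alt, ploidy = 2) {
  switch(number,
    A = n_alt,
    G = choose(n_alt + ploidy, ploidy),  # genotypes over (n_alt + 1) alleles
    stop("unsupported Number code: ", number)
  )
}

expected_values("A", n_alt = 2)  # 2 values, one per ALT
expected_values("G", n_alt = 2)  # 6 values for a diploid sample

# Hypothetical site with two ALT alleles and a per-ALT field (values made up).
alts <- c("A", "T")
nv   <- c(12L, 7L)
stopifnot(length(nv) == expected_values("A", length(alts)))

# One row per ALT allele, which is the layout asked for above.
per_alt <- data.frame(alt = alts, NV = nv)
```

A field with two values at a two-ALT site matches "A" (2 values) but not "G" (6 values for a diploid sample), which is consistent with the advice to declare NV as "A".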
Thank you, but I have this problem:
Maybe it is not possible and I have to build a specific function for this case.
I did not get the same error as you, but I did get an error, so I fixed it. It will arrive in a day or so with version 1.20.1 of VariantAnnotation.
Thank you very much.