VariantAnnotation geno function
1
0
Entering edit mode
ribioinfo ▴ 90
@ribioinfo-9434
Last seen 8 months ago

Hi I have this output from geno():

head(geno(v)$GQ) 1 2 3 4 5 6 7 chr1:10443_C/T 24 3 3 0 0 0 0 chr1:12783_G/A 3 6 7 0 3 0 5 chr1:12839_G/C 5 10 87 21 3 12 57 chr1:12882_C/G 63 24 39 29 18 39 21 chr1:13012_G/A 99 99 99 99 99 69 99 chr1:13079_C/G 99 99 99 99 99 99 98 I tried to convert it in data frame: t = as.data.frame(geno(v)$GQ)

and it seems ok, but I would like to give to my function one row at time so I used this command:

apply(t[1,], 1, function(x)x)
$chr1:10443_C/T$chr1:10443_C/T$1 [1] 24$chr1:10443_C/T$2 [1] 3$chr1:10443_C/T$3 [1] 3$chr1:10443_C/T$4 [1] 0$chr1:10443_C/T$5 [1] 0$chr1:10443_C/T$6 [1] 0$chr1:10443_C/T$7 [1] 0 While I would like to have:  24 3 3 0 0 0 0  that is the first row. I think that the problem is the data frame created using the output object of the function geno(), could you help me? Thank you. Riccardo variantannotation • 961 views ADD COMMENT 0 Entering edit mode What are you trying to do to each row of GQ? ADD REPLY 0 Entering edit mode I would like to filter out some variants e.g. GQ >=10 in at least 3 samples ADD REPLY 0 Entering edit mode This is the row: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1 2 3 4 5 6 7 chr1 15274 . A G,T 906 MQ BRF=0.58;FR=0.2250,0.7750;HP=1;HapScore=3;MGOF=27;MMLQ=28;MQ=21.07;NF=0,0;NR=4,35;PP=1,906;QD=8.9428031368;SC=ACCACTGTACAATGGGGAAAC;SbPval=1.0;Source=File;TC=39;TCF=0;TCR=39;TR=4,35;WE=15282;WS=15264 GT:GL:GOF:GQ:NR:NV 1/2:-1,-1,-1:13:10:7,7:1,6 2/2:-1,-1,-1:27:5:3,3:0,3 2/2:-1,-1,-1:22:15:7,7:0,7 1/2:-1,-1,-1:26:32:12,12:2,10 1/2:-1,-1,-1:16:13:4,4:1,3 0/1:-1,-1,-1:0:1:0,0:0,0 2/2:-1,-1,-1:24:12:6,6:0,6  ADD REPLY 0 Entering edit mode How about the complete header of the file? ADD REPLY 0 Entering edit mode It is too big I cannot paste here. ADD REPLY 0 Entering edit mode How about using pastebin or something? ADD REPLY 0 Entering edit mode ADD REPLY 0 Entering edit mode I see, the problem is that the file is using the ancient 4.0 version of the spec, and has no way to convey that the GQ has one value per genotype. You can assert that though at runtime by modifying the VCF header to use "G" as the Number for GQ: geno(header(vcf))["GQ", "Number"] <- "G" ADD REPLY 0 Entering edit mode Thank you. Riccardo ADD REPLY 0 Entering edit mode Hi I am sorry, I have tried with the NV, because with GQ I have not more values in the same line, but it does not work. After I load the vcf I use the expand function and after that this: geno(header(vcf))["NV", "Number"] <- "G" but it does not solve the problem. ADD REPLY 0 Entering edit mode You should call expand() after changing the header. ADD REPLY 0 Entering edit mode I did this: v = readVcf(f, "hg19") geno(header(v))["NV", "Number"] = "G" v = expand(v) t = geno(v)$NV

Is it rigth? Because it did not work.

0
Entering edit mode

Would you please explain how the result does not match your expectations?

0
Entering edit mode

Maybe I did not understand what your solution do. For example if I do this:

v = readVcf(f, "hg19")

v = expand(v)

t = geno(v)$NV In the table t I have some duplicated rows because I used expand and this is good but they are in this format (in the case of two mutations in the same position): c(x1, y1).....c(xn, yn) c(x1, y1).....c(xn, yn) I would like to have this in two different rows: x1,....,xn y1,....,yn Is it possible with my data and your solution? ADD REPLY 0 Entering edit mode I see, NV has a value per ALT, not per genotype, so you need it to be "A" in the header, not "G". ADD REPLY 0 Entering edit mode Thank you but I have this problem: v = readVcf(f, "hg19") geno(header(v))["NV", "Number"] = "A" v = expand(v) Error in FUN(X[[i]], ...) : invalid 'times' value Maybe is not possible and I have to build a specific function for this case. ADD REPLY 0 Entering edit mode I did not get the same error as you, but I did get an error, so I fixed it. It will arrive in a day or so with version 1.20.1 of VariantAnnotation. ADD REPLY 0 Entering edit mode Thank you very much. ADD REPLY 0 Entering edit mode @michael-lawrence-3846 Last seen 10 weeks ago United States It's probably simplest to operate on the matrix in vectorized fashion. For example, keep <- rowSums(geno(v)$GQ >= 10) >= 3
v[keep,]

0
Entering edit mode

Thank you it works but for some kind positions I have more variants and instead to have one numerical value in the table I have e.g c(7, 7) because in this position I have two variants and using your solution I get this:

Error in rowSums(geno(v)\$NR >= NR) :
(list) object cannot be coerced to type 'double'

Could you help me?

Thank you.

Riccardo

0
Entering edit mode

Simplest approach is just to expand the Vcf object first, so that there is one row per alt:

v2 <- expand(v)
0
Entering edit mode

I tried expand but the result is that I have c(7, 7) not only in one row but in two rows, it expands the vcf but the vector now is two rows.

0
Entering edit mode

That seems like a bug to me. Could you make the VCF file available somehow? It would be enough to have the header and the offending row.