VariantAnnotation geno function
1
0
Entering edit mode
ribioinfo ▴ 100
@ribioinfo-9434
Last seen 3.6 years ago

Hi I have this output from geno():

head(geno(v)$GQ)
               1 2 3 4 5 6 7
chr1:10443_C/T 24       3       3       0          0          0          0     
chr1:12783_G/A 3        6       7       0          3          0          5     
chr1:12839_G/C 5        10      87      21         3          12         57    
chr1:12882_C/G 63       24      39      29         18         39         21    
chr1:13012_G/A 99       99      99      99         99         69         99    
chr1:13079_C/G 99       99      99      99         99         99         98

I tried to convert it in data frame:

t = as.data.frame(geno(v)$GQ)

and it seems ok, but I would like to give to my function one row at time so I used this command:

apply(t[1,], 1, function(x)x)
$`chr1:10443_C/T`
$`chr1:10443_C/T`$1
[1] 24

$`chr1:10443_C/T`$2
[1] 3

$`chr1:10443_C/T`$3
[1] 3

$`chr1:10443_C/T`$4
[1] 0

$`chr1:10443_C/T`$5
[1] 0

$`chr1:10443_C/T`$6
[1] 0

$`chr1:10443_C/T`$7
[1] 0

While I would like to have: 

24       3       3       0          0          0          0 

that is the first row. I think that the problem is the data frame created using the output object of the function geno(), could you help me?

Thank you.

Riccardo

variantannotation • 2.5k views
ADD COMMENT
0
Entering edit mode

What are you trying to do to each row of GQ?

ADD REPLY
0
Entering edit mode

I would like to filter out some variants e.g. GQ >=10 in at least 3 samples

ADD REPLY
0
Entering edit mode

This is the row:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    1    2    3    4    5    6    7
chr1    15274    .    A    G,T    906    MQ    BRF=0.58;FR=0.2250,0.7750;HP=1;HapScore=3;MGOF=27;MMLQ=28;MQ=21.07;NF=0,0;NR=4,35;PP=1,906;QD=8.9428031368;SC=ACCACTGTACAATGGGGAAAC;SbPval=1.0;Source=File;TC=39;TCF=0;TCR=39;TR=4,35;WE=15282;WS=15264    GT:GL:GOF:GQ:NR:NV    1/2:-1,-1,-1:13:10:7,7:1,6    2/2:-1,-1,-1:27:5:3,3:0,3    2/2:-1,-1,-1:22:15:7,7:0,7    1/2:-1,-1,-1:26:32:12,12:2,10    1/2:-1,-1,-1:16:13:4,4:1,3    0/1:-1,-1,-1:0:1:0,0:0,0    2/2:-1,-1,-1:24:12:6,6:0,6
ADD REPLY
0
Entering edit mode

How about the complete header of the file?

ADD REPLY
0
Entering edit mode

It is too big I cannot paste here.

ADD REPLY
0
Entering edit mode

How about using pastebin or something?

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I see, the problem is that the file is using the ancient 4.0 version of the spec, and has no way to convey that the GQ has one value per genotype. You can assert that though at runtime by modifying the VCF header to use "G" as the Number for GQ:

geno(header(vcf))["GQ", "Number"] <- "G"
ADD REPLY
0
Entering edit mode

Thank you.

Riccardo

ADD REPLY
0
Entering edit mode

Hi I am sorry, I have tried with the NV, because with GQ I have not more values in the same line, but it does not work. After I load the vcf I use the expand function and after that this: 

geno(header(vcf))["NV", "Number"] <- "G"

but it does not solve the problem.

ADD REPLY
0
Entering edit mode

You should call expand() after changing the header.

ADD REPLY
0
Entering edit mode

I did this:

v = readVcf(f, "hg19")

geno(header(v))["NV", "Number"] = "G"

v = expand(v)

t = geno(v)$NV

Is it rigth? Because it did not work.

ADD REPLY
0
Entering edit mode

Would you please explain how the result does not match your expectations?

ADD REPLY
0
Entering edit mode

Maybe I did not understand what your solution do. For example if I do this:

v = readVcf(f, "hg19")

v = expand(v)

t = geno(v)$NV

In the table t I have some duplicated rows because I used expand and this is good but they are in this format (in the case of two mutations in the same position):

c(x1, y1).....c(xn, yn)

c(x1, y1).....c(xn, yn)

I would like to have this in two different rows:

x1,....,xn

y1,....,yn

Is it possible with my data and your solution?

 

ADD REPLY
0
Entering edit mode

I see, NV has a value per ALT, not per genotype, so you need it to be "A" in the header, not "G".

ADD REPLY
0
Entering edit mode

Thank you but I have this problem:

v = readVcf(f, "hg19")

geno(header(v))["NV", "Number"] = "A"

v = expand(v)

Error in FUN(X[[i]], ...) : invalid 'times' value

Maybe is not possible and I have to build a specific function for this case.

ADD REPLY
0
Entering edit mode

I did not get the same error as you, but I did get an error, so I fixed it. It will arrive in a day or so with version 1.20.1 of VariantAnnotation.

ADD REPLY
0
Entering edit mode

Thank you very much.

ADD REPLY
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.3 years ago
United States

It's probably simplest to operate on the matrix in vectorized fashion. For example,

keep <- rowSums(geno(v)$GQ >= 10) >= 3
v[keep,]

 

ADD COMMENT
0
Entering edit mode

Thank you it works but for some kind positions I have more variants and instead to have one numerical value in the table I have e.g c(7, 7) because in this position I have two variants and using your solution I get this:

Error in rowSums(geno(v)$NR >= NR) : 
  (list) object cannot be coerced to type 'double'

Could you help me?

Thank you.

Riccardo 

ADD REPLY
0
Entering edit mode

Simplest approach is just to expand the Vcf object first, so that there is one row per alt:

v2 <- expand(v)
ADD REPLY
0
Entering edit mode

I tried expand but the result is that I have c(7, 7) not only in one row but in two rows, it expands the vcf but the vector now is two rows.

 

ADD REPLY
0
Entering edit mode

That seems like a bug to me. Could you make the VCF file available somehow? It would be enough to have the header and the offending row.

ADD REPLY

Login before adding your answer.

Traffic: 634 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6