Question

VariantAnnotation geno function

0

Entering edit mode

ribioinfo ▴ 100

@ribioinfo-9434

Last seen 4.7 years ago

Hi I have this output from geno():

head(geno(v)$GQ)
               1 2 3 4 5 6 7
chr1:10443_C/T 24       3       3       0          0          0          0     
chr1:12783_G/A 3        6       7       0          3          0          5     
chr1:12839_G/C 5        10      87      21         3          12         57    
chr1:12882_C/G 63       24      39      29         18         39         21    
chr1:13012_G/A 99       99      99      99         99         69         99    
chr1:13079_C/G 99       99      99      99         99         99         98

I tried to convert it in data frame:

t = as.data.frame(geno(v)$GQ)

and it seems ok, but I would like to give to my function one row at time so I used this command:

apply(t[1,], 1, function(x)x)

$`chr1:10443_C/T`
$`chr1:10443_C/T`$1
[1] 24

$`chr1:10443_C/T`$2
[1] 3

$`chr1:10443_C/T`$3
[1] 3

$`chr1:10443_C/T`$4
[1] 0

$`chr1:10443_C/T`$5
[1] 0

$`chr1:10443_C/T`$6
[1] 0

$`chr1:10443_C/T`$7
[1] 0

While I would like to have:

24       3       3       0          0          0          0

that is the first row. I think that the problem is the data frame created using the output object of the function geno(), could you help me?

Thank you.

Riccardo

variantannotation • 3.3k views

ADD COMMENT • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

What are you trying to do to each row of GQ?

ADD REPLY • link 8.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

I would like to filter out some variants e.g. GQ >=10 in at least 3 samples

ADD REPLY • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

This is the row:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    1    2    3    4    5    6    7
chr1    15274    .    A    G,T    906    MQ    BRF=0.58;FR=0.2250,0.7750;HP=1;HapScore=3;MGOF=27;MMLQ=28;MQ=21.07;NF=0,0;NR=4,35;PP=1,906;QD=8.9428031368;SC=ACCACTGTACAATGGGGAAAC;SbPval=1.0;Source=File;TC=39;TCF=0;TCR=39;TR=4,35;WE=15282;WS=15264    GT:GL:GOF:GQ:NR:NV    1/2:-1,-1,-1:13:10:7,7:1,6    2/2:-1,-1,-1:27:5:3,3:0,3    2/2:-1,-1,-1:22:15:7,7:0,7    1/2:-1,-1,-1:26:32:12,12:2,10    1/2:-1,-1,-1:16:13:4,4:1,3    0/1:-1,-1,-1:0:1:0,0:0,0    2/2:-1,-1,-1:24:12:6,6:0,6

ADD REPLY • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

How about the complete header of the file?

ADD REPLY • link 8.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

It is too big I cannot paste here.

ADD REPLY • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

How about using pastebin or something?

ADD REPLY • link 8.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

http://pastie.org/private/

ADD REPLY • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

I see, the problem is that the file is using the ancient 4.0 version of the spec, and has no way to convey that the GQ has one value per genotype. You can assert that though at runtime by modifying the VCF header to use "G" as the Number for GQ:

geno(header(vcf))["GQ", "Number"] <- "G"

ADD REPLY • link 8.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thank you.

Riccardo

ADD REPLY • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

Hi I am sorry, I have tried with the NV, because with GQ I have not more values in the same line, but it does not work. After I load the vcf I use the expand function and after that this:

geno(header(vcf))["NV", "Number"] <- "G"

but it does not solve the problem.

ADD REPLY • link 8.4 years ago ribioinfo ▴ 100

0

Entering edit mode

You should call expand() after changing the header.

ADD REPLY • link 8.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

I did this:

v = readVcf(f, "hg19")

geno(header(v))["NV", "Number"] = "G"

v = expand(v)

t = geno(v)$NV

Is it rigth? Because it did not work.

ADD REPLY • link 8.4 years ago ribioinfo ▴ 100

0

Entering edit mode

Would you please explain how the result does not match your expectations?

ADD REPLY • link 8.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Maybe I did not understand what your solution do. For example if I do this:

v = readVcf(f, "hg19")

v = expand(v)

t = geno(v)$NV

In the table t I have some duplicated rows because I used expand and this is good but they are in this format (in the case of two mutations in the same position):

c(x1, y1).....c(xn, yn)

c(x1, y1).....c(xn, yn)

I would like to have this in two different rows:

x1,....,xn

y1,....,yn

Is it possible with my data and your solution?

ADD REPLY • link 8.4 years ago ribioinfo ▴ 100

0

Entering edit mode

I see, NV has a value per ALT, not per genotype, so you need it to be "A" in the header, not "G".

ADD REPLY • link 8.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thank you but I have this problem:

v = readVcf(f, "hg19")

geno(header(v))["NV", "Number"] = "A"

v = expand(v)

Error in FUN(X[[i]], ...) : invalid 'times' value

Maybe is not possible and I have to build a specific function for this case.

ADD REPLY • link 8.4 years ago ribioinfo ▴ 100

0

Entering edit mode

I did not get the same error as you, but I did get an error, so I fixed it. It will arrive in a day or so with version 1.20.1 of VariantAnnotation.

ADD REPLY • link 8.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thank you very much.

ADD REPLY • link 8.4 years ago ribioinfo ▴ 100

score 0 · Answer 1 · 2016-11-03

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.4 years ago

United States

It's probably simplest to operate on the matrix in vectorized fashion. For example,

keep <- rowSums(geno(v)$GQ >= 10) >= 3
v[keep,]

ADD COMMENT • link 8.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Thank you it works but for some kind positions I have more variants and instead to have one numerical value in the table I have e.g c(7, 7) because in this position I have two variants and using your solution I get this:

Error in rowSums(geno(v)$NR >= NR) : 
  (list) object cannot be coerced to type 'double'

Could you help me?

Thank you.

Riccardo

ADD REPLY • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

Simplest approach is just to expand the Vcf object first, so that there is one row per alt:

v2 <- expand(v)

ADD REPLY • link 8.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

I tried expand but the result is that I have c(7, 7) not only in one row but in two rows, it expands the vcf but the vector now is two rows.

ADD REPLY • link 8.5 years ago ribioinfo ▴ 100

0

Entering edit mode

That seems like a bug to me. Could you make the VCF file available somehow? It would be enough to have the header and the offending row.

ADD REPLY • link 8.5 years ago Michael Lawrence ★ 11k