Dear maintainer,
I notice a slightly strange behaviour of readVcf when ScanVcfParam(info=).
It seems that the imported VCF object somehow retains memories of the original VCF header (in the file), while the data was effectively dropped.
After import, the show method seems fine (header and INFO state that only the desired field was imported).
However, looking directly at info(header(vcf)) shows that the original fields are still there, while the info(vcf) show that only the desired field was imported. The show method must hide the extra fields based on the content of info(vcf), I guess. I have dug as far as the .showVCFSubclass method, but that's where it goes a little too decipher to read for me.
Many thanks,
Kevin
Let us take the example (from the vignette, plus a ScanVcfParam statement):
> svp <- ScanVcfParam(info = "NS")
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> vcf2 <- readVcf(fl, "hg19", param = svp)
# Object seems fine (see info(vcf) and info(header(vcf))
> vcf2
class: CollapsedVCF
dim: 5 3
rowRanges(vcf):
GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
DataFrame with 1 column: NS
info(header(vcf)):
Number Type Description
NS 1 Integer Number of Samples With Data
geno(vcf):
SimpleList of length 4: GT, GQ, DP, HQ
geno(header(vcf)):
Number Type Description
GT 1 String Genotype
GQ 1 Integer Genotype Quality
DP 1 Integer Read Depth
HQ 2 Integer Haplotype Quality
# Headers of non-imported fields are still present: needs fix (?)
> info(header(vcf2))
DataFrame with 6 rows and 3 columns
Number Type Description
<character> <character> <character>
NS 1 Integer Number of Samples With Data
DP 1 Integer Total Depth
AF A Float Allele Frequency
AA 1 String Ancestral Allele
DB 0 Flag dbSNP membership, build 129
H2 0 Flag HapMap2 membership
# Only the desired data was imported: good
> info(vcf2)
DataFrame with 5 rows and 1 column
NS
<integer>
rs6054257 3
20:17330_T/A 3
rs6040355 2
20:1230237_T/. 3
microsat1 3

Thank you for noting this discrepancy. The current implementation of displaying the original header information rather than the subset is deliberate in design. We felt that knowing all the possible information available from the object could be beneficial. Also some fields are closely related and it might be useful to know one or the other existed even if it was dropped.
Thank you. I suspected so, and it makes sense. In my use case however, I prefer to keep header and data synchronised, for a few reasons:
^_^)Should you post this reply as an answer, if there is no plan to change
VariantAnnotation? (I really don't mean this in a bad way, as I wrote above, it makes sense).For my use case, I simply wrote an extra line of code to remove from info(header) the rows without data in info(). Not a big deal.
Thanks again for the clarification!
Kevin