Question: about collapsed and expanded vcf file
21 months ago by
Bogdan520
Palo Alto, CA, USA
Bogdan520 wrote:

Dear all, talking about the vcf files in R/BioC:

please could someone tell me more about the differences between a "collapsed VCF" and an "expanded VCF" file, and which functions/methods apply to collapsed vs expanded VCF files?

thank you,

-- bogdan

modified 20 months ago by raggieapauly0 • written 21 months ago by Bogdan520
21 months ago by
Valerie Obenchain ♦♦ 6.6k
United States
Valerie Obenchain ♦♦ 6.6k wrote:

The ?VCF man page explains the difference between Collapsed and Expanded VCF classes. Both extend the VCF class (use showClass("CollapsedVCF")) so methods set on the VCF class are inherited by both.

To see what classes a method operates on use showMethods(), e.g.,

> showMethods("alt")
Function: alt (package VariantAnnotation)
x="VCF"
x="VRanges"

> showMethods("expand")
Function: expand (package S4Vectors)
x="CollapsedVCF"
x="DataFrame"
x="ExpandedVCF"
x="Vector"
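
As a minimal sketch of that inheritance (assuming VariantAnnotation is installed and using the ex2.vcf file that ships with it; the variable names are only illustrative), both flavours test as a VCF, so a method set on VCF such as alt() accepts either one:

```r
library(VariantAnnotation)

# Example file bundled with the package.
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

# readVcf() returns the collapsed form; expand() flattens it.
vcf_collapsed <- readVcf(fl, genome = "hg19")
vcf_expanded <- expand(vcf_collapsed)

# Both objects inherit from the VCF class.
is(vcf_collapsed, "VCF")
is(vcf_expanded, "VCF")

# alt() is defined on VCF, so it works on both; only the container
# it returns differs (list-like for collapsed, flat for expanded).
class(alt(vcf_collapsed))
class(alt(vcf_expanded))
```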

Valerie

modified 21 months ago • written 21 months ago by Valerie Obenchain ♦♦ 6.6k

Dear Valerie, thank you for your time and the information. Please, when you have a minute, it would be really helpful to have a concrete example of how a collapsed or expanded vcf file influences the data processing, or the use of functions or methods on the data. Sorry for being slow, and many thanks!


The ExpandedVCF class is a flat-ish form of the CollapsedVCF. The expansion is centered around the ALT field. Often there can be more than one ALT value per variant and in the CollapsedVCF all ALT values for a single variant are presented in a single row. While there is one row per genomic position, the row actually represents multiple REF / ALT pairs which creates a somewhat nested view of the data. To flatten this out, or have one REF / ALT pair per row, you can expand() a CollapsedVCF object or call readVcf(..., collapsed=FALSE). In this expanded form the 'AD' genotype field is also expanded into REF/ALT pairs and all other fields are simply replicated out.
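
A quick way to see the flattening is to compare row counts before and after expansion. This is a rough sketch, assuming VariantAnnotation is installed and using the bundled ex2.vcf; the variable names are illustrative:

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, genome = "hg19")

# Number of ALT values stored in each row of the collapsed object.
n_alt <- elementNROWS(alt(vcf))
n_alt

# After expansion each row holds a single REF / ALT pair, so rows
# that carried several ALT values are split into several rows.
vcf_long <- expand(vcf)
nrow(vcf)
nrow(vcf_long)
```

Rows with more than one ALT value in the collapsed object are the ones that account for the extra rows in the expanded object.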

Another flat form of variant data is the VRanges class. This is a GRanges with the info and geno fields as metadata columns instead of separate slots. The class is less complex than the VCF class and may be more useful for analysis.
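
A hedged sketch of getting to a VRanges from a VCF object, assuming VariantAnnotation is installed and that coercion with as() accepts the bundled ex2.vcf (the variable names are illustrative):

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, genome = "hg19")

# Coerce to VRanges: a GRanges-like object with one range per
# variant / ALT / sample combination.
vr <- as(vcf, "VRanges")
vr

# REF and ALT are plain per-range accessors, and the info and geno
# fields travel along as metadata columns.
head(ref(vr))
head(alt(vr))
names(mcols(vr))
```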

As for concrete examples of how collapsed or expanded vcf files influence data processing, there are a number of examples on the man page under 'Collapsed and Expanded VCF', e.g.,

## ----------------------------------------------------------------
## Collapsed and Expanded VCF
## ----------------------------------------------------------------
## readVcf() produces a CollapsedVCF object.
fl <- system.file("extdata", "ex2.vcf",
                  package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
vcf

## The ALT column is a DNAStringSetList to allow for more
## than one alternate allele per variant.
alt(vcf)

## For structural variants ALT is a CharacterList.
fl <- system.file("extdata", "structural.vcf",
                  package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
alt(vcf)

## ExpandedVCF is the 'flattened' counterpart of CollapsedVCF.
## The ALT and all variables with Number='A' in the header are
## expanded to one row per alternate allele.
vcfLong <- expand(vcf)
alt(vcfLong)

If you have a specific analysis in mind it may be more productive to post that as a question. Then others can contribute what they have tried, what classes and methods worked for them, etc.

Valerie

modified 21 months ago • written 21 months ago by Valerie Obenchain ♦♦ 6.6k

Dear Valerie, thank you for your quick reply and very comprehensive answer: I will go slowly through the examples that you offered, and will let you know should I have any comment or additional tiny question. Happy and fruitful week (hope not too cold ;) !

21 months ago by
Bogdan520
Palo Alto, CA, USA
Bogdan520 wrote:

Dear Valerie, thank you again for all your answers and guidance with the examples: it is very, very helpful ;)

20 months ago by
raggieapauly0 wrote:

The vcf file is really just a slightly complicated text file. You could probably do it with VBA with a few trial-and-error attempts. Or, if you have Outlook, have Access put them in the contacts area (which you may have already done), and then from Outlook, select them and do a save as vcf.