About collapsed and expanded VCF files
Bogdan ▴ 670
@bogdan-2367
Last seen 6 months ago
Palo Alto, CA, USA

Dear all, a question about VCF files in R/BioC:

Could someone please tell me more about the differences between a "collapsed VCF" and an "expanded VCF" file, and which functions/methods apply to collapsed vs. expanded VCF files?

thank you,

-- bogdan

@valerie-obenchain-4275
Last seen 2.3 years ago
United States

The ?VCF man page explains the difference between the CollapsedVCF and ExpandedVCF classes. Both extend the VCF class (see showClass("CollapsedVCF")), so methods defined on the VCF class are inherited by both.

To see what classes a method operates on use showMethods(), e.g., 

> showMethods("alt")
Function: alt (package VariantAnnotation)
x="VCF"
x="VRanges"

> showMethods("expand")
Function: expand (package S4Vectors)
x="CollapsedVCF"
x="DataFrame"
x="ExpandedVCF"
x="Vector"

Valerie


Dear Valerie, thank you for your time and the information. When you have a minute, it would be really helpful to have a concrete example of how a collapsed or expanded VCF file influences the data processing, or the use of functions or methods on the data. Sorry for being slow, and many thanks!


The ExpandedVCF class is a flat-ish form of the CollapsedVCF. The expansion is centered around the ALT field. Often there can be more than one ALT value per variant and in the CollapsedVCF all ALT values for a single variant are presented in a single row. While there is one row per genomic position, the row actually represents multiple REF / ALT pairs which creates a somewhat nested view of the data. To flatten this out, or have one REF / ALT pair per row, you can expand() a CollapsedVCF object or call readVcf(..., collapsed=FALSE). In this expanded form the 'AD' genotype field is also expanded into REF/ALT pairs and all other fields are simply replicated out.

Another flat form of variant data is the VRanges class. This is a GRanges with the info and geno fields as metadata columns instead of separate slots. The class is less complex than the VCF class and may be more useful for analysis.
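As a rough illustration of the VRanges form mentioned above, the sketch below coerces a VCF object with as(vcf, "VRanges"). The choice of the chr22.vcf.gz sample file shipped with VariantAnnotation is my assumption, not something from this thread, and fields such as read depth may simply come out as NA when the file has no 'AD' genotype field.

```r
library(VariantAnnotation)

## Assumption: use the chr22 sample file bundled with the package.
fl <- system.file("extdata", "chr22.vcf.gz",
                  package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")

## Coerce to the flat VRanges form: one range per
## variant / alternate allele / sample combination.
vr <- as(vcf, "VRanges")
vr

## REF and ALT are plain accessors on the VRanges object,
## and info / geno fields appear as metadata columns.
head(ref(vr))
head(alt(vr))
names(mcols(vr))
```

Because a VRanges is a GRanges underneath, the usual range operations (subsetting, overlaps) should apply to it directly.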

As for concrete examples of how collapsed or expanded VCF files influence data processing, there are a number of examples on the man page under 'Collapsed and Expanded VCF', e.g.,

## ----------------------------------------------------------------
## Collapsed and Expanded VCF 
## ----------------------------------------------------------------
## readVcf() produces a CollapsedVCF object.
fl <- system.file("extdata", "ex2.vcf", 
                       package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
vcf
     
## The ALT column is a DNAStringSetList to allow for more
## than one alternate allele per variant.
alt(vcf)
     
## For structural variants ALT is a CharacterList.
fl <- system.file("extdata", "structural.vcf", 
                       package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
alt(vcf)
     
## ExpandedVCF is the 'flattened' counterpart of CollapsedVCF.
## The ALT and all variables with Number='A' in the header are
## expanded to one row per alternate allele.
vcfLong <- expand(vcf)
alt(vcfLong)
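To make the effect of expand() easier to see, here is a small sketch that compares the two forms side by side on ex2.vcf. The specific comparisons (row counts, the class of ALT) are my choice for illustration and are not taken from the man page.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf",
                  package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")

## Number of ALT alleles carried by each row of the collapsed object.
elementNROWS(alt(vcf))

## Expanding gives one row per REF / ALT pair.
vcfLong <- expand(vcf)

## Compare dimensions (variants x samples) of the two forms.
dim(vcf)
dim(vcfLong)

## ALT is list-like when collapsed and flat when expanded.
class(alt(vcf))
class(alt(vcfLong))
```

Rows with more than one ALT allele in the collapsed object are the ones that get replicated in the expanded object, so code that assumes one ALT per row is usually simpler to write against the expanded form.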

If you have a specific analysis in mind it may be more productive to post that as a question. Then others can contribute what they have tried, what classes and methods worked for them, etc.

Valerie

 

 


Dear Valerie, thank you for your quick reply and very comprehensive answer: I will go slowly through the examples that you offered, and will let you know should I have any comment or additional small question. Have a happy and fruitful week (hope not too cold ;) !

Bogdan ▴ 670
@bogdan-2367
Last seen 6 months ago
Palo Alto, CA, USA

Dear Valerie, thank you again for all your answers and guidance with the examples: it is very helpful ;)

@raggieapauly-12239
Last seen 7.3 years ago

The vcf file is really just a slightly complicated text file. You could probably do it with VBA with a few trial-and-error attempts. Or, if you have Outlook, have Access put them in the contacts area (which you may have already done), and then from Outlook, select them and do a save as vcf.

 
