CollapsedVCF and ExpandedVCF rownames for VCFs with large REF/ALT columns
@sean-davis-490
United States

With some newer VCF files, particularly those produced by structural variant callers, the rownames of the resulting `VCF` objects (which are constructed from the variant itself when the ID column is missing) can become excessively long. As a concrete example, we have VCFs with insertions of dozens or hundreds of bases, which lead to very long rownames. While human-readable rownames are useful in some cases, would it be possible to maintain uniqueness while enforcing some further constraints on the constructed rownames to keep them manageably short and printable?

variantannotation VCF • 1.4k views

Or perhaps also have an option to drop rownames altogether...

@valerie-obenchain-4275
United States

Yes, I can imagine that's the case. Pasting together `CHROM:POS_REF/ALT` was a reasonable solution when ALT was just a few bases. I'm open to suggestions on how to handle this. What information would you like to see as the row names?

We do have the option to turn off rownames with `readVcf(..., row.names=FALSE)`.
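As a minimal sketch of that option, the following reads the same file with and without rownames. It assumes the example file `chr22.vcf.gz` that ships with VariantAnnotation and uses `"hg19"` as the genome label; the object names `vcf_named` and `vcf_plain` are just illustrative.

```r
library(VariantAnnotation)

# Example VCF distributed with the package (stand-in for your own file)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# Default behaviour: rownames are taken from ID, or built from the variant
vcf_named <- readVcf(fl, "hg19")

# Skip rowname construction entirely
vcf_plain <- readVcf(fl, "hg19", row.names = FALSE)

head(rownames(vcf_named))
is.null(rownames(vcf_plain))
```

The variants themselves are unchanged; only the labels on the rows differ, so positions and alleles are still available through `rowRanges()`.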

Val


Thanks, Val. `row.names=FALSE` is a good solution for our use case. If unique IDs are needed, hashing approaches might be of interest to some users, but perhaps that is best left as a user-level decision. Part of my rationale for asking was to see what others thought.
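For anyone who does want short unique IDs, here is one possible user-level sketch of the hashing idea. It is not part of VariantAnnotation: `short_variant_ids()` is a hypothetical helper, it assumes the CRAN package `digest` is installed, and it takes plain character/integer vectors rather than a `VCF` object.

```r
# Assumption: the CRAN package 'digest' is available
library(digest)

# Hypothetical helper: build the "CHROM:POS_REF/ALT" label, then replace it
# with the first n_chars hex characters of its MD5 hash.
short_variant_ids <- function(chrom, pos, ref, alt, n_chars = 12L) {
  full <- paste0(chrom, ":", pos, "_", ref, "/", alt)
  hashed <- vapply(full, digest, character(1),
                   algo = "md5", serialize = FALSE, USE.NAMES = FALSE)
  ids <- substr(hashed, 1L, n_chars)
  # Distinct variants must not share a truncated hash
  distinct <- !duplicated(full)
  if (anyDuplicated(ids[distinct]) > 0L) {
    stop("hash collision; increase n_chars")
  }
  ids
}

# A 200-base insertion next to a simple SNV (made-up coordinates)
alt_long <- paste(rep("ACGT", 50), collapse = "")
ids <- short_variant_ids(c("chr1", "chr2"), c(12345L, 67890L),
                         c("A", "G"), c(alt_long, "T"))
nchar(ids)
```

The IDs have a fixed length regardless of allele size, at the cost of no longer being human-readable, which is why this is probably best kept as a user-level choice.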
