Question: Help creating VRange object
0
gravatar for maisapinheiro12
6 months ago by
maisapinheiro120 wrote:

Hello,

I am trying to run SomaticSignatures package but I can't get the input file (a VRange object) right.

In the examples given by their vignette is this:

>sca_vr = VRanges(
  seqnames = seqnames(sca_data),
  ranges = ranges(sca_data),
  ref = sca_data$Reference_Allele,
  alt = sca_data$Tumor_Seq_Allele2,
  sampleNames = sca_data$Patient_ID,
  seqinfo = seqinfo(sca_data),
  study = sca_data$study)

But when I try to use exactly the same code with my data, that I pulled from a vcf example, provided but the VariantAnnotation package, I am unable to obtain the same VRange object:

> test_range=VRanges(
+   seqnames = seqnames(vcfRange),
+   ranges = ranges(vcfRange),
+   ref = vcfRange$REF,
+   alt = vcfRange$ALT)
Error in as.vector(x, mode = "character") : 
  no method for coercing this S4 class to a vector

This is how I obtained the vcf example:

>library(VariantAnnotation) #load the package
>fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") #folder
>vcf <- readVcf(fl, "hg19") #input vcf
>vcfRange=rowRanges(vcf)

Can anyone help me to figure this out? Thanks

somaticsignatures vcf vrange • 133 views
ADD COMMENTlink modified 6 months ago by Julian Gehring1.3k • written 6 months ago by maisapinheiro120
Answer: Help creating VRange object
1
gravatar for James W. MacDonald
6 months ago by
United States
James W. MacDonald51k wrote:

From ?VRanges-class

Details:

     VRanges extends GRanges to store the following components. Except
     where noted, the components are considered columns in the dataset,
     i.e., their lengths match the number of variants. Many columns can
     be stored as either an atomic vector or an Rle.

     'ref' ('character'), the reference allele. The range
          (start/end/width) should always correspond to this sequence.

     'alt' ('character/Rle'), the alternative allele (NA allowed). By
          convention there is only a single alt allele per element
          (row) of the VRanges. Many methods, like 'match', make this
          assumption.

And

> vcfRange$ALT
DNAStringSetList of length 10376
[[1]] G
[[2]] T
[[3]] A
[[4]] T
[[5]] T
[[6]] A
[[7]] C
[[8]] A
[[9]] A
[[10]] C
...
<10366 more elements>

This is a DNAStringSetList because there is more than one alternate allele for at least one location, and you can only have one alternate allele per location.

ADD COMMENTlink written 6 months ago by James W. MacDonald51k

Yeas... I had noticed that. There is a list instead of a vector in the ALT allele, but I didn't know why exactly. I'll try to extract this as a vector using only the first one. Thanks

ADD REPLYlink written 6 months ago by maisapinheiro120

Thank you so much. It worked with unlist(vcfRange$ALT)

ADD REPLYlink written 6 months ago by maisapinheiro120

Thank you so much. It worked with unlist(vcfRange$ALT)

ADD REPLYlink written 6 months ago by maisapinheiro120

Yes, that works in this instance, where the DNAStringSetList for the ALT allele has only one allele per position. But if that isn't true, then it doesn't work

> z <- vcfRange$ALT
> z[1] <- DNAStringSetList(c("G", "T"))
> vcfRange$ALT <- z
> test_range=VRanges(
   seqnames = seqnames(vcfRange),
   ranges = ranges(vcfRange),
   ref = vcfRange$REF,
   alt = unlist(vcfRange$ALT))
Error in validObject(.Object) : 
  invalid class "VRanges" object: 'ref' must not match 'alt'
In addition: Warning messages:
1: In ans[] <- x :
  number of items to replace is not a multiple of replacement length
2: In ans[] <- x :
  number of items to replace is not a multiple of replacement length

A (probably naive) way to get around that is to just use the first element in any list item that is longer than 1. There are most assuredly better ways of doing this, (and probably Martin Morgan or Michael Lawrence will be along shortly to show how), but a quick solution is

> ind <- lengths(vcfRange$ALT)
> ind2 <- rep(seq_len(length(ind)), ind)
> vcfRange$ALT <- unlist(vcfRange$ALT)[!duplicated(ind2)]
> test_range=VRanges(
   seqnames = seqnames(vcfRange),
   ranges = ranges(vcfRange),
   ref = vcfRange$REF,
   alt = vcfRange$ALT)
> test_range
VRanges object with 10376 ranges and 0 metadata columns:
              seqnames    ranges strand         ref              alt
                 <Rle> <IRanges>  <Rle> <character> <characterOrRle>
    rs7410291       22  50300078      *           A                G
  rs147922003       22  50300086      *           C                T
  rs114143073       22  50300101      *           G                A
  rs141778433       22  50300113      *           C                T
  rs182170314       22  50300166      *           C                T
          ...      ...       ...    ...         ...              ...
  rs187302552       22  50999536      *           A                G
    rs9628178       22  50999538      *           A                G
    rs5770892       22  50999681      *           A                G
  rs144055359       22  50999830      *           G                A
  rs114526001       22  50999964      *           G                C
                  totalDepth       refDepth       altDepth   sampleNames
              <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle>
    rs7410291           <NA>           <NA>           <NA>          <NA>
  rs147922003           <NA>           <NA>           <NA>          <NA>
  rs114143073           <NA>           <NA>           <NA>          <NA>
  rs141778433           <NA>           <NA>           <NA>          <NA>
  rs182170314           <NA>           <NA>           <NA>          <NA>
          ...            ...            ...            ...           ...
  rs187302552           <NA>           <NA>           <NA>          <NA>
    rs9628178           <NA>           <NA>           <NA>          <NA>
    rs5770892           <NA>           <NA>           <NA>          <NA>
  rs144055359           <NA>           <NA>           <NA>          <NA>
  rs114526001           <NA>           <NA>           <NA>          <NA>
              softFilterMatrix
                      <matrix>
    rs7410291                 
  rs147922003                 
  rs114143073                 
  rs141778433                 
  rs182170314                 
          ...              ...
  rs187302552                 
    rs9628178                 
    rs5770892                 
  rs144055359                 
  rs114526001                 
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
  hardFilters: NULL
ADD REPLYlink written 6 months ago by James W. MacDonald51k
Answer: Help creating VRange object
0
gravatar for maisapinheiro12
6 months ago by
maisapinheiro120 wrote:

Thank you so much! It worked with

unlist(vcfRange$ALT)

ADD COMMENTlink written 6 months ago by maisapinheiro120
Answer: Help creating VRange object
0
gravatar for Julian Gehring
6 months ago by
Julian Gehring1.3k
Julian Gehring1.3k wrote:

The simplest way for getting a VRanges object from a VCF file would be to use the readVcfAsVRanges function instead of the readVcf function in your example. The two functions provide the same functionalities and parameters, but the readVcfAsVRanges takes care of a lot of the conversions steps that you currently try to do manually.

ADD COMMENTlink written 6 months ago by Julian Gehring1.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 480 users visited in the last hour