Search
Question: Annotation for CNVs
0
gravatar for Haiying.Kong
12 months ago by
Haiying.Kong100
Germany
Haiying.Kong100 wrote:

Is there any Bioconductor tool for annotating CNVs? I could not find any.

ADD COMMENTlink modified 12 months ago by Martin Morgan ♦♦ 21k • written 12 months ago by Haiying.Kong100
2
gravatar for Martin Morgan
12 months ago by
Martin Morgan ♦♦ 21k
United States
Martin Morgan ♦♦ 21k wrote:

How do you want to 'annotate' your CNVs? Maybe rtracklayer::import() and GenomicRanges::findOverlaps() are sufficient.

ADD COMMENTlink written 12 months ago by Martin Morgan ♦♦ 21k

Thank you so much  :)

 

ADD REPLYlink written 12 months ago by Haiying.Kong100

so do you know how to do this now?

ADD REPLYlink written 11 months ago by fjpt.com0

The goal is to make a GRanges object that contains your copy number regions, a a GRanges object that contains your annotations.

For the first, maybe you have a plain text file with chromosome, start, end, strand information. You could read that in to R as a data.frame, then use

cnv = GenomicRanges::makeGRangesFromDataFrame()

For the second, the annotations may be available in a 'TxDb' package, or in a gtf (maybe from Ensembl and available through AnnotationHub, maybe from some other source). If from a gtf file, I'd suggest creating a TxDb object

txdb = makeTxDbFromGFF()  # see other makeTxDb

For use of AnnotationHub, see it's vignette. For use of TxDb packages, the flow is to install the package via biocLite() then use it

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene

For either txdb, I would then extract the GRanges corresponding to the features I wish to annotate, e.g.,

genes = genes(txdb)

find the overlaps between the 'query' cnv ranges and the 'subject' genes

olaps = GenomicRanges::findOverlaps(cnv, genes)

Create a 'long form' version of the data

long_annotated = cnv[queryHits(olaps)]
long_annotated$gene_id = genes[subjectHits(olaps)]$gene_id

Alternatively annotate each overlap with the gene id

mcols(olaps)$gene_id = genes$gene_id[subjectHits(olaps)]

and summarize the genes in each cnv

cnv_factor = factor(queryHits(olaps), levels=seq_len(queryHits(olaps)))
cvn$gene_id = IRanges::splitAsList(mcols(olaps)$gene_id, cnv_factor)

A function summarizing these steps is

annotateCnvs <-
    function(cnv, txdb)
{
    stopifnot(is(cnv, "GRanges"), is(txdb, "TxDb"))
    anno = genes(txdb)
    olaps = findOverlaps(cnv, anno)
    mcols(olaps)$gene_id = genes$gene_id[subjectHits(olaps)]
    cnv_factor = factor(queryHits(olaps), levels=seq_len(queryLength(olaps)))
    cnv$gene_id = IRanges::splitAsList(mcols(olaps)$gene_id, cnv_factor)
    cnv
}        

If I had a couple of copy number regions, created 'by hand' rather than using makeGRangesFrom... or rtracklayer::import()

cnv = GRanges(c("chr1:10000000-20000000", "chr20:1-1000000"))

I could get

> annotateCnvs(cnv, txdb)
GRanges object with 2 ranges and 1 metadata column:
      seqnames               ranges strand |                           gene_id
         <Rle>            <IRanges>  <Rle> |                   <CharacterList>
  [1]     chr1 [10000000, 20000000]      * | 100133301,100302276,100379251,...
  [2]    chr20 [       1,  1000000]      * |        100507459,10616,113278,...
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

The gene_ids are Entrez gene identifiers, because that is the form of the annotation in the UCSC knownGene track.

Be careful that the coordinate system of the copy number variants are the same as the coordinate system of the annotation (e.g., 1-based, open intervals).

ADD REPLYlink modified 11 months ago • written 11 months ago by Martin Morgan ♦♦ 21k

I have data from control-freec,foramt:chr start end num cnvtype(gain or loss),so how i annotate this ?

 

ADD REPLYlink written 11 months ago by fjpt.com0
0
gravatar for Haiying.Kong
12 months ago by
Haiying.Kong100
Germany
Haiying.Kong100 wrote:

Probably this is too simple to make a package, I guess.

I probably just use bedtools to compare the regions I found and gene regions that I downloaded from UCSC.

ADD COMMENTlink written 12 months ago by Haiying.Kong100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 161 users visited in the last hour