Search
Question: Annotation for CNVs
0
gravatar for Haiying.Kong
5 months ago by
Haiying.Kong90
Germany
Haiying.Kong90 wrote:

Is there any Bioconductor tool for annotating CNVs? I could not find any.

ADD COMMENTlink modified 5 months ago by Martin Morgan ♦♦ 20k • written 5 months ago by Haiying.Kong90
2
gravatar for Martin Morgan
5 months ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

How do you want to 'annotate' your CNVs? Maybe rtracklayer::import() and GenomicRanges::findOverlaps() are sufficient.

ADD COMMENTlink written 5 months ago by Martin Morgan ♦♦ 20k

Thank you so much  :)

 

ADD REPLYlink written 5 months ago by Haiying.Kong90

so do you know how to do this now?

ADD REPLYlink written 4 months ago by fjpt.com0

The goal is to make a GRanges object that contains your copy number regions, a a GRanges object that contains your annotations.

For the first, maybe you have a plain text file with chromosome, start, end, strand information. You could read that in to R as a data.frame, then use

cnv = GenomicRanges::makeGRangesFromDataFrame()

For the second, the annotations may be available in a 'TxDb' package, or in a gtf (maybe from Ensembl and available through AnnotationHub, maybe from some other source). If from a gtf file, I'd suggest creating a TxDb object

txdb = makeTxDbFromGFF()  # see other makeTxDb

For use of AnnotationHub, see it's vignette. For use of TxDb packages, the flow is to install the package via biocLite() then use it

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene

For either txdb, I would then extract the GRanges corresponding to the features I wish to annotate, e.g.,

genes = genes(txdb)

find the overlaps between the 'query' cnv ranges and the 'subject' genes

olaps = GenomicRanges::findOverlaps(cnv, genes)

Create a 'long form' version of the data

long_annotated = cnv[queryHits(olaps)]
long_annotated$gene_id = genes[subjectHits(olaps)]$gene_id

Alternatively annotate each overlap with the gene id

mcols(olaps)$gene_id = genes$gene_id[subjectHits(olaps)]

and summarize the genes in each cnv

cnv_factor = factor(queryHits(olaps), levels=seq_len(queryHits(olaps)))
cvn$gene_id = IRanges::splitAsList(mcols(olaps)$gene_id, cnv_factor)

A function summarizing these steps is

annotateCnvs <-
    function(cnv, txdb)
{
    stopifnot(is(cnv, "GRanges"), is(txdb, "TxDb"))
    anno = genes(txdb)
    olaps = findOverlaps(cnv, anno)
    mcols(olaps)$gene_id = genes$gene_id[subjectHits(olaps)]
    cnv_factor = factor(queryHits(olaps), levels=seq_len(queryLength(olaps)))
    cnv$gene_id = IRanges::splitAsList(mcols(olaps)$gene_id, cnv_factor)
    cnv
}        

If I had a couple of copy number regions, created 'by hand' rather than using makeGRangesFrom... or rtracklayer::import()

cnv = GRanges(c("chr1:10000000-20000000", "chr20:1-1000000"))

I could get

> annotateCnvs(cnv, txdb)
GRanges object with 2 ranges and 1 metadata column:
      seqnames               ranges strand |                           gene_id
         <Rle>            <IRanges>  <Rle> |                   <CharacterList>
  [1]     chr1 [10000000, 20000000]      * | 100133301,100302276,100379251,...
  [2]    chr20 [       1,  1000000]      * |        100507459,10616,113278,...
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

The gene_ids are Entrez gene identifiers, because that is the form of the annotation in the UCSC knownGene track.

Be careful that the coordinate system of the copy number variants are the same as the coordinate system of the annotation (e.g., 1-based, open intervals).

ADD REPLYlink modified 4 months ago • written 4 months ago by Martin Morgan ♦♦ 20k

I have data from control-freec,foramt:chr start end num cnvtype(gain or loss),so how i annotate this ?

 

ADD REPLYlink written 4 months ago by fjpt.com0
0
gravatar for Haiying.Kong
5 months ago by
Haiying.Kong90
Germany
Haiying.Kong90 wrote:

Probably this is too simple to make a package, I guess.

I probably just use bedtools to compare the regions I found and gene regions that I downloaded from UCSC.

ADD COMMENTlink written 5 months ago by Haiying.Kong90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 149 users visited in the last hour