Getting Introns Expression at a Per Gene Level

0

Entering edit mode

Fong Chun Chan ▴ 320

@fong-chun-chan-5706

Last seen 9.7 years ago

Hi, I am using the R package, Genomic Features, to get Intron Expression and the function that I am using is the intronsByTranscript(). While this function is useful, it returns the number of raw reads that align to each intron grouped at the transcript level. Is there an easy way to get it to group it by a gene such that I am grabbing all the introns that fall in all the transcripts belong to a gene (and without double counting the intronic space). Similar, is there a way to easily get the raw read count at an individual intron level rather than having it grouped? The introns() function seems to be defunct now as it recommends that we use the intronsByTranscript() function. Thanks, Fong [[alternative HTML version deleted]]

• 777 views

ADD COMMENT • link updated 11.3 years ago by Valerie Obenchain ★ 6.8k • written 11.3 years ago by Fong Chun Chan ▴ 320

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.3 years ago

United States

Hi Fong, We don't have a single function call to get introns by gene but you can put one together in a few steps. library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene Specifying use.names = TRUE puts the transcript names on the GRangesList. introns <- intronsByTranscript(txdb, use.names=TRUE) Unlist and remove duplicate introns. ulst <- unlist(introns) intronsNoDups <- ulst[!duplicated(ulst)] The select() function can map txid to geneid (and others). See ?select. txnames <- names(intronsNoDups) map <- select(txdb, keys=txnames, keytype='TXNAME', cols='GENEID') Not all transcripts are associated with a gene id and some are associated with more than one. > head(map, 10) TXNAME GENEID 1 uc001aaa.3 <na> 2 uc001aaa.3 <na> 3 uc010nxq.1 <na> 4 uc010nxq.1 <na> 5 uc010nxr.1 <na> 6 uc010nxr.1 <na> 7 uc009vjk.2 100133331 8 uc009vjk.2 100133331 9 uc001aau.3 100132287 10 uc021oeh.1 100133331 To get a GRangesList of introns by gene, split the introns on the associated gene id. idx <- map$GENEID[!is.na(map$GENEID)] intronsByGene <- split(intronsNoDups[!is.na(map$GENEID)], idx) names(intronsByGene) <- unique(idx) The list names are geneid and the element rownames are txid. > intronsByGene GRangesList of length 19784: $100133331 GRanges with 13 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> uc002qsd.4 chr19 [58858396, 58858718] - uc002qsd.4 chr19 [58859007, 58861735] - uc002qsd.4 chr19 [58862018, 58862756] - ... $100132287 GRanges with 1 range and 0 metadata columns: seqnames ranges strand uc003wyw.1 chr8 [18248856, 18257507] + To count reads against this annotation you can use summarizeOverlaps(). The 'reads' can be Bam files, GappedAlignments or GappedAlignmentPairs. The 'features' argument will be the GRangesList of introns by gene. The result will be counts per gene. To count just introns with no grouping you can still use summarizeOverlaps() but use the 'intronsNoDups' GRanges object as 'features'. Valerie On 01/29/2013 11:38 AM, Fong Chun Chan wrote: > Hi, > > I am using the R package, Genomic Features, to get Intron Expression and > the function that I am using is the intronsByTranscript(). While this > function is useful, it returns the number of raw reads that align to each > intron grouped at the transcript level. Is there an easy way to get it to > group it by a gene such that I am grabbing all the introns that fall in all > the transcripts belong to a gene (and without double counting the intronic > space). > > Similar, is there a way to easily get the raw read count at an individual > intron level rather than having it grouped? The introns() function seems to > be defunct now as it recommends that we use the intronsByTranscript() > function. > > Thanks, > > Fong > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 11.3 years ago Valerie Obenchain ★ 6.8k

Login before adding your answer.