Entering edit mode
Wu, Huiyun
▴
70
@wu-huiyun-5271
Last seen 10.6 years ago
Dear Steve,
thank you for your feedback. Below is what it returned.
> st <- sum(counts==0)
> st
[1] 12296
Let me give more details of my situation. Say, I have read in 6 BAM
files (s1 to s6). and then got a count table (filename = countTable)
using countOverlaps function with resultant file head as below.
Question here, is the rowname (first col) miRNA ID?
> colnames(countTable) <- samples
> head(countTable)
s1 s2 s3 s4 s5 s6
1 1 1 3 0 2 2
10 0 0 1 0 1 0
100 0 1 0 1 3 3
1000 0 7 1 1 3 3
10000 0 6 2 8 16 18
100008586 0 0 0 0 0 0
> dim(countTable)
[1] 22932 6
I then filtered out those miRNAs whose expressions were 0 across all
samples by the following code (it might be what you asked for).
> newCountTable <- countTable[rowSums(countTable) != 0, ]
> dim(newCountTable)
[1] 22105 6
> head(newCountTable)
s1 s2 s3 s4 s5 s6
1 1 1 3 0 2 2
10 0 0 1 0 1 0
100 0 1 0 1 3 3
1000 0 7 1 1 3 3
10000 0 6 2 8 16 18
100009676 0 0 0 1 3 4
After annotating using the following library, I got another count
table.
entrezGenes <- rownames(newCountTable)
mapped_genes <- mappedkeys(org.Hs.egSYMBOL)
overlapgenes <- intersect(entrezGenes, mapped_genes)
overlapsymbols <- as.list(org.Hs.egSYMBOL[overlapgenes])
getGeneSymbol <- function(x) {
if (x %in% overlapgenes) {
return(overlapsymbols[[which(x ==
names(overlapsymbols))]])
}
else {
return(NA)
}
}
geneSymbols <- sapply(entrezGenes, getGeneSymbol)
countTable <- cbind(geneSymbols,
numberOfexons[names(numberOfexons) %in% entrezGenes],
geneLength[names(geneLength) %in% entrezGenes],
newCountTable)
> dim(countTable)
[1] 22105 9
I further removed redundant IDs and NAs to generate an updated count
table (dataset named 'd') for DE analysis
> dim(d)
[1] 22100 7
> d[20:30,]
X s1 s2 s3 s4 s5 s6
20 7-Mar 1 2 2 11 12 5
21 7-Sep 2 6 4 6 19 9
22 8-Mar 1 2 0 1 6 4
23 8-Sep 0 4 4 4 3 10
24 9-Mar 1 3 5 1 6 9
25 9-Sep 2 5 8 10 35 20
26 A1BG 1 1 3 0 2 2
27 A1BG-AS1 0 1 1 0 0 4
28 A1CF 0 2 1 3 9 1
29 A2LD1 1 1 2 0 1 6
30 A2M 5 53 48 70 58 114
It looks like there are 25 technical controls, but majority were
annotated.
thanks,
---
William Wu
Vanderbilt University/Department of Biostatistics.
-----Original Message-----
From: Steve Lianoglou [mailto:mailinglist.honeypot@gmail.com]
Sent: Monday, May 07, 2012 1:59 PM
To: Wu, Huiyun
Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: Re: [BioC] miRNA annotation
Hi,
On Mon, May 7, 2012 at 2:15 PM, Wu, Huiyun
<william.wu@vanderbilt.edu<mailto:william.wu@vanderbilt.edu>> wrote:
> Hello there,
>
> I am new in seq data analysis, and have difficulty in miRNA
annotation. To clarify my question, I have the following codes.
>
>
> bamfile <- "GSE33858_1.bowtie_alignment.bam"
> aln <- readBamGappedAlignments(bamfile)
>
txdb<-makeTranscriptDbFromUCSC(genome="hg19",tablename="knownGene")
> exonRangesList <- exonsBy(txdb, by = "gene")
> exonRanges <- unlist(exonRangesList)
> strand(exonRanges) <- "*"
> geneNames <- sub("\..*$<file: \..*$="">", "", names(exonRanges))
> exonRangesListNoStrand <- split(exonRanges, geneNames)
> numberOfexons <- sapply(width(exonRangesListNoStrand), length)
> geneLength <- sum(width(reduce(exonRangesListNoStrand)))
> # Counting reads for the BAM file
> counts <- countOverlaps(exonRangesListNoStrand, aln)
> names(counts) <- names(exonRangesListNoStrand)
>
> > length(counts)
>
> [1] 22932
>
> So my question is from aligned file to annotated file. Specifically,
why I got 22932 features after mapping to hg19? I learned there are
only less than 2000 miRNAs matched genes. I also know there are
several RNA databases including miRBase for annotation. Did I do
something logically wrong? It would be greatly appreciated if someone
could show me how to implement this annotation in R.
What does `sum(counts == 0)` give you?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University Contact Info:
http://cbio.mskcc.org/~lianos/contact
[[alternative HTML version deleted]]