Question: Help: extract counts from RNA-seq .bam/.bai files
1
gravatar for mat149
23 months ago by
mat14940
mat14940 wrote:

Hello,

I have an inquiry on a paired-end RNA-seq experiment prepared from zebrafish livers.  I would like to identify differentially expressed transcripts (or genes) between (n = 5) "control" and (n = 5) "lepa" samples.

I am looking for a method that will construct a matrix of counts from (n = 10) .bam files and their corresponding indices (.bai).  The goal in doing this is to place the subsequent count matrix into the edgeR pipeline.  I have tried to do some reading in various vignettes but it is not clear to me on how to accomplish this.  The code I am using is appended below, but I am not sure on what to do next after opening the bam files and the zebrafish txdb object.  I hope that my question is clear; if I failed to explain something please let me know and I can elaborate.  Thanks for looking into this,

- Matt

library(GenomicAlignments)
library(TxDb.Drerio.UCSC.danRer10.refGene)

exbygene <- exonsBy(TxDb.Drerio.UCSC.danRer10.refGene, "gene")

lepa1<-BamFile("C:\\...\\lepa1.bam",index="C:\\...\\lepa1.bai",asMates=TRUE)
lepa2<-BamFile("C:\\...\\lepa2.bam",index="C:\\...\\lepa2.bai",asMates=TRUE)
lepa3<-BamFile("C:\\...\\lepa3.bam",index="C:\\...\\lepa3.bai",asMates=TRUE)
lepa4<-BamFile("C:\\...\\lepa4.bam",index="C:\\...\\lepa4.bai",asMates=TRUE)
lepa5<-BamFile("C:\\...\\lepa5.bam",index="C:\\...\\lepa5.bai",asMates=TRUE)
wt1<-BamFile("C:\\...\\\wt1.bam",index="C:\\...\\wt1.bai",asMates=TRUE)
wt2<-BamFile("C:\\...\\\wt2.bam",index="C:\\...\\wt2.bai",asMates=TRUE)
wt3<-BamFile("C:\\...\\\wt3.bam",index="C:\\...\\wt3.bai",asMates=TRUE)
wt4<-BamFile("C:\\...\\\wt4.bam",index="C:\\...\\wt4.bai",asMates=TRUE)
wt5<-BamFile("C:\\...\\\wt5.bam",index="C:\\...\\wt5.bai",asMates=TRUE)

open.BamFile(lepa1)
open.BamFile(lepa2)
open.BamFile(lepa3)
open.BamFile(lepa4)
open.BamFile(lepa5)
open.BamFile(wt1)
open.BamFile(wt2)
open.BamFile(wt3)
open.BamFile(wt4)
open.BamFile(wt5)

 

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicAlignments_1.12.0   Rsamtools_1.28.0          
 [3] Biostrings_2.44.0          XVector_0.16.0            
 [5] SummarizedExperiment_1.6.1 DelayedArray_0.2.2        
 [7] matrixStats_0.52.2         Biobase_2.36.2            
 [9] GenomicRanges_1.28.1       GenomeInfoDb_1.12.0       
[11] IRanges_2.10.0             S4Vectors_0.14.0          
[13] BiocGenerics_0.22.0        bamsignals_1.8.0          

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10            lattice_0.20-35         bitops_1.0-6           
 [4] grid_3.4.0              zlibbioc_1.22.0         Matrix_1.2-10          
 [7] BiocParallel_1.10.1     tools_3.4.0             RCurl_1.95-4.8         
[10] compiler_3.4.0          GenomeInfoDbData_0.99.0

 

edger counts • 721 views
ADD COMMENTlink modified 23 months ago by Aaron Lun23k • written 23 months ago by mat14940
Answer: Help: extract counts from RNA-seq .bam/.bai files
2
gravatar for James W. MacDonald
23 months ago by
United States
James W. MacDonald49k wrote:

Usually I do something like

library(Rsamtools)
library(GenomicAlignments)
bams <- dir("../", "bam$", full.names = TRUE)
bfl <- BamFileList(bams, asMates = TRUE)
ex <- exonsBy(TxDb.Drerio.UCSC.danRer10.refGene, "gene")
ex <- reduce(ex)
SE <- summarizeOverlaps(ex, bfl, singleEnd = FALSE, param = ScanBamParam(flag = scanBamFlag(isDuplicate = FALSE)))

Then assays(SE)[[1]] contains your count matrix. Whether or not you should remove duplicates is up to you. There are arguments for and against and I tend to do so on an ad hoc basis.

ADD COMMENTlink modified 23 months ago • written 23 months ago by James W. MacDonald49k
Answer: Help: extract counts from RNA-seq .bam/.bai files
1
gravatar for Aaron Lun
23 months ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

I'll throw in a word for the featureCounts function in Rsubread.

https://www.bioconductor.org/help/workflows/RnaSeqGeneEdgeRQL/#quantifying-read-counts-for-each-gene

If you can get your TxDb object into a GTF file, it should be pretty easy to get it working.

ADD COMMENTlink modified 23 months ago • written 23 months ago by Aaron Lun23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 154 users visited in the last hour