Question: Help: extract counts from RNA-seq .bam/.bai files
1
gravatar for mat149
2.1 years ago by
mat14940
mat14940 wrote:

Hello,

I have an inquiry on a paired-end RNA-seq experiment prepared from zebrafish livers.  I would like to identify differentially expressed transcripts (or genes) between (n = 5) "control" and (n = 5) "lepa" samples.

I am looking for a method that will construct a matrix of counts from (n = 10) .bam files and their corresponding indices (.bai).  The goal in doing this is to place the subsequent count matrix into the edgeR pipeline.  I have tried to do some reading in various vignettes but it is not clear to me on how to accomplish this.  The code I am using is appended below, but I am not sure on what to do next after opening the bam files and the zebrafish txdb object.  I hope that my question is clear; if I failed to explain something please let me know and I can elaborate.  Thanks for looking into this,

- Matt

library(GenomicAlignments)
library(TxDb.Drerio.UCSC.danRer10.refGene)

exbygene <- exonsBy(TxDb.Drerio.UCSC.danRer10.refGene, "gene")

lepa1<-BamFile("C:\\...\\lepa1.bam",index="C:\\...\\lepa1.bai",asMates=TRUE)
lepa2<-BamFile("C:\\...\\lepa2.bam",index="C:\\...\\lepa2.bai",asMates=TRUE)
lepa3<-BamFile("C:\\...\\lepa3.bam",index="C:\\...\\lepa3.bai",asMates=TRUE)
lepa4<-BamFile("C:\\...\\lepa4.bam",index="C:\\...\\lepa4.bai",asMates=TRUE)
lepa5<-BamFile("C:\\...\\lepa5.bam",index="C:\\...\\lepa5.bai",asMates=TRUE)
wt1<-BamFile("C:\\...\\\wt1.bam",index="C:\\...\\wt1.bai",asMates=TRUE)
wt2<-BamFile("C:\\...\\\wt2.bam",index="C:\\...\\wt2.bai",asMates=TRUE)
wt3<-BamFile("C:\\...\\\wt3.bam",index="C:\\...\\wt3.bai",asMates=TRUE)
wt4<-BamFile("C:\\...\\\wt4.bam",index="C:\\...\\wt4.bai",asMates=TRUE)
wt5<-BamFile("C:\\...\\\wt5.bam",index="C:\\...\\wt5.bai",asMates=TRUE)

open.BamFile(lepa1)
open.BamFile(lepa2)
open.BamFile(lepa3)
open.BamFile(lepa4)
open.BamFile(lepa5)
open.BamFile(wt1)
open.BamFile(wt2)
open.BamFile(wt3)
open.BamFile(wt4)
open.BamFile(wt5)

 

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicAlignments_1.12.0   Rsamtools_1.28.0          
 [3] Biostrings_2.44.0          XVector_0.16.0            
 [5] SummarizedExperiment_1.6.1 DelayedArray_0.2.2        
 [7] matrixStats_0.52.2         Biobase_2.36.2            
 [9] GenomicRanges_1.28.1       GenomeInfoDb_1.12.0       
[11] IRanges_2.10.0             S4Vectors_0.14.0          
[13] BiocGenerics_0.22.0        bamsignals_1.8.0          

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10            lattice_0.20-35         bitops_1.0-6           
 [4] grid_3.4.0              zlibbioc_1.22.0         Matrix_1.2-10          
 [7] BiocParallel_1.10.1     tools_3.4.0             RCurl_1.95-4.8         
[10] compiler_3.4.0          GenomeInfoDbData_0.99.0

 

edger counts • 777 views
ADD COMMENTlink modified 2.1 years ago by Aaron Lun24k • written 2.1 years ago by mat14940
Answer: Help: extract counts from RNA-seq .bam/.bai files
2
gravatar for James W. MacDonald
2.1 years ago by
United States
James W. MacDonald50k wrote:

Usually I do something like

library(Rsamtools)
library(GenomicAlignments)
bams <- dir("../", "bam$", full.names = TRUE)
bfl <- BamFileList(bams, asMates = TRUE)
ex <- exonsBy(TxDb.Drerio.UCSC.danRer10.refGene, "gene")
ex <- reduce(ex)
SE <- summarizeOverlaps(ex, bfl, singleEnd = FALSE, param = ScanBamParam(flag = scanBamFlag(isDuplicate = FALSE)))

Then assays(SE)[[1]] contains your count matrix. Whether or not you should remove duplicates is up to you. There are arguments for and against and I tend to do so on an ad hoc basis.

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by James W. MacDonald50k
Answer: Help: extract counts from RNA-seq .bam/.bai files
1
gravatar for Aaron Lun
2.1 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

I'll throw in a word for the featureCounts function in Rsubread.

https://www.bioconductor.org/help/workflows/RnaSeqGeneEdgeRQL/#quantifying-read-counts-for-each-gene

If you can get your TxDb object into a GTF file, it should be pretty easy to get it working.

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Aaron Lun24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 273 users visited in the last hour