Hello,
I have an inquiry on a paired-end RNA-seq experiment prepared from zebrafish livers. I would like to identify differentially expressed transcripts (or genes) between (n = 5) "control" and (n = 5) "lepa" samples.
I am looking for a method that will construct a matrix of counts from (n = 10) .bam files and their corresponding indices (.bai). The goal in doing this is to place the subsequent count matrix into the edgeR pipeline. I have tried to do some reading in various vignettes but it is not clear to me on how to accomplish this. The code I am using is appended below, but I am not sure on what to do next after opening the bam files and the zebrafish txdb object. I hope that my question is clear; if I failed to explain something please let me know and I can elaborate. Thanks for looking into this,
- Matt
library(GenomicAlignments)
library(TxDb.Drerio.UCSC.danRer10.refGene)
exbygene <- exonsBy(TxDb.Drerio.UCSC.danRer10.refGene, "gene")
lepa1<-BamFile("C:\\...\\lepa1.bam",index="C:\\...\\lepa1.bai",asMates=TRUE)
lepa2<-BamFile("C:\\...\\lepa2.bam",index="C:\\...\\lepa2.bai",asMates=TRUE)
lepa3<-BamFile("C:\\...\\lepa3.bam",index="C:\\...\\lepa3.bai",asMates=TRUE)
lepa4<-BamFile("C:\\...\\lepa4.bam",index="C:\\...\\lepa4.bai",asMates=TRUE)
lepa5<-BamFile("C:\\...\\lepa5.bam",index="C:\\...\\lepa5.bai",asMates=TRUE)
wt1<-BamFile("C:\\...\\\wt1.bam",index="C:\\...\\wt1.bai",asMates=TRUE)
wt2<-BamFile("C:\\...\\\wt2.bam",index="C:\\...\\wt2.bai",asMates=TRUE)
wt3<-BamFile("C:\\...\\\wt3.bam",index="C:\\...\\wt3.bai",asMates=TRUE)
wt4<-BamFile("C:\\...\\\wt4.bam",index="C:\\...\\wt4.bai",asMates=TRUE)
wt5<-BamFile("C:\\...\\\wt5.bam",index="C:\\...\\wt5.bai",asMates=TRUE)
open.BamFile(lepa1)
open.BamFile(lepa2)
open.BamFile(lepa3)
open.BamFile(lepa4)
open.BamFile(lepa5)
open.BamFile(wt1)
open.BamFile(wt2)
open.BamFile(wt3)
open.BamFile(wt4)
open.BamFile(wt5)
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicAlignments_1.12.0 Rsamtools_1.28.0
[3] Biostrings_2.44.0 XVector_0.16.0
[5] SummarizedExperiment_1.6.1 DelayedArray_0.2.2
[7] matrixStats_0.52.2 Biobase_2.36.2
[9] GenomicRanges_1.28.1 GenomeInfoDb_1.12.0
[11] IRanges_2.10.0 S4Vectors_0.14.0
[13] BiocGenerics_0.22.0 bamsignals_1.8.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 lattice_0.20-35 bitops_1.0-6
[4] grid_3.4.0 zlibbioc_1.22.0 Matrix_1.2-10
[7] BiocParallel_1.10.1 tools_3.4.0 RCurl_1.95-4.8
[10] compiler_3.4.0 GenomeInfoDbData_0.99.0