Search
Question: creating design and count matrix for rna-seq differential expression
0
gravatar for marongiu.luigi
6 weeks ago by
European Union
marongiu.luigi0 wrote:

Hello

 

I am trying to understand how to run a differential expression using R and for that I am referring to Smyth and other's ["limma: linear models for microarray and RNA-seq data user guide"][1] (2016). This vignette refers to data sets provided by [WEHI bioinfo][2]. The latter provides the link to a dataset made of four libraries (A_1, A_2, B_1 and B_2), a chromosome reference fasta file and a Targets.txt file in the form:

 

    CellType InputFile OutputFile

    A        A_1.txt.gz A_1.bam

    A        A_2.txt.gz A_2.bam

    B        B_1.txt.gz B_1.bam

    B        B_2.txt.gz B_2.bam

 

The case study prepares a design object using a model.matrix function based on the CellType field of the Targets.txt file, then creates an index reference and performs alignment, summarization, filtering, normalization and finally linear model fitting using the following commands:

 

    library(limma)

    library(edgeR)

    library(Rsubread)

    targets <- readTargets(file="Targets.txt") 

    celltype <- factor(targets$CellType)

    design <- model.matrix(~celltype)

    buildindex(basename = "chr1", reference = "hg19_chr1.fa") 

    align(index = "chr1", readfile1 = targets$InputFile, input_format = "gzFASTQ", output_format = "BAM", output_file = targets$OutputFile, unique = TRUE, indels = 5)

    fc <- featureCounts(files = targets$OutputFile, annot.inbuilt = "hg19") 

    x <- DGEList(counts = fc$counts, genes = fc$annotation[,c("GeneID", "Length")])

    isexpr <- rowSums(cpm(x) > 10) >= 2

    x <- x[isexpr,]

    y <- voom(x, design, plot = TRUE) 

    fit <- eBayes(lmFit(y, design))

 

This all works fine -- although it is unfeasible to attach the actual data herein -- and I wanted to extend this example to the Smith's example (page 69). Here the first step is to create a matrix of read counts; I believe this should come from the featureCounts function thus it is represented by the fc object. The second step is to remove the low counts and to apply scale normalization. I therefore used:

 

    dge <- DGEList(counts = fc$counts, genes = fc$annotation[,c("GeneID", "Length")])

    isexpr <- rowSums(cpm(dge) > 10) >= 2

    dge <- dge[isexpr,]

    dge <- calcNormFactors(dge)

    logCPM <- cpm(dge, log = TRUE, prior.count = 3)

    fit <- eBayes(logCPM, design)

but I get an error in the final step:

 

    > fit <- eBayes(logCPM, design)

    Error: $ operator is invalid for atomic vectors

 

The same is true if I take off the annotation feature of dge with

 

    dge <- DGEList(counts = fc$counts)

 

I imagine the problem is due to the design object, so my question is how can I handle the model.matrix function so that I can apply it to different cases?

Thank you.

PS

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rsubread_1.26.1 edgeR_3.18.1    limma_3.32.8   

loaded via a namespace (and not attached):
[1] compiler_3.4.0  tools_3.4.0     grid_3.4.0      locfit_1.5-9.1  lattice_0.20-35

  [1]: https://bioconductor.riken.jp/packages/3.3/bioc/vignettes/limma/inst/doc/usersguide.pdf

  [2]: http://bioinf.wehi.edu.au/RNAseqCaseStudy/

ADD COMMENTlink modified 6 weeks ago by Aaron Lun17k • written 6 weeks ago by marongiu.luigi0
0
gravatar for Aaron Lun
6 weeks ago by
Aaron Lun17k
Cambridge, United Kingdom
Aaron Lun17k wrote:

You need to fit a linear model via lmFit first.

ADD COMMENTlink written 6 weeks ago by Aaron Lun17k

ops, i did miss that line. Running:

logCPM <- cpm(dge, log = TRUE, prior.count = 3)
fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend = TRUE)

as in the vignette goes smoothly. my bad.

ADD REPLYlink written 6 weeks ago by marongiu.luigi0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 299 users visited in the last hour