Question

Where to find the supplementary files for the case studies in edgeR User Guide

Ali Sajid Imami • 0

Hi,

I am trying to replicate the case studies at the end of the edgeR user guide to verify a workflow I’ve developed. In particular, I am interested in the Pasilla knockdown study, to test doing the complete analysis within R.

However, the case study in the user guide uses a few files that hold the data on individual runs and samples, such as targets.txt. I can’t find them. Can someone help me find those?

edger • 326 views

Updated 4.0 years ago by Kevin Blighe • written 4.0 years ago by Ali Sajid Imami

score 0 · Answer 1 · 2020-04-25

Hey,

The Pasilla data comes as a data package in Bioconductor, called pasilla. The expression data and metadata come 'shipped' / 'bundled' with the package via the 'extdata' directory. You should be able to obtain them like this:

1. Install and load the package

if (!requireNamespace('BiocManager', quietly = TRUE))
    install.packages('BiocManager')

BiocManager::install('pasilla')

library('pasilla')

2. Retrieve and prepare the data

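# Locate the count table and the sample annotation shipped with the package
pasCts <- system.file('extdata', 'pasilla_gene_counts.tsv',
  package='pasilla', mustWork=TRUE)
pasAnno <- system.file('extdata', 'pasilla_sample_annotation.csv',
  package='pasilla', mustWork=TRUE)

# Read the counts into a matrix with gene IDs as row names
cts <- as.matrix(read.csv(pasCts, sep='\t', row.names='gene_id'))

# Read the sample annotation and keep the two columns of interest
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[, c('condition','type')]

# Strip the 'fb' suffix so the sample names match the column names of cts,
# then put the count columns in the same order as the annotation rows
rownames(coldata) <- sub('fb', '', rownames(coldata))
cts <- cts[, rownames(coldata)]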



coldata
           condition  type
treated1     treated  single-read
treated2     treated  paired-end
treated3     treated  paired-end
untreated1 untreated  single-read
untreated2 untreated  single-read
untreated3 untreated  paired-end
untreated4 untreated  paired-end

cts[1:5, 1:5]
            treated1 treated2 treated3 untreated1 untreated2
FBgn0000003        0        0        1          0          0
FBgn0000008      140       88       70         92        161
FBgn0000014        4        0        0          5          1
FBgn0000015        1        0        0          0          2
FBgn0000017     6205     3072     3334       4664       8714

You can then process that in edgeR or DESeq2.
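
As a rough illustration of the edgeR route, here is a minimal sketch of a quasi-likelihood analysis on the cts and coldata objects built above. The function name run_edger_qlf and the design (library type as a blocking factor, untreated as the reference level) are my own assumptions for illustration, so this is not necessarily the exact model used in the user guide's case study.

```r
library(edgeR)

# Hypothetical helper (not part of edgeR): runs a basic quasi-likelihood
# pipeline on a gene x sample count matrix and a matching sample annotation
# that has 'condition' and 'type' columns.
run_edger_qlf <- function(cts, coldata) {
  # Assumption: 'untreated' is the reference level
  group <- factor(coldata$condition, levels = c("untreated", "treated"))
  type <- factor(coldata$type)

  y <- DGEList(counts = cts, group = group)

  # Assumption: adjust for library type, then test treated vs untreated
  design <- model.matrix(~ type + group)

  # Drop genes with too few counts, then normalise and estimate dispersions
  keep <- filterByExpr(y, design)
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- calcNormFactors(y)
  y <- estimateDisp(y, design)

  # Quasi-likelihood fit; the last design column is the treatment effect
  fit <- glmQLFit(y, design)
  glmQLFTest(fit, coef = ncol(design))
}

# Only runs if the objects from step 2 exist in the session
if (exists("cts") && exists("coldata")) {
  qlf <- run_edger_qlf(cts, coldata)
  print(topTags(qlf))
}
```

Change the design formula if you want to model the samples differently; the rest of the steps stay the same.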

-------------

The limma / edgeR authors list other supplementary data here:

http://bioinf.wehi.edu.au/resources/webReferences.html

If, however, you specifically mean the targets.txt file from the F1000 workflow, "From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline", then the workflow itself implies that this file is not read from a public website:

Except for the targets file targets.txt, all data analyzed in the workflow is read automatically from public websites

You can easily create it though.
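
For example, a targets file is just a tab-delimited table with one row per sample. Here is a minimal sketch that builds one from the Pasilla sample annotation shown above. The column names (Sample, condition, type) are my own choice and may not match those in the user guide's targets.txt, so rename them to whatever your code expects.

```r
# Sample annotation copied from the coldata table shown earlier in this answer.
# The column names here are assumptions, not the ones from the user guide.
targets <- data.frame(
  Sample = c("treated1", "treated2", "treated3",
             "untreated1", "untreated2", "untreated3", "untreated4"),
  condition = c("treated", "treated", "treated",
                "untreated", "untreated", "untreated", "untreated"),
  type = c("single-read", "paired-end", "paired-end",
           "single-read", "single-read", "paired-end", "paired-end"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file into the current working directory
targets_file <- "targets.txt"
write.table(targets, file = targets_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back the way a workflow typically would
targets_check <- read.delim(targets_file, stringsAsFactors = FALSE)
print(targets_check)
```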

Kevin