Question

Where to download practice data

jinliqin • 0

I have installed R, RStudio, and the DESeq2 package, and I would like to practice with the data used in "Analyzing RNA-seq data with DESeq2" (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html).

Is there somewhere I can download the example data, such as "cts" and "coldata", for practice?

Lee

DESeq2 • 758 views

updated 2.9 years ago by ATpoint • written 2.9 years ago by jinliqin

score 0 · Answer 1 · 2021-06-02

You can get example datasets by following the DESeq2 tutorial (the "Count matrix input" section is quoted below):

 http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

Count matrix input

Alternatively, the function DESeqDataSetFromMatrix can be used if you already have a matrix of read counts prepared from another source. Another method for quickly producing count matrices from alignment files is the featureCounts function (Liao, Smyth, and Shi 2013) in the Rsubread package. To use DESeqDataSetFromMatrix, the user should provide the counts matrix, the information about the samples (the columns of the count matrix) as a DataFrame or data.frame, and the design formula. To demonstrate the use of DESeqDataSetFromMatrix, we will read in count data from the pasilla package. We read in a count matrix, which we will name cts, and the sample information table, which we will name coldata. Further below we describe how to extract these objects from, e.g., featureCounts output.

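# pasilla is a Bioconductor experiment data package; if it is missing,
# install it first with BiocManager::install("pasilla")
library("pasilla")
# Locate the example files shipped inside the installed package
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
# Read the tab-separated counts into a matrix, with gene IDs as row names
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
# Read the sample annotation; the first column holds the sample names
coldata <- read.csv(pasAnno, row.names=1)
# Keep only the two columns of interest and store them as factors
coldata <- coldata[,c("condition","type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)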
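
As loaded, the two pasilla objects do not line up: the row names of the sample table carry an "fb" suffix and are in a different order from the columns of the count matrix. DESeqDataSetFromMatrix does not work out which column belongs to which row, so the names need to be aligned first. Here is a minimal sketch of that step on toy data. The objects toy_cts and toy_coldata and all their values are made up for illustration and only stand in for cts and coldata; the DESeq2 call at the end runs only if DESeq2 is installed.

```r
# Toy stand-ins for cts and coldata (values are made up for illustration).
toy_cts <- matrix(
  c(10L, 0L, 25L,
    12L, 3L, 30L,
    40L, 1L,  5L,
    38L, 2L,  7L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("untreated1", "untreated2", "treated1", "treated2"))
)
toy_coldata <- data.frame(
  condition = factor(c("treated", "treated", "untreated", "untreated")),
  row.names = c("treated1fb", "treated2fb", "untreated1fb", "untreated2fb")
)

# Strip the "fb" suffix so the sample names match the count matrix columns.
rownames(toy_coldata) <- sub("fb$", "", rownames(toy_coldata))

# Every sample in the table must also be a column of the count matrix.
stopifnot(all(rownames(toy_coldata) %in% colnames(toy_cts)))

# Reorder the count columns to follow the row order of the sample table.
toy_cts <- toy_cts[, rownames(toy_coldata)]
stopifnot(identical(colnames(toy_cts), rownames(toy_coldata)))

# With DESeq2 installed, the aligned objects can be passed straight in.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = toy_cts,
                                        colData = toy_coldata,
                                        design = ~ condition)
  print(dds)
}
```

The same three steps (rename, check, reorder) should apply unchanged to the real cts and coldata objects read from pasilla.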

score 0 · Answer 2 · 2021-06-04

There are several Bioconductor packages that contain suitable data, for example the airway package:

https://bioconductor.org/packages/release/data/experiment/html/airway.html

library("airway")
data(airway)
airway

This loads a RangedSummarizedExperiment object, from which you can get the raw counts via assay() and the per-sample group information via colData().
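
To end up with practice objects in the same shape as the vignette's cts and coldata, something along these lines should work. It is only a sketch: it assumes the airway and SummarizedExperiment packages are installed (the code does nothing but print a message otherwise), and the names airway_cts and airway_coldata are just illustrative.

```r
if (requireNamespace("airway", quietly = TRUE) &&
    requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  library("SummarizedExperiment")
  data("airway", package = "airway")

  # Raw count matrix: genes in rows, samples in columns.
  airway_cts <- assay(airway, "counts")

  # Per-sample information as a plain data.frame.
  airway_coldata <- as.data.frame(colData(airway))

  # The sample names should already agree between the two objects.
  stopifnot(identical(colnames(airway_cts), rownames(airway_coldata)))

  # Cell line and dexamethasone treatment are the usual grouping columns.
  print(airway_coldata[, c("cell", "dex")])
} else {
  message("Install the data first, e.g. BiocManager::install(\"airway\")")
}
```

Because the counts and the sample table come out of one object, they are already in matching order, so no renaming or reordering is needed before using them.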