Question

Correct script for featurecounts in Rsubread

0

Entering edit mode

rmoorehe • 0

@0f26ba36

Last seen 2.2 years ago

Canada

I am new to R and RStudio but have been trying to work through different examples using Rsubread for my data. I have tried reading vignettes and manuals prior to posting here but I am stuck and could really use some advice.

I have 7 paired-end, fastq files from Illumina sequencing of human samples. I have downloaded RStudio version 2022.12.0+353. I have installed the packages BiocManager and Rsubread using the packages function available in RStudio

I downloaded GRCh38.13.genome.fa.gz (release 43) from GENECODE and built the human genome reference into my working directory. Next I aligned my data one at a time and then placed my 7 aligned files in "BAM436files". I then ran feature counts on all files and placed them in "counts436". Everything seems to work well with most of the reads mapped during the alignment and featurecounts indicated 79-81% assigned alignments in all 7 samples.

The problem comes when I try to just look at the data. I tried to view the first 3 rows using commands like... this just lists all the data, not the first 3 rows. I can export the data to excel and the excel folder has 4 tabs; "counts", "annotation", "targets", "stat". The counts data has the names of my 7 samples across the top and then count data in each column...so I thought I was on the right track. So I guess my question is, why can't I limit the data display to the first "n" rows. Also, can I use this file for DESeq? I been trying to figure out how to generate files like coldata but thought I should make sure my count file would work first.

The codes I used are presented below.

>buildindex(basename = "human_index", reference = "GRCh38.p13.genome.fa.gz")

>align("human_index", readfile1 = "A436EV_1.fq.gz", readfile2 = "A436EV_2.fq.gz", output_file = "A436EV.aligned", output_format = ".", annot.inbuilt = "hg38", nthreads = 8)

>BAM436files <- c("A436EV.aligned", "C436EV.aligned", "E436EV.aligned", "A436200f.aligned", "B436200f.aligned", "C436200f.aligned", "D436200f.aligned")

>counts436 <- featurecounts(BAM436files, annotated.ext="directory for GRCh38.GTF", isGTFannotationFile=TRUE, nthreads=4, isPairedEnd=TRUE, countMulitMappingReads=FALSE, reportReads="CORE"

>head(counts436, 3)

>install.packages("openxlsx")
>library(openxlsx)
>write.xlsx(counts436, "destination folder"


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )

Rsubread Rsu • 1.3k views

ADD COMMENT • link 2.3 years ago rmoorehe • 0

score 1 · Answer 1 · 2023-03-22

Here's the relevant section from ?featureCounts

Value:

     A list with the following components:

  counts: a data matrix containing read counts for each feature or
          meta-feature for each library.

counts_junction (optional): a data frame including primary and
          secondary genes overlapping an exon junction, position
          information for the left splice site (`Site1') and the right
          splice site (`Site2') of a junction (including chromosome
          name, coordinate and strand) and number of supporting reads
          for each junction in each library. Both primary and secondary
          genes overlap at least one of the two splice sites of a <snip>

So if you want the head of the counts matrix, you have to select the correct list item.

score 0 · Answer 2 · 2023-03-22

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 14 hours ago

WEHI, Melbourne, Australia

See

ADD COMMENT • link 2.3 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks for the help. I was able to work through it using the bioconductor link. For some reason, I thought I had to create a new tab delimited table using excel or a text editor for coldata.

ADD REPLY • link 2.3 years ago rmoorehe • 0