Correct script for featurecounts in Rsubread
2
0
Entering edit mode
rmoorehe • 0
@0f26ba36
Last seen 20 months ago
Canada

I am new to R and RStudio but have been trying to work through different examples using Rsubread for my data. I have tried reading vignettes and manuals prior to posting here but I am stuck and could really use some advice.

I have 7 paired-end, fastq files from Illumina sequencing of human samples. I have downloaded RStudio version 2022.12.0+353. I have installed the packages BiocManager and Rsubread using the packages function available in RStudio

I downloaded GRCh38.13.genome.fa.gz (release 43) from GENECODE and built the human genome reference into my working directory. Next I aligned my data one at a time and then placed my 7 aligned files in "BAM436files". I then ran feature counts on all files and placed them in "counts436". Everything seems to work well with most of the reads mapped during the alignment and featurecounts indicated 79-81% assigned alignments in all 7 samples.

The problem comes when I try to just look at the data. I tried to view the first 3 rows using commands like... this just lists all the data, not the first 3 rows. I can export the data to excel and the excel folder has 4 tabs; "counts", "annotation", "targets", "stat". The counts data has the names of my 7 samples across the top and then count data in each column...so I thought I was on the right track. So I guess my question is, why can't I limit the data display to the first "n" rows. Also, can I use this file for DESeq? I been trying to figure out how to generate files like coldata but thought I should make sure my count file would work first.

The codes I used are presented below.

>buildindex(basename = "human_index", reference = "GRCh38.p13.genome.fa.gz")

>align("human_index", readfile1 = "A436EV_1.fq.gz", readfile2 = "A436EV_2.fq.gz", output_file = "A436EV.aligned", output_format = ".", annot.inbuilt = "hg38", nthreads = 8)

>BAM436files <- c("A436EV.aligned", "C436EV.aligned", "E436EV.aligned", "A436200f.aligned", "B436200f.aligned", "C436200f.aligned", "D436200f.aligned")

>counts436 <- featurecounts(BAM436files, annotated.ext="directory for GRCh38.GTF", isGTFannotationFile=TRUE, nthreads=4, isPairedEnd=TRUE, countMulitMappingReads=FALSE, reportReads="CORE"

>head(counts436, 3)

>install.packages("openxlsx")
>library(openxlsx)
>write.xlsx(counts436, "destination folder"


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )
Rsubread Rsu • 1.1k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

Here's the relevant section from ?featureCounts

Value:

     A list with the following components:

  counts: a data matrix containing read counts for each feature or
          meta-feature for each library.

counts_junction (optional): a data frame including primary and
          secondary genes overlapping an exon junction, position
          information for the left splice site (`Site1') and the right
          splice site (`Site2') of a junction (including chromosome
          name, coordinate and strand) and number of supporting reads
          for each junction in each library. Both primary and secondary
          genes overlap at least one of the two splice sites of a <snip>

So if you want the head of the counts matrix, you have to select the correct list item.

ADD COMMENT
0
0
Entering edit mode

Thanks for the help. I was able to work through it using the bioconductor link. For some reason, I thought I had to create a new tab delimited table using excel or a text editor for coldata.

ADD REPLY

Login before adding your answer.

Traffic: 590 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6