Question

estimateSizeFactors before or after removing non-coding genes and filtering for low counts



Tash.

@tash-17343


United Kingdom

Hi there,

My initial strategy was to retain the protein-coding genes and filter for low counts before creating the dds object.

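        library(dplyr)
        library(DESeq2)
        GD <- read.delim("mart_export (1).txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
        counts <- counts %>% inner_join(unique(GD[2]), by = c("Gene" = "symbol"))  # keep protein-coding symbols
        count_mat <- as.matrix(counts[, setdiff(names(counts), "Gene")])            # numeric sample columns only
        rownames(count_mat) <- counts$Gene
        counts_protein <- count_mat[rowSums(count_mat) > 10, ]                      # drop low-count genes
        dds <- DESeqDataSetFromMatrix(counts_protein, colData, design = ~ sample_type)
        dds <- estimateSizeFactors(dds)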

I'm not sure whether I should estimate the size factors before or after removing the non-coding genes and filtering out low counts. Many genes have a row sum of 0 (most of them non-coding). Should these be included in the estimation, i.e. can they affect it?

Many thanks.

DESeq2 RNASeq

Written 2.8 years ago by Tash.

score 2 · Accepted Answer · 2021-06-30



Michael Love 41k

@mikelove


United States

You can filter before estimating the size factors, but note that rows summing to 0 are not included in size factor estimation anyway.
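To see why zero rows drop out, here is a minimal base-R sketch of the median-of-ratios idea behind DESeq2's default size factor estimate. It is a simplified illustration, not DESeq2's own code; the helper `median_of_ratios` and the toy matrix are hypothetical. A gene with a zero in any sample has a geometric mean of zero (a log geometric mean of `-Inf`), so it is skipped, and all-zero rows are a special case of this.

```r
# Hypothetical helper: a simplified median-of-ratios size factor estimate,
# written for illustration only (not DESeq2's implementation).
median_of_ratios <- function(counts) {
  # Per-gene log geometric mean; any zero count makes it -Inf
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)  # genes with a zero in any sample are skipped
  apply(counts, 2, function(cnts) {
    # Median ratio of this sample's counts to the per-gene geometric means
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
}

# Toy data: sample B has exactly twice the counts of sample A
toy_counts <- matrix(c(10, 20, 30,
                       20, 40, 60), ncol = 2,
                     dimnames = list(paste0("gene", 1:3), c("A", "B")))
sf_filtered <- median_of_ratios(toy_counts)

# Add an all-zero gene and a gene with a single zero
toy_with_zeros <- rbind(toy_counts, gene4 = c(0, 0), gene5 = c(0, 5))
sf_all <- median_of_ratios(toy_with_zeros)

print(sf_filtered)
print(sf_all)
```

Both calls give the same size factors, because `gene4` and `gene5` never enter the median. Genes with small but nonzero counts in every sample do enter the median, so filtering those out first can shift the size factors slightly.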
