edgeR filtering: how to remove meta tags and lowly expressed reads (cpm)
@lynski008-11394

I'm attempting to use edgeR to analyse my RNA-seq data. I have managed to use readDGE to create a data object, which I have called DG, and it looks like this when I print it:

An object of class DGEList
$samples
                      files group lib.size norm.factors
counts3D7_1-1 counts3D7_1-1     1 13251744            1
counts3D7_2-1 counts3D7_2-1     1 13809955            1
counts3D7_3-1 counts3D7_3-1     1 12328705            1
counts3D7_1-2 counts3D7_1-2     1 12605616            1
counts3D7_2-2 counts3D7_2-2     1 13392599            1
22 more rows ...

$counts
                     Samples
Tags                  counts3D7_1-1 counts3D7_2-1 counts3D7_3-1 counts3D7_1-2 counts3D7_2-2 counts3D7_3-2 counts3D7_1-3 counts3D7_2-3 counts3D7_3-3 countsBLM_1-1 countsBLM_2-1 countsBLM_3-1 countsBLM_1-2 countsBLM_2-2 countsBLM_3-2 countsBLM_1-3
  rna_PF3D7_0100100-1            49            24            27            14             8             6            15             9            15            11             5             8             4             5             6             6
  rna_PF3D7_0100200-1            17            17            23            11            13            13             3             6             6            31            15            15             9             9             4             2
  rna_PF3D7_0100300-1            15            10             4             2             4             5             2             4             5             5            11             4             1             2             2             1
  rna_PF3D7_0100400-1            44            45            46            28            33            35            38            33            53           116            87            82            65            66            41            88
  rna_PF3D7_0100500-1             0             0             0             0             1             2             1             1             0             0             2             2             1             1             0             5
                    

I now wish to move on to the filtering stage, but I am stuck. There are two things I wish to filter out. First, when I created the DG object, a warning came up that said "Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique", so I wish to remove the rows with these meta tags, i.e. the counts for reads that were not assigned to a feature. Second, in accordance with edgeR's recommendations, I wish to remove features that do not have at least 1 read per million in n samples, where n = 3 for my dataset. Can anyone help me put together the code to achieve this? I've read through the edgeR manual, but I am new to R and edgeR and am not really sure where to start. Any help is greatly appreciated. Thanks,

@gordon-smyth
WEHI, Melbourne, Australia

To remove the meta tags:

# TRUE for rows whose names start with "__"; logical indexing is safe even if no row matches
MetaTags <- grepl("^__", rownames(y))
y <- y[!MetaTags, ]

Here I have assumed your DGEList object is called y.
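
To see what the row-name test does before running it on real data, here is a minimal base R sketch on a toy matrix. The sample names and counts are made up for illustration, and a plain matrix stands in for the DGEList.

```r
# Toy count matrix; sample names and values are invented for illustration
counts <- matrix(
  c(49, 24, 17, 17, 1000, 900, 50, 60),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("rna_PF3D7_0100100-1", "rna_PF3D7_0100200-1", "__no_feature", "__ambiguous"),
    c("sample1", "sample2")
  )
)

# TRUE for meta tag rows, i.e. row names starting with "__"
is_meta <- grepl("^__", rownames(counts))

# Logical indexing keeps every row when nothing matches,
# whereas counts[-integer(0), ] would select zero rows
filtered <- counts[!is_meta, , drop = FALSE]
rownames(filtered)
```

The logical form matters only when no meta tags are present: a negative index built from an empty grep() result would drop every row.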

To filter on expression:

# Keep features with more than 1 count per million in at least 3 samples
IsExpr <- rowSums(cpm(y) > 1) >= 3
y <- y[IsExpr, ]
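
As a rough illustration of what this filter computes, the sketch below reproduces the arithmetic in base R on a toy matrix. The counts and library sizes are invented, and cpm_manual is a hand-rolled stand-in rather than edgeR's cpm(), which for a DGEList uses the library sizes scaled by the normalization factors (all 1 in the object shown in the question).

```r
# Toy counts: 3 genes x 4 samples (values invented for illustration)
counts <- matrix(
  c(10, 12,  9, 11,
     0,  1,  0,  2,
     5,  0,  6,  7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Hypothetical library sizes (total reads per sample)
lib_size <- c(s1 = 4e6, s2 = 5e6, s3 = 3e6, s4 = 4e6)

# Counts per million: divide each column by its library size, then scale by 1e6
cpm_manual <- t(t(counts) / lib_size) * 1e6

# Keep genes with CPM > 1 in at least 3 samples
is_expr <- rowSums(cpm_manual > 1) >= 3
is_expr
```

Here geneA and geneC pass and geneB does not. When filtering a real DGEList, the subsetting can optionally be written as y[IsExpr, , keep.lib.sizes=FALSE] so that the library sizes are recomputed from the remaining rows; whether that is wanted depends on the analysis.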

For more detail about filtering, see the workflow paper: http://f1000research.com/articles/5-1438


That worked perfectly.  Thank you very much Gordon.
