Question: imputing data using pcaMethods with large amounts of missing data in rows
2.3 years ago, eli-sava wrote:

Hello. I am starting to use pcaMethods for an environmental application involving concentrations. The columns of my dataset represent stations, which I expect to have spatial structure. The rows represent hourly data from 2005-2017. The trick is that the monitoring network expanded greatly over the last decade. So the last rows, representing 2013-2017, contain data for every column with perhaps 10% missing, which I believe is on the high side for imputation but not laughable. The initial rows, on the other hand, contain a much higher fraction of missing data. At best 40% of the stations go back to the first year I am considering (2005). Other columns are missing entirely until the corresponding station came online.

Can anyone suggest a good way to proceed in this case? Should I develop the components based on 2013-2017 only? How do I best use pcaMethods to impute the early part of the dataset without using that early part to estimate the components, which I understand is way beyond its warranty? I am willing to assume the spatial structure has been stationary and that 2013-2017 samples the patterns of interest. Thanks.
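
To make the idea concrete, here is a minimal sketch of what I have in mind, written in plain NumPy rather than pcaMethods. It assumes the 2013-2017 block has already been made complete (for example by a first imputation pass), that the spatial patterns are stationary, and that the number of components is chosen by hand. The function names `fit_loadings` and `impute_rows` and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np

def fit_loadings(late, n_components):
    """Column means and PCA loadings from the well-observed late period.

    Assumes `late` (rows = hours, columns = stations) contains no NaNs.
    """
    mean = late.mean(axis=0)
    # SVD of the centred matrix; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(late - mean, full_matrices=False)
    return mean, vt[:n_components].T  # loadings: (stations, n_components)

def impute_rows(early, mean, loadings):
    """Fill NaNs row by row, keeping the loadings fixed.

    For each row, the scores are estimated by least squares using only
    the stations observed in that row, then the missing stations are
    read off the reconstruction.
    """
    filled = early.copy()
    k = loadings.shape[1]
    for i, row in enumerate(early):
        obs = ~np.isnan(row)
        if obs.sum() < k:
            continue  # too few stations to estimate k scores; leave as is
        scores, *_ = np.linalg.lstsq(
            loadings[obs], row[obs] - mean[obs], rcond=None
        )
        recon = mean + loadings @ scores
        filled[i, ~obs] = recon[~obs]
    return filled

# Synthetic demo: 6 stations driven by 2 hypothetical spatial patterns.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 2))
late = rng.normal(size=(200, 2)) @ W.T + 5.0
truth = rng.normal(size=(50, 2)) @ W.T + 5.0
early = truth.copy()
early[:, 3:] = np.nan  # stations 3-5 not yet online in the early period

mean, loadings = fit_loadings(late, n_components=2)
filled = impute_rows(early, mean, loadings)
```

In this noise-free toy case the missing stations are recovered essentially exactly; with real data the quality would depend on how well the few early stations constrain the scores, which is part of what I am asking about.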
