imputing data using pcaMethods with large amounts of missing data in rows
eli-sava • 0

Hello. I am starting to use pcaMethods for an environmental application involving concentrations. The columns of my dataset represent stations, which I expect to have spatial structure. The rows are hourly observations from 2005 to 2017. The catch is that the monitoring network expanded greatly over the last decade. The last rows, covering 2013-2017, contain data for every column with perhaps 10% missing, which I believe is on the high side for imputation but not unreasonable. The initial rows, on the other hand, have a much higher fraction of missing data: at best 40% of the stations go back to the first year I am considering (2005). Other columns are missing entirely until the corresponding station came online.

Can anyone suggest a good way to proceed in this case? Should I develop the components based on 2013-2017 only? How do I best use pcaMethods to impute the early part of the dataset while keeping the early data out of the component estimation, which I understand would be far beyond what the method is meant to handle? I am willing to assume that the spatial structure has been stationary and that 2013-2017 samples the patterns of interest. Thanks.
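To make the idea in the question concrete, here is a minimal sketch of one possible workflow, not a recommended or validated answer. It fits the components on the recent, nearly complete period with `pca()` from pcaMethods, then fills each early row by a least-squares projection of its observed stations onto those loadings. The projection step is plain linear algebra added here for illustration, not a pcaMethods feature. The toy data, the choice of `method = "ppca"`, `nPcs = 2`, and the helper `impute_row()` are all assumptions made for the example, and the whole approach rests on the stationarity assumption stated above.

```r
library(pcaMethods)

# Hypothetical toy data standing in for the real matrix:
# rows = hours, columns = stations, driven by two latent spatial patterns.
set.seed(1)
n_early <- 50; n_recent <- 200; n_sta <- 8
latent <- matrix(rnorm((n_early + n_recent) * 2), ncol = 2)
mixing <- matrix(rnorm(2 * n_sta), nrow = 2)
noise  <- matrix(rnorm((n_early + n_recent) * n_sta, sd = 0.1), ncol = n_sta)
X <- latent %*% mixing + noise
colnames(X) <- paste0("station", seq_len(n_sta))

early  <- X[seq_len(n_early), ]
recent <- X[n_early + seq_len(n_recent), ]
recent[sample(length(recent), 0.1 * length(recent))] <- NA  # about 10% missing
early[, 5:8] <- NA                                          # stations not yet online

# Step 1: estimate the components from the recent period only.
fit <- pca(recent, method = "ppca", nPcs = 2, center = TRUE, scale = "none")
P   <- loadings(fit)  # stations x components
mu  <- center(fit)    # per-station means used for centering

# Step 2 (hypothetical helper): for one row, estimate the scores from the
# observed stations by least squares, then reconstruct the missing stations.
impute_row <- function(x, P, mu) {
  obs <- !is.na(x)
  # Leave the row alone if it is complete or has too few observed stations.
  if (all(obs) || sum(obs) < ncol(P)) return(x)
  t_hat <- qr.solve(P[obs, , drop = FALSE], x[obs] - mu[obs])
  x[!obs] <- mu[!obs] + drop(P[!obs, , drop = FALSE] %*% t_hat)
  x
}

early_imputed <- t(apply(early, 1, impute_row, P = P, mu = mu))
```

Observed values are left untouched and only the `NA` entries are filled. Rows with fewer observed stations than components are returned unchanged, since the scores cannot be estimated for them. Holding out some known early values and comparing them with their reconstructions would be one way to check whether the stationarity assumption is tenable.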

pcamethods missing data • 574 views
