imputing data using pcaMethods with large amounts of missing data in rows
eli-sava • 0
@eli-sava-12767
Last seen 7.0 years ago

Hello. I am starting to use pcaMethods for an environmental application involving concentrations. The columns of my dataset represent stations, which I expect to have spatial structure. The rows represent hourly data from 2005-2017. The trick is that the monitoring network expanded greatly over the last decade. The last rows, representing 2013-2017, contain data for every column with perhaps 10% missing, which I believe is toward the high end of what is usually considered acceptable for imputation, but not laughable. The initial rows, on the other hand, contain a much higher fraction of missing data. At best, 40% of the stations go back to the first year I am considering (2005). Other columns are missing entirely until the corresponding station came online.

Can anyone suggest a good way to proceed in this case? Should I develop the components based on 2013-2017 only? How do I best use pcaMethods to impute the early part of the dataset without using those early rows to estimate the components, which I understand would be way beyond the method's warranty? I am willing to assume the spatial structure has been stationary and that 2013-2017 samples the patterns of interest. Thanks.
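
To make the question concrete, here is a minimal sketch of the two-stage idea (fit on the later period, then fill the early rows). It is written in plain Python and is not the pcaMethods API. It assumes a single component, complete later rows (rows with gaps dropped or filled beforehand), and the stationarity assumption stated above. The function names `fit_first_component` and `impute_row` and the toy numbers are hypothetical, chosen only for illustration.

```python
import math

def fit_first_component(rows, n_iter=100):
    """Estimate column means and the first loading vector from complete rows.

    Uses power iteration on X^T X, where X is the mean-centered data.
    """
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    centered = [[r[j] - means[j] for j in range(n_cols)] for r in rows]
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # arbitrary unit starting vector
    for _ in range(n_iter):
        # One power-iteration step: w = X^T (X v), then renormalize.
        scores = [sum(c[j] * v[j] for j in range(n_cols)) for c in centered]
        w = [sum(s * c[j] for s, c in zip(scores, centered)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return means, v

def impute_row(row, means, loading):
    """Fill None entries of one row using a fixed, previously fitted component.

    The score is the least-squares fit of the observed, centered entries
    onto the loading restricted to the observed columns.
    """
    obs = [j for j, x in enumerate(row) if x is not None]
    denom = sum(loading[j] ** 2 for j in obs)
    score = 0.0  # falls back to the column means if nothing usable is observed
    if denom > 0.0:
        score = sum(loading[j] * (row[j] - means[j]) for j in obs) / denom
    return [x if x is not None else means[j] + score * loading[j]
            for j, x in enumerate(row)]

# Toy "2013-2017" block: three stations, all observed.
late = [[8.0, 16.0, 24.0], [9.0, 18.0, 27.0], [10.0, 20.0, 30.0],
        [11.0, 22.0, 33.0], [12.0, 24.0, 36.0]]
means, loading = fit_first_component(late)

# Toy "2005" row: only the first station exists yet.
filled = impute_row([13.0, None, None], means, loading)
print(filled)
```

The point of the sketch is that the early rows never influence the means or the loading; they are only projected onto a structure learned from the later period. With several components, the single division becomes a small least-squares solve per row, and a column with no observations at all in the fitting period cannot be imputed this way.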

pcamethods missing data • 817 views