Predicting Air Quality in Dar es Salaam, Tanzania
Air quality data was collected by querying a MongoDB database of readings from openAfrica, one of Africa’s largest open data platforms, and used to build a time series model that predicts PM2.5 readings.
- Created a wrangle function that extracts the PM2.5 readings from the site with the most total readings in the Dar es Salaam collection, localizes the reading timestamps to the “Africa/Dar_es_Salaam” timezone, and removes outlier PM2.5 readings above 100.
- Resampled the data to give the mean PM2.5 reading for each hour.
- Imputed missing values using the forward-fill method.
- Created a time series plot and a rolling average plot of the readings.
- Created ACF and PACF plots of the data.
- Built an autoregression model.
- Improved the AR model by tuning its hyperparameters.
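
The wrangling, resampling, and imputation steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the readings have already been pulled from MongoDB into a list of dictionaries, and the field names `site`, `timestamp`, and `P2` (the PM2.5 value), as well as the assumption that stored timestamps are naive UTC, are hypothetical.

```python
import pandas as pd


def wrangle(records, tz="Africa/Dar_es_Salaam", max_pm25=100):
    """Turn raw reading records into an hourly PM2.5 series.

    Assumed (hypothetical) record shape:
        {"site": <id>, "timestamp": <naive UTC datetime>, "P2": <PM2.5 value>}
    """
    df = pd.DataFrame(records)

    # Keep only the site that has the most total readings
    top_site = df["site"].value_counts().idxmax()
    df = df[df["site"] == top_site].set_index("timestamp")

    # Treat timestamps as UTC, then convert to the local timezone
    df.index = pd.to_datetime(df.index).tz_localize("UTC").tz_convert(tz)

    # Remove outlier readings above the threshold
    df = df[df["P2"] <= max_pm25]

    # Hourly mean, then forward-fill hours that have no readings
    y = df["P2"].resample("1h").mean().ffill()
    return y.rename("PM2.5")
```

The returned series has one value per hour in local time, which is the shape the plotting and autoregression steps work from.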