Dengue fever is the most common mosquito-borne viral disease in the world. It is caused by a virus transmitted through the bite of Aedes mosquitoes. There is currently no drug for dengue fever, so prevention depends on stopping its carrier, the Aedes mosquito, from breeding. Singapore has historically experienced significant dengue fever outbreaks. It is currently estimated that dengue cases may exceed 30,000 in 2016, and 5,122 cases have been recorded since March 1st of this year alone.
Dengue fever is a vector-borne disease: a vector (the Aedes mosquito) spreads it by biting a host (an infected human), as shown in the figure above. Typically, after a female mosquito takes a blood meal from an infected person, an incubation period of two weeks passes before the mosquito can infect a healthy person.
Temporal Analysis: Analysis of historical weather data and dengue case data shows that temperature is the feature most strongly correlated with dengue cases, ahead of rainfall and wind speed. Rainfall and wind speed are more sporadic and volatile, which makes it hard to find a suggestive pattern. The correlation matrix confirms the intuition drawn from the figures. Scatterplots of each feature against the dependent variable are in appendix Figure III. Linear regression confirms that temperature and rainfall are the only significant features, with p-values of 0.000 and 0.03, respectively; both are positively associated with dengue fever. The R-squared value is low, at 8%, so these features explain only a small share of the variation in cases. A random forest (RF) performs somewhat better, with an out-of-sample accuracy of 34%, using the following parameters: 300 trees in the forest, log2(n) features considered at each split, and entropy as the impurity measure. Hence, meteorological data alone is not sufficient to predict dengue fever accurately.
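To make the link between a weak correlation and a low R-squared concrete, here is a minimal sketch in plain Python. It assumes a single predictor, in which case R-squared equals the squared Pearson correlation; an R-squared of 8% would then correspond to a correlation of roughly 0.28. The study's regression used several features, so this is for intuition only. The data are synthetic and the function names (`pearson`, `simple_ols_r2`) are hypothetical, not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols_r2(x, y):
    """Fit y = a + b*x by least squares; return (a, b, R-squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic, illustrative data only (NOT the study's data):
# a weak positive temperature effect buried in a lot of noise.
rng = random.Random(0)
temperature = [rng.uniform(25.0, 31.0) for _ in range(200)]
cases = [20.0 + 5.0 * (t - 25.0) + rng.gauss(0.0, 30.0) for t in temperature]

r = pearson(temperature, cases)
intercept, slope, r2 = simple_ols_r2(temperature, cases)

print(f"correlation r = {r:.3f}")
print(f"R-squared     = {r2:.3f}")
print(f"r squared     = {r * r:.3f}")  # matches R-squared for one predictor
```

A feature can be statistically significant and still leave most of the variance unexplained, which is consistent with the low R-squared reported above.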
Spatial Analysis: Mosquito habitat is suspected to be correlated with the dependent variable, because these data are collected by the same agency that records dengue fever cases, and it is highly likely that Singapore’s authorities identify mosquito habitats based on reported cases. With this predictor excluded, the AUC score is 50%, which means the model discriminates no better than random guessing. However, the pseudo-R-squared is approximately 20%, which suggests some predictive power. In addition, all included features are statistically significant at an alpha level of 5%. The marginal effects, i.e., the changes in the probability of the dependent variable given changes in the independent variables, are most strongly positive for transportation-related variables (street network and bus stops) and for trash bins. Parks and total population have negative effects. This suggests that higher mobility contributes to higher dengue fever risk. The association with population is harder to interpret because lot density and population do not fully capture population density and building density.
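The two quantities discussed above, AUC and marginal effects, can be illustrated with a short sketch in plain Python. It assumes a logistic (logit) model, which the text does not state explicitly; the pseudo-R-squared and marginal effects only suggest it. The coefficients and inputs are hypothetical values chosen for illustration, not estimates from the study.

```python
import math


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(j, intercept, betas, x):
    """Marginal effect of feature j in a logit model at point x:
    dP/dx_j = beta_j * p * (1 - p)."""
    p = logistic(intercept + sum(b * v for b, v in zip(betas, x)))
    return betas[j] * p * (1.0 - p)


labels = [0, 0, 1, 1]

# A model that cannot separate the classes scores everyone the same.
print(auc(labels, [0.3, 0.3, 0.3, 0.3]))  # 0.5, same as random guessing

# A model that ranks every positive above every negative.
print(auc(labels, [0.1, 0.2, 0.8, 0.9]))  # 1.0

# Hypothetical coefficients: feature 0 raises risk, feature 1 lowers it.
betas = [0.8, -0.5]
x = [0.0, 0.0]  # evaluation point where p = 0.5
print(marginal_effect(0, 0.0, betas, x))  # 0.2
print(marginal_effect(1, 0.0, betas, x))  # -0.125
```

The sign of each marginal effect follows the sign of its coefficient, which is how features such as bus stops (positive) and parks (negative) can be read from the results above. The size of the effect depends on where it is evaluated, since p * (1 - p) is largest at p = 0.5.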