Monthly Archives: September 2019

Illness Caused by Drinking Water in NYC from 2010 to 2019

Waterborne illnesses are caused by drinking water contaminated with disease-causing bacteria or other pathogens, and they account for approximately 3.4 million deaths each year worldwide. These illnesses are most common in developing nations, also known as “third-world” countries, because these nations often lack the water filtration systems needed to provide safe, clean water to their inhabitants.

The United States, as a developed nation or “first-world” country, has a low rate of illness from contaminated drinking water because adequate filtration systems are in place, but problems with water quality remain, as most famously demonstrated in Flint, Michigan.

Using the 311 data, I decided to investigate incidences of waterborne illness and complaints about water quality in New York City. I wanted to determine whether the two are related, or whether the incidences of waterborne illness are isolated incidents caused by other factors.

The information gathered from these visualizations, and from this investigation overall, could help city agencies, specifically the Department of Health and Mental Hygiene and the Department of Environmental Protection, determine which areas need water quality improvement. It could also help them identify factors that contribute to waterborne illness in the boroughs or areas where incidences are frequent or clustered, and take measures to prevent outbreaks.

To start my investigation, I created a map visualization showing where the incidences of waterborne illness occurred, separated by year, to show how the incidences change by location and whether any of them are clustered.

I then created several visualizations to show the relationship between waterborne illness and water quality. The first two are pie charts that show each borough's percentage of the complaints about waterborne illness and about water quality respectively. This visualization type was chosen because it makes it easiest to see which borough the most incidences and complaints come from. The charts do not seem to support the hypothesis that waterborne illness and water quality are correlated, since Brooklyn has the most incidences of waterborne illness while Queens has the most water quality complaints.
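The borough percentages behind pie charts like these can be computed with a simple count. The following is a minimal sketch, not the code used for this post: the `borough` key and the toy records are assumptions made for illustration and do not come from the 311 data.

```python
from collections import Counter

def borough_shares(records):
    """Return each borough's percentage share of the given complaints."""
    counts = Counter(r["borough"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}  # no complaints, so there are no shares to report
    return {borough: 100.0 * n / total for borough, n in counts.items()}

# Toy records for illustration only (not real 311 complaints).
toy_complaints = (
    [{"borough": "BROOKLYN"}] * 2
    + [{"borough": "QUEENS"}]
    + [{"borough": "BRONX"}]
)
shares = borough_shares(toy_complaints)
```

Running the same function once on the waterborne illness complaints and once on the water quality complaints would give the two sets of percentages to compare.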

The second two visualizations are line charts that show the number of complaints per year for waterborne illness and water quality respectively. This visualization type was chosen because it makes trends over time easiest to see. These graphs also do not seem to support the hypothesis that waterborne illness and water quality are correlated. Waterborne illness complaints decrease from 2010 to 2011, then increase until 2016, and then decrease to the present, while water quality complaints are stable from 2010 to 2015, then increase until 2018, and then decrease to the present.

The last visualization is a scatter plot showing the relationship between waterborne illness and water quality. To make this graph I had to transform the data into counts of waterborne illness complaints and water quality complaints for each combination of year and borough. This gave a total of 50 data points, instead of 10 if the counts were by year only or 5 if they were by borough only. After plotting the 50 data points, I added a linear trend line to determine whether there was a relationship between waterborne illness and water quality complaints. The fitted relationship was Waterborne Illness = 17.3682 + 0.00235605*Water Quality, with a p-value of 0.148401, meaning the relationship is significant only at the 15% level and not at the conventional 5% level, and an R-squared value of 0.043014, meaning only 4.3014% of the variance in waterborne illness is explained by water quality. These measures show that the relationship is not very significant. The correlation coefficient between waterborne illness and water quality supports this: it was found to be 0.207, which indicates a weak positive relationship.
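The transformation and trend line described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the tool used for this post: the record keys (`borough`, `year`), the function names, and the toy records are all hypothetical. The last lines check that the reported figures are consistent with each other, since in a regression with one predictor the R-squared equals the square of the correlation coefficient.

```python
from collections import Counter
from math import sqrt

def panel_counts(illness_rows, quality_rows):
    """Pair illness and water quality counts for each (borough, year)."""
    illness = Counter((r["borough"], r["year"]) for r in illness_rows)
    quality = Counter((r["borough"], r["year"]) for r in quality_rows)
    keys = sorted(set(illness) | set(quality))
    # A Counter returns 0 for a borough-year with no complaints of one type.
    return [(key, quality[key], illness[key]) for key in keys]

def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

def rows(borough, year, n):
    """Build n toy complaint records (illustration only, not 311 data)."""
    return [{"borough": borough, "year": year}] * n

toy_illness = (rows("BROOKLYN", 2010, 3) + rows("BROOKLYN", 2011, 2)
               + rows("QUEENS", 2010, 1) + rows("QUEENS", 2011, 2))
toy_quality = (rows("BROOKLYN", 2010, 4) + rows("BROOKLYN", 2011, 3)
               + rows("QUEENS", 2010, 2) + rows("QUEENS", 2011, 5))

panel = panel_counts(toy_illness, toy_quality)
x = [quality for _, quality, _ in panel]   # water quality complaints
y = [illness for _, _, illness in panel]   # waterborne illness complaints
intercept, slope, r = fit_line(x, y)
r_squared = r * r  # equals R-squared when there is a single predictor

# Consistency check on the figures reported in the post.
implied_r = sqrt(0.043014)  # about 0.2074, matching the reported 0.207
```

With ten years and five boroughs, the same `panel_counts` call on the full data would produce the 50 points used for the scatter plot.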

It is important to note that the regression analysis above includes no entity (borough) or time (year) fixed effects. Omitting them could have affected the coefficient on Water Quality, as well as the standard errors and p-values.
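One common way to account for entity fixed effects is the within transformation: subtract each borough's own mean from its values before fitting, so that only variation within a borough drives the slope. The sketch below illustrates that idea on toy numbers. It is not the analysis from this post, the function names are hypothetical, and time fixed effects would need the same treatment by year.

```python
from collections import defaultdict

def demean_by_group(groups, values):
    """Subtract each group's mean from the values belonging to that group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]

def slope(x, y):
    """Least squares slope of y on x, with an intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def within_slope(groups, x, y):
    """Slope after removing group means (entity fixed effects)."""
    return slope(demean_by_group(groups, x), demean_by_group(groups, y))

# Toy numbers for illustration only (not 311 data).
boroughs = ["BROOKLYN", "BROOKLYN", "QUEENS", "QUEENS"]
quality = [4, 2, 10, 12]
illness = [3, 2, 1, 2]

pooled = slope(quality, illness)                 # ignores borough differences
within = within_slope(boroughs, quality, illness)  # compares within boroughs
```

In this toy example the pooled slope is negative while the within-borough slope is positive, which shows how level differences between boroughs can change the estimated coefficient.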

In the future, I would like to expand the time range of the data set to determine whether a stronger relationship between waterborne illness and water quality appears with more data points. I would also like to add demographic information to determine whether race plays a part in the incidences of waterborne illness, since susceptibility to some illnesses may differ between groups.