Probing the Links Between Energy Sources, Pollution, and Socioeconomic Factors with Various Health Metrics
Project:
People are often afraid of nuclear power because of health concerns about radiation. But should they be?
Does proximity to a power plant matter?
Can a country's life expectancy be predicted with other factors?
Targets:
Cancer incidence (2015 to 2019, from CDC and NIH)
Cardiovascular disease incidence (2017 to 2019, from CDC)
Respiratory disease incidence (from American Lung Association)
Target:
Life expectancy
Function to join tables
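A minimal sketch of what such a join function could look like, assuming the tables share a county key. The standard-library `sqlite3` module stands in here for the project's SQLAlchemy and PostgreSQL setup, and the table names, the `fips` column, and all values are hypothetical placeholders, not project data.

```python
import sqlite3


def join_tables(conn, left, right, key):
    """Inner-join two tables on a shared key column and return all rows."""
    # Identifiers cannot be bound as query parameters, so they are quoted
    # by hand; only pass trusted table and column names.
    query = f'SELECT * FROM "{left}" JOIN "{right}" USING ("{key}")'
    return conn.execute(query).fetchall()


# Hypothetical tables and made-up values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cancer (fips TEXT, incidence REAL)")
conn.execute("CREATE TABLE distance (fips TEXT, km_to_plant REAL)")
conn.executemany("INSERT INTO cancer VALUES (?, ?)",
                 [("01001", 1.0), ("01003", 2.0)])
conn.executemany("INSERT INTO distance VALUES (?, ?)",
                 [("01001", 10.0), ("01005", 30.0)])

# Only counties present in both tables survive an inner join.
rows = join_tables(conn, "cancer", "distance", "fips")
```

An inner join drops counties that are missing from either table; a left join would keep them with empty values instead, which may be preferable when the target table must stay complete.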
Modules used:
Boto3, to connect to S3 and store data in Python
Pandas, to transform data in Python
Haversine, to calculate distances between the center coordinates of counties and the coordinates of power plants
SQLAlchemy, to make SQL joins, write table-creation schemas, and connect to the PostgreSQL database (managed with pgAdmin)
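The distance step can be sketched with the haversine formula written out in plain Python. This is an illustration of the calculation, not the project's code: the project uses the Haversine module, while the function names `haversine_km` and `nearest_plant_km` and the coordinates below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_plant_km(county_center, plants):
    """Distance in km from a county center to the closest power plant."""
    return min(haversine_km(county_center, plant) for plant in plants)


# Made-up coordinates, for illustration only.
county_center = (0.0, 0.0)
plants = [(0.0, 90.0), (0.0, 1.0)]
closest = nearest_plant_km(county_center, plants)
```

Taking the minimum over all plants gives one "distance to nearest plant" value per county, which can then be joined to the health tables as a feature.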
 Afra's datasets, containing
Initial plots to look for any potential correlations:
Pollution-related features are important in predicting some cancer rates
Correlations between feature data and life expectancy
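One common way to quantify such a correlation is the Pearson coefficient. The sketch below assumes that measure (the source does not say which one was used), and the function name `pearson_r` and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values, for illustration only.
feature = [1.0, 2.0, 3.0, 4.0]
life_expectancy = [80.0, 78.0, 76.0, 74.0]
r = pearson_r(feature, life_expectancy)
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it says nothing about causation.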
Visualizations:

| Training Data | Testing Data |
|---|---|
| 0.97 | 0.82 |

| Training Data | Testing Data | Accuracy Score |
|---|---|---|
| 0.82 | 0.87 | 0.94 | 