Paste your elevator pitch here.
A short (4-5 sentence) paragraph that describes key insights taken from metrics in the project results. Think top or most important results.
Read and format project data
# Include and execute your code here
import pandas as pd

url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/star-wars-survey/StarWars.csv"
# The file is not UTF-8; 'latin1' is an alias for the same codec as 'ISO-8859-1'
df = pd.read_csv(url, encoding='ISO-8859-1')
# print(df.head())
Highlight the Questions and Tasks
Client Request
The client performed the survey but outsourced the analytics to a third party. They want you to clean up the data so you can:
a. Validate that the data provided on GitHub lines up with the article by recreating 2 of the visuals from the article.
b. Determine whether you can predict if a person from the survey makes more than $50k.
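Task (b) needs a binary target before any model can be fit. The sketch below shows one way to derive it from bracketed income labels. The bracket strings, the sample frame, and the helper name `income_lower_bound` are assumptions for illustration; check them against the actual values in the `income` column before relying on this.

```python
import pandas as pd

# Hypothetical sample; the bracket labels are assumed, not taken from the cleaned data.
sample = pd.DataFrame({
    "income": ["$0 - $24,999", "$25,000 - $49,999", "$50,000 - $99,999",
               "$100,000 - $149,999", "$150,000+", None]
})

def income_lower_bound(label):
    """Return the lower bound of an income bracket as a float, or NaN if missing."""
    if pd.isna(label):
        return float("nan")
    first = str(label).split("-")[0]              # text before the dash, e.g. "$25,000 "
    digits = "".join(ch for ch in first if ch.isdigit())
    return float(digits) if digits else float("nan")

sample["income_min"] = sample["income"].apply(income_lower_bound)

# Keep only respondents who reported an income, then flag brackets starting at $50k or more.
labeled = sample.dropna(subset=["income_min"]).copy()
labeled["over_50k"] = (labeled["income_min"] >= 50_000).astype(int)
print(labeled[["income", "over_50k"]])
```

Dropping respondents with no reported income is a design choice made here for simplicity; imputing or modelling the missing group separately are alternatives.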
The cleaning process below is performed to make the dataset usable.
Read and format data
# Include and execute your code here
# Remove the second header row (loaded as the first data row); it is unnecessary.
# .copy() avoids pandas warnings when new columns are added to the slice.
df = df.iloc[1:, :].copy()

# Rename the columns
df.columns = [
    'RespondentID', 'seen_any', 'sw_fan',
    'seen_first', 'seen_second', 'seen_third', 'seen_fourth', 'seen_fifth', 'seen_sixth',
    'rank_1', 'rank_2', 'rank_3', 'rank_4', 'rank_5', 'rank_6',
    'Han_Solo', 'Luke_Skywalker', 'Leia_Organa', 'Anakin_Skywalker', 'Obi_Wan_Kenobi',
    'Emperor_Palpatine', 'Darth_Vader', 'Lando_Calrissian', 'Boba_Fett', 'C-3P0', 'R2_D2',
    'Jar_Jar_Binks', 'Padme_Amidala', 'Yoda',
    'shot_first', 'knows_expand_universe', 'fan_expand_universe', 'star_trek_fan',
    'gender', 'age', 'income', 'education', 'region'
]

# Define Boolean Conversion Function
# Returns 1 for any non-empty string. Note that str(NaN) is 'nan' and 'No' is
# non-empty, so missing and negative answers also map to 1.
def to_bool(cell):
    return int(bool(str(cell).strip()))

# Columns to Convert
bool_columns = [
    'sw_fan', 'star_trek_fan',
    'seen_first', 'seen_second', 'seen_third', 'seen_fourth', 'seen_fifth', 'seen_sixth',
    'knows_expand_universe', 'fan_expand_universe'
]

# Convert Columns
for col in bool_columns:
    df[col + '_bool'] = df[col].apply(to_bool)

# Preview the result
print(df.head())
RespondentID seen_any sw_fan seen_first \
1 3.292880e+09 Yes Yes Star Wars: Episode I The Phantom Menace
2 3.292880e+09 No NaN NaN
3 3.292765e+09 Yes No Star Wars: Episode I The Phantom Menace
4 3.292763e+09 Yes Yes Star Wars: Episode I The Phantom Menace
5 3.292731e+09 Yes Yes Star Wars: Episode I The Phantom Menace
seen_second \
1 Star Wars: Episode II Attack of the Clones
2 NaN
3 Star Wars: Episode II Attack of the Clones
4 Star Wars: Episode II Attack of the Clones
5 Star Wars: Episode II Attack of the Clones
seen_third \
1 Star Wars: Episode III Revenge of the Sith
2 NaN
3 Star Wars: Episode III Revenge of the Sith
4 Star Wars: Episode III Revenge of the Sith
5 Star Wars: Episode III Revenge of the Sith
seen_fourth \
1 Star Wars: Episode IV A New Hope
2 NaN
3 NaN
4 Star Wars: Episode IV A New Hope
5 Star Wars: Episode IV A New Hope
seen_fifth \
1 Star Wars: Episode V The Empire Strikes Back
2 NaN
3 NaN
4 Star Wars: Episode V The Empire Strikes Back
5 Star Wars: Episode V The Empire Strikes Back
seen_sixth rank_1 ... sw_fan_bool \
1 Star Wars: Episode VI Return of the Jedi 3 ... 1
2 NaN NaN ... 1
3 NaN 1 ... 1
4 Star Wars: Episode VI Return of the Jedi 5 ... 1
5 Star Wars: Episode VI Return of the Jedi 5 ... 1
star_trek_fan_bool seen_first_bool seen_second_bool seen_third_bool \
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 1 1 1
5 1 1 1 1
seen_fourth_bool seen_fifth_bool seen_sixth_bool knows_expand_universe_bool \
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 1 1 1
5 1 1 1 1
fan_expand_universe_bool
1 1
2 1
3 1
4 1
5 1
[5 rows x 48 columns]
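In the preview above every `_bool` column is 1, including row 2, where the answers are NaN, and row 3, where `sw_fan` is No. This happens because `to_bool` converts the cell to a string first, and both `'nan'` and `'No'` are non-empty. A minimal sketch of a missing-aware alternative follows. The helper name `to_flag` and the small `demo` frame are hypothetical, and treating "No" and missing answers as 0 is an assumption about the intended meaning.

```python
import numpy as np
import pandas as pd

def to_flag(cell):
    """Return 1 for a real, affirmative answer; 0 for missing, blank, or 'No'."""
    if pd.isna(cell):
        return 0
    text = str(cell).strip()
    return int(text != "" and text.lower() != "no")

# Hypothetical mini-sample mirroring the first rows of the preview above.
title = "Star Wars: Episode IV A New Hope"
demo = pd.DataFrame({
    "sw_fan": ["Yes", np.nan, "No", "Yes", "Yes"],
    "seen_fourth": [title, np.nan, np.nan, title, title],
})

for col in ["sw_fan", "seen_fourth"]:
    demo[col + "_bool"] = demo[col].apply(to_flag)

print(demo[["sw_fan_bool", "seen_fourth_bool"]])
```

Mapping missing answers to 0 merges "was not asked" with "answered no". If that distinction matters for the analysis, returning NaN for missing cells would preserve it.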
include figures in chunks and discuss your findings in the figure.