CardioGood Fitness Case Study - Descriptive Statistics

The market research team at AdRight has been asked to identify the profile of the typical customer for each treadmill product offered by CardioGood Fitness. To do so, the team investigates whether customer characteristics differ across the product lines, using data on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three months. The data are stored in the CardioGoodFitness.csv file.

The team identifies the following customer variables to study:

  • product purchased: TM195, TM498, or TM798.
  • gender.
  • age, in years.
  • education, in years.
  • relationship status: single or partnered.
  • annual household income.
  • average number of times the customer plans to use the treadmill each week.
  • average number of miles the customer expects to walk/run each week.
  • self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

Perform descriptive analytics to create a customer profile for each CardioGood Fitness treadmill product line.

In [6]:
# Load the necessary packages
import numpy as np
import pandas as pd
In [7]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [9]:
# Load the Cardio Dataset
mydata = pd.read_csv('/content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_One_-_Python_for_Data_Science/CardioGood_Analysis/CardioGoodFitness.csv')
In [10]:
mydata.head()
Out[10]:
   Product  Age  Gender  Education  MaritalStatus  Usage  Fitness  Income  Miles
0    TM195   18    Male         14         Single      3        4   29562    112
1    TM195   19    Male         15         Single      2        3   31836     75
2    TM195   19  Female         14      Partnered      4        3   30699     66
3    TM195   19    Male         12         Single      3        3   32973     85
4    TM195   20    Male         13      Partnered      4        2   35247     47
In [11]:
mydata.describe(include="all")
Out[11]:
        Product         Age  Gender   Education  MaritalStatus       Usage     Fitness         Income       Miles
count       180  180.000000     180  180.000000            180  180.000000  180.000000     180.000000  180.000000
unique        3         NaN       2         NaN              2         NaN         NaN            NaN         NaN
top       TM195         NaN    Male         NaN      Partnered         NaN         NaN            NaN         NaN
freq         80         NaN     104         NaN            107         NaN         NaN            NaN         NaN
mean        NaN   28.788889     NaN   15.572222            NaN    3.455556    3.311111   53719.577778  103.194444
std         NaN    6.943498     NaN    1.617055            NaN    1.084797    0.958869   16506.684226   51.863605
min         NaN   18.000000     NaN   12.000000            NaN    2.000000    1.000000   29562.000000   21.000000
25%         NaN   24.000000     NaN   14.000000            NaN    3.000000    3.000000   44058.750000   66.000000
50%         NaN   26.000000     NaN   16.000000            NaN    3.000000    3.000000   50596.500000   94.000000
75%         NaN   33.000000     NaN   16.000000            NaN    4.000000    4.000000   58668.000000  114.750000
max         NaN   50.000000     NaN   21.000000            NaN    7.000000    5.000000  104581.000000  360.000000
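The summary above shows Miles with a mean (103.19) above its median (94) and a maximum of 360, which hints at a long right tail. As a minimal sketch, assuming the conventional 1.5 × IQR rule for flagging unusually large values, the fences can be worked out directly from the quartiles printed above. The variable names are illustrative and not part of the original notebook.

```python
# Quartiles and maximum for Miles, copied from the describe() output above.
q1, q3 = 66.0, 114.75
miles_max = 360.0

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # values below this would be flagged as low
upper_fence = q3 + 1.5 * iqr    # values above this would be flagged as high

print(f"IQR: {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print(f"Maximum exceeds upper fence: {miles_max > upper_fence}")
```

Under this rule the upper fence is 187.875, so the largest Miles value would count as an outlier. Whether such customers should be treated separately is a judgment call that this check alone does not settle.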
In [13]:
mydata.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Product        180 non-null    object
 1   Age            180 non-null    int64 
 2   Gender         180 non-null    object
 3   Education      180 non-null    int64 
 4   MaritalStatus  180 non-null    object
 5   Usage          180 non-null    int64 
 6   Fitness        180 non-null    int64 
 7   Income         180 non-null    int64 
 8   Miles          180 non-null    int64 
dtypes: int64(6), object(3)
memory usage: 12.8+ KB
In [14]:
import matplotlib.pyplot as plt
%matplotlib inline

mydata.hist(figsize=(20,30))
Out[14]:
array([[<Axes: title={'center': 'Age'}>,
        <Axes: title={'center': 'Education'}>],
       [<Axes: title={'center': 'Usage'}>,
        <Axes: title={'center': 'Fitness'}>],
       [<Axes: title={'center': 'Income'}>,
        <Axes: title={'center': 'Miles'}>]], dtype=object)
In [15]:
import seaborn as sns #importing seaborn library

sns.boxplot(x="Gender", y="Age", data=mydata)
Out[15]:
<Axes: xlabel='Gender', ylabel='Age'>
In [16]:
sns.boxplot(x="Product", y="Age", data=mydata)
Out[16]:
<Axes: xlabel='Product', ylabel='Age'>
In [17]:
pd.crosstab(mydata['Product'],mydata['Gender'] )
Out[17]:
Gender   Female  Male
Product
TM195        40    40
TM498        29    31
TM798         7    33
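Raw counts are hard to compare across products because the products have different numbers of buyers. The sketch below rebuilds the table above by hand and converts each row to shares. The names `counts` and `shares` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Counts copied from the Product x Gender crosstab above.
counts = pd.DataFrame(
    {"Female": [40, 29, 7], "Male": [40, 31, 33]},
    index=pd.Index(["TM195", "TM498", "TM798"], name="Product"),
)

# Divide each row by its total so every product's shares sum to 1.
shares = counts.div(counts.sum(axis=1), axis=0).round(3)
print(shares)
```

On the full dataset, `pd.crosstab(mydata['Product'], mydata['Gender'], normalize='index')` should give the same proportions without the manual division.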
In [18]:
pd.crosstab(mydata['Product'],mydata['MaritalStatus'] )
Out[18]:
MaritalStatus  Partnered  Single
Product
TM195                 48      32
TM498                 36      24
TM798                 23      17
In [19]:
sns.countplot(x="Product", hue="Gender", data=mydata)
Out[19]:
<Axes: xlabel='Product', ylabel='count'>
In [20]:
pd.pivot_table(mydata, index=['Product', 'Gender'],
                     columns=[ 'MaritalStatus'], aggfunc=len)
Out[20]:
               Age              Education        Fitness          Income           Miles            Usage
MaritalStatus  Partnered Single Partnered Single Partnered Single Partnered Single Partnered Single Partnered Single
Product Gender
TM195   Female        27     13        27     13        27     13        27     13        27     13        27     13
        Male          21     19        21     19        21     19        21     19        21     19        21     19
TM498   Female        15     14        15     14        15     14        15     14        15     14        15     14
        Male          21     10        21     10        21     10        21     10        21     10        21     10
TM798   Female         4      3         4      3         4      3         4      3         4      3         4      3
        Male          19     14        19     14        19     14        19     14        19     14        19     14
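With `aggfunc=len` and no `values` argument, the pivot table counts rows once per remaining column, so the same counts repeat under Age, Education, Fitness, Income, Miles, and Usage. One possible alternative, sketched here on just the five rows shown by `mydata.head()` so that it runs without the CSV file, is `pd.crosstab` with a two-level row index, which returns each count once. The `sample` frame is an illustrative stand-in for `mydata`.

```python
import pandas as pd

# The five rows printed by mydata.head() above (only the columns needed here).
sample = pd.DataFrame({
    "Product": ["TM195"] * 5,
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
    "MaritalStatus": ["Single", "Single", "Partnered", "Single", "Partnered"],
})

# One count per (Product, Gender) row and MaritalStatus column.
counts = pd.crosstab(
    index=[sample["Product"], sample["Gender"]],
    columns=sample["MaritalStatus"],
)
print(counts)
```

Replacing `sample` with `mydata` should reproduce the counts in the table above as a single Partnered/Single pair of columns.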
In [21]:
pd.pivot_table(mydata,'Income', index=['Product', 'Gender'],
                     columns=[ 'MaritalStatus'])
Out[21]:
MaritalStatus      Partnered        Single
Product Gender
TM195   Female  46153.777778  45742.384615
        Male    50028.000000  43265.842105
TM498   Female  49724.800000  48920.357143
        Male    49378.285714  47071.800000
TM798   Female  84972.250000  58516.000000
        Male    81431.368421  68216.428571
In [22]:
pd.pivot_table(mydata,'Miles', index=['Product', 'Gender'],
                     columns=[ 'MaritalStatus'])
Out[22]:
MaritalStatus    Partnered      Single
Product Gender
TM195   Female   74.925926   78.846154
        Male     80.190476   99.526316
TM498   Female   94.000000   80.214286
        Male     87.238095   91.100000
TM798   Female  215.000000  133.333333
        Male    176.315789  147.571429
In [23]:
sns.pairplot(mydata)
Out[23]:
<seaborn.axisgrid.PairGrid at 0x797581eaef90>
In [24]:
mydata['Age'].std()
Out[24]:
6.943498135399795
In [25]:
mydata['Age'].mean()
Out[25]:
28.788888888888888
In [26]:
sns.displot(data=mydata, x='Age', kde=True)
Out[26]:
<seaborn.axisgrid.FacetGrid at 0x797581000b10>
In [27]:
mydata.hist(by='Gender',column = 'Age')
Out[27]:
array([<Axes: title={'center': 'Female'}>,
       <Axes: title={'center': 'Male'}>], dtype=object)
In [28]:
mydata.hist(by='Gender',column = 'Income')
Out[28]:
array([<Axes: title={'center': 'Female'}>,
       <Axes: title={'center': 'Male'}>], dtype=object)
In [29]:
mydata.hist(by='Gender',column = 'Miles')
Out[29]:
array([<Axes: title={'center': 'Female'}>,
       <Axes: title={'center': 'Male'}>], dtype=object)
In [30]:
mydata.hist(by='Product',column = 'Miles', figsize=(20,30))
Out[30]:
array([[<Axes: title={'center': 'TM195'}>,
        <Axes: title={'center': 'TM498'}>],
       [<Axes: title={'center': 'TM798'}>, <Axes: >]], dtype=object)
In [32]:
# Select only the numerical columns before calculating the correlation.
numerical_data = mydata.select_dtypes(include=['number'])
corr = numerical_data.corr()
corr
Out[32]:
                 Age  Education      Usage    Fitness     Income      Miles
Age         1.000000   0.280496   0.015064   0.061105   0.513414   0.036618
Education   0.280496   1.000000   0.395155   0.410581   0.625827   0.307284
Usage       0.015064   0.395155   1.000000   0.668606   0.519537   0.759130
Fitness     0.061105   0.410581   0.668606   1.000000   0.535005   0.785702
Income      0.513414   0.625827   0.519537   0.535005   1.000000   0.543473
Miles       0.036618   0.307284   0.759130   0.785702   0.543473   1.000000
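A correlation matrix lists every pair twice and includes the trivial diagonal, which makes the strongest relationships harder to spot. The sketch below re-enters the matrix above by hand, keeps only the upper triangle, and sorts the remaining pairs. The name `pairs` is illustrative.

```python
import numpy as np
import pandas as pd

cols = ["Age", "Education", "Usage", "Fitness", "Income", "Miles"]
# Values copied from the correlation matrix above.
corr = pd.DataFrame(
    [
        [1.000000, 0.280496, 0.015064, 0.061105, 0.513414, 0.036618],
        [0.280496, 1.000000, 0.395155, 0.410581, 0.625827, 0.307284],
        [0.015064, 0.395155, 1.000000, 0.668606, 0.519537, 0.759130],
        [0.061105, 0.410581, 0.668606, 1.000000, 0.535005, 0.785702],
        [0.513414, 0.625827, 0.519537, 0.535005, 1.000000, 0.543473],
        [0.036618, 0.307284, 0.759130, 0.785702, 0.543473, 1.000000],
    ],
    index=cols,
    columns=cols,
)

# Keep the strict upper triangle (k=1 skips the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(pairs.head(3))
```

The three strongest pairs are Fitness with Miles, Usage with Miles, and Usage with Fitness, which is consistent with choosing Usage and Fitness as predictors of Miles in the regression that follows.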
In [33]:
sns.heatmap(corr, annot=True)
Out[33]:
<Axes: >
In [34]:
# Simple Linear Regression

# Load the linear_model module from scikit-learn (the import name is sklearn)
from sklearn import linear_model

# Create linear regression object
regr = linear_model.LinearRegression()

y = mydata['Miles']
x = mydata[['Usage','Fitness']]

# Fit the model on the full dataset (no train/test split is used here)
regr.fit(x,y)
Out[34]:
LinearRegression()
In [39]:
print(f"Coefficients: {regr.coef_}")
Coefficients: [20.21486334 27.20649954]
In [40]:
print(f"Intercept: {regr.intercept_}")
Intercept: -56.74288178464862
In [42]:
# MilesPredicted = -56.74 + 20.21*Usage + 27.21*Fitness
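The fitted equation can be applied by hand to see what it implies for a given customer. The sketch below hard-codes the intercept and coefficients printed above. `predict_miles` is a hypothetical helper, not part of the original notebook, and the inputs are the median Usage and Fitness values (both 3) from the summary statistics.

```python
# Fitted parameters copied from the regr.intercept_ and regr.coef_ output above.
INTERCEPT = -56.74288178464862
COEF_USAGE = 20.21486334
COEF_FITNESS = 27.20649954


def predict_miles(usage, fitness):
    """Apply the fitted linear equation to one customer (hypothetical helper)."""
    return INTERCEPT + COEF_USAGE * usage + COEF_FITNESS * fitness


# Median customer from describe(): Usage = 3, Fitness = 3.
print(round(predict_miles(3, 3), 2))  # 85.52
```

With the fitted model still in memory, `regr.predict` on a DataFrame with the same `Usage` and `Fitness` columns should agree with this value up to rounding. The model was fitted on all 180 rows, so this is an in-sample description, not an estimate of predictive accuracy.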
In [38]:
# Convert notebook to html
!jupyter nbconvert --to html "/content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_One_-_Python_for_Data_Science/CardioGood_Analysis/Notebook+-+CardioGood+Fitness+Data+Analysis.ipynb"
[NbConvertApp] Converting notebook /content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_One_-_Python_for_Data_Science/CardioGood_Analysis/Notebook+-+CardioGood+Fitness+Data+Analysis.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 10 image(s).
[NbConvertApp] Writing 891271 bytes to /content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_One_-_Python_for_Data_Science/CardioGood_Analysis/Notebook+-+CardioGood+Fitness+Data+Analysis.html