Ben Clark's

Website

Coding tools for researching a new car

When purchasing something, I want to be certain that I am making the best choice; usually, this involves a lot of research. I recently purchased a new car, and to be certain I made the right decision, I spent too many hours researching my options. In this post, I will lay out some Python code I used to aid my research. While this post is about cars, the code and the idea can be applied to other purchases.

My goal for this research was to find the best deal. For me, the value of the car was more important than its quality, so I also considered cars priced far below my budget limit in case they were exceptional deals.

For your own search, you will need a good car search engine that can find every car within a given radius of your location. I knew I wanted a certified pre-owned car, so I used the manufacturer's website. I recorded the price, mileage, and trim (you may want more fields) of every certified pre-owned [Make] [Model] within a 160-mile radius of my home in a CSV file. The file contained 27 vehicles and the heading: Price,Mileage,Trim. Using the Python code below, I graphed price vs. mileage and found a linear relationship between the two, so I created a best fit line.

Subaru Foresters 2016-18: price vs. mileage within 160 miles of [location]
*Your price data will show up as numbers.
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# import csv file as a pandas dataframe
df = pd.read_csv('car_data.csv')
# separate into different dataframes based on trim (touring and
# limited in this case)
touring_df = df[df['Trim'] == 'touring']
limited_df = df[df['Trim'] == 'limited']

# label the graph
plt.xlabel('Price $')
plt.ylabel('Mileage')
plt.title("[Make] [Model] [Year range] within 160 miles of [location]")

# FOR LIMITED TRIM
# linear regression best fit line
slope_limited, intercept_limited, _, _, _ = stats.linregress(limited_df['Price'],
                                                             limited_df['Mileage'])
plt.plot(limited_df['Price'], limited_df['Mileage'], '+',
         color='blue')   # plot data in the same color as the legend entry
plt.plot(limited_df['Price'], intercept_limited + slope_limited*limited_df['Price'],
         ':', color='blue', label='Limited')    # plot best fit line


# FOR TOURING TRIM
slope_touring, intercept_touring, _, _, _ = stats.linregress(touring_df['Price'],
                                                             touring_df['Mileage'])
plt.plot(touring_df['Price'], touring_df['Mileage'], '+',
         color='orange')   # plot data in the same color as the legend entry
plt.plot(touring_df['Price'], intercept_touring + slope_touring*touring_df['Price'],
         ':', color='orange', label='Touring')

# ADD CUSTOM LEGEND
touring_patch = mpatches.Patch(color='orange', label='Touring')
limited_patch = mpatches.Patch(color='blue', label='Limited')
plt.legend(handles=[touring_patch, limited_patch])

plt.savefig('limited_and_touring.png', dpi=500)
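
The plotting code above throws away the last three values returned by stats.linregress. One of them, the correlation coefficient, gives a rough numerical check on whether a straight line is a reasonable fit. The sketch below shows one way to pull it out. The helper name fit_summary and the three sample rows are made up for illustration and are not the data from this post.

```python
import pandas as pd
from scipy import stats


def fit_summary(trim_df):
    """Fit Mileage as a linear function of Price (same axes as the plot)
    and report the slope, intercept, and R^2 of the fit."""
    result = stats.linregress(trim_df['Price'], trim_df['Mileage'])
    return {
        'slope': result.slope,            # miles per dollar
        'intercept': result.intercept,
        'r_squared': result.rvalue ** 2,  # 1.0 means a perfect straight line
    }


# Invented, perfectly linear sample rows (illustration only)
sample = pd.DataFrame({
    'Price':   [26000, 24000, 22000],
    'Mileage': [10000, 30000, 50000],
})
summary = fit_summary(sample)
print(summary)
```

An R^2 close to 1 suggests the line describes the listings well; a low value suggests that something other than mileage, such as year or options, may be driving the price.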

                

In this graph, the best fit line represents the "market price." The cars that fall farthest below the line are priced lowest relative to the market and are therefore the best value.
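
Reading the best deals off the graph by eye works for 27 cars, but the same idea can be turned into a sorted list. The sketch below is one possible approach, not the exact analysis behind the graph. It assumes the same Price, Mileage, and Trim columns, and it fits price as a function of mileage (the reverse of the plotting code above) so that each car's distance from the line comes out in dollars. The function name rank_by_value and the sample rows are invented for illustration.

```python
import pandas as pd
from scipy import stats


def rank_by_value(df):
    """Sort cars by how far their price sits below the fitted line.

    Fits price = intercept + slope * mileage, then computes each car's
    residual (actual price minus fitted price) in dollars.
    """
    fit = stats.linregress(df['Mileage'], df['Price'])
    ranked = df.copy()
    ranked['Predicted'] = fit.intercept + fit.slope * ranked['Mileage']
    # Negative residual: the car is listed for less than the line predicts
    ranked['Residual'] = ranked['Price'] - ranked['Predicted']
    return ranked.sort_values('Residual').reset_index(drop=True)


# Made-up numbers for illustration only; not the data from this post
example = pd.DataFrame({
    'Price':   [27000, 25500, 23000, 24000, 21500],
    'Mileage': [10000, 20000, 30000, 40000, 50000],
    'Trim':    ['limited'] * 5,
})
ranked = rank_by_value(example)
print(ranked[['Price', 'Mileage', 'Residual']])
```

The first rows of the result are the cars priced furthest below the line. Because each trim gets its own line in the graph, it would make sense to call the function once per trim, for example on touring_df and limited_df separately.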