Empowering the Automotive Industry with Data Analytics

Use Data to Get to Know Your Customers and Their Needs

The real benefit of data analytics is its ability to inform business decisions that fast-track growth. After analyzing the data, you should be able to identify trends. These trends reveal actionable information that can be used to make changes to your website, app, product, and so on, to optimize performance.

We are happy to provide accurate data analysis that helps you lead the automotive sector by understanding your customers.

A powerful tool such as data analytics can take the automotive industry to new heights by helping companies make calculated decisions backed by analytical evidence and research.

This is the dataset that I considered for performing the Data Analysis.

In [14]:
Dataset:
Out[14]:
Unnamed: 0 Make Model Variant Ex-Showroom_Price Displacement Cylinders Valves_Per_Cylinder Drivetrain Cylinder_Configuration Emission_Norm Engine_Location Fuel_System Fuel_Tank_Capacity Fuel_Type Height Length Width Body_Type Doors ... Rear_Center_Armrest iPod_Compatibility ESP_(Electronic_Stability_Program) Cooled_Glove_Box Recommended_Tyre_Pressure Heated_Seats Turbocharger ISOFIX_(Child-Seat_Mount) Rain_Sensing_Wipers Paddle_Shifters Leather_Wrapped_Steering Automatic_Headlamps Engine_Type ASR_/_Traction_Control Cruise_Control USB_Ports Heads-Up_Display Welcome_Lights Battery Electric_Range
535 535 Mahindra Bolero Power Plus Lx Rs. 7,49,192 1493 cc 4.0 2.0 RWD (Rear Wheel Drive) In-line BS IV Front, Transverse Injection 60 litres Diesel 1880 mm 3995 mm 1745 mm SUV 5.0 ... Yes NaN NaN NaN NaN NaN Yes NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
62 62 Maruti Suzuki Celerio X Vxi (O) Rs. 4,81,074 998 cc 3.0 4.0 FWD (Front Wheel Drive) In-line BS IV Front, Transverse Injection 35 litres Petrol 1560 mm 3600 mm 1600 mm Hatchback 5.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
709 709 Toyota Innova Crysta Touring Sport 2.7 Vx 7 Str Rs. 18,92,000 2393 cc 4.0 4.0 RWD (Rear Wheel Drive) In-line BS VI Front, Longitudinal Injection 55 litres Petrol 1795 mm 4735 mm 1830 mm MUV 5.0 ... Cup Holders Yes NaN Yes NaN NaN Yes Yes Yes NaN Yes Yes NaN NaN NaN NaN NaN NaN NaN NaN
732 732 Jeep Compass 2.0 Limited Plus 4X4 At Rs. 24,99,000 1956 cc 6.0 4.0 AWD (All Wheel Drive) V BS 6 Front, Longitudinal Injection 60 litres Diesel 1640 mm 4395 mm 1818 mm SUV 5.0 ... Cup Holders Yes Yes NaN NaN NaN Yes Yes Yes NaN Yes Yes NaN Yes NaN NaN NaN NaN NaN NaN
1160 1160 Porsche Macan S Rs. 85,03,000 2995 cc 6.0 4.0 AWD (All Wheel Drive) In-line BS IV Front, Longitudinal Injection 65 litres Petrol 1624 mm 4696 mm 1923 mm SUV 5.0 ... Cup Holders Yes Yes Yes NaN All Yes Yes Yes Yes Yes Yes NaN Yes Yes NaN NaN NaN NaN NaN
11 11 Datsun Redi-Go 1.0 S Amt Rs. 4,37,065 999 cc 3.0 4.0 FWD (Front Wheel Drive) In-line BS IV Front, Transverse Injection 28 litres Petrol 1541 mm 3429 mm 1560 mm Hatchback 5.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
585 585 Mahindra Xuv300 1.2 W4 Rs. 8,30,127 1197 cc NaN NaN RWD (Rear Wheel Drive) In-line BS 6 Front, Transverse Injection 42 litres Petrol 1617 mm 3995 mm 1821 mm SUV 5.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
978 978 Kia Seltos Htx Plus 1.5 Diesel Rs. 15,34,000 1493 cc NaN NaN FWD (Front Wheel Drive) In-line BS 6 Front, Longitudinal Injection 60 litres Diesel 1645 mm 4315 mm 1800 mm SUV 5.0 ... Cup Holders Yes NaN Yes NaN Driver Yes Yes Yes Yes Yes Yes NaN NaN Yes NaN NaN NaN NaN NaN

8 rows × 141 columns
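
Several columns in the preview above hold numbers wrapped in text, such as "Rs. 7,49,192" or "1493 cc". The sketch below shows one possible way to turn such strings into numbers before any analysis. The helper names `parse_rupees` and `parse_leading_number` are made up for this illustration and are not part of the original notebook.

```python
import re

def parse_rupees(text):
    """Turn a string like 'Rs. 7,49,192' into the integer 749192."""
    digits = re.sub(r"[^\d]", "", text)  # drop 'Rs.', spaces and commas
    return int(digits) if digits else None

def parse_leading_number(text):
    """Return the leading number of strings like '1493 cc' or '60 litres'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None

# Values copied from the first row of the preview (Mahindra Bolero Power Plus)
print(parse_rupees("Rs. 7,49,192"))
print(parse_leading_number("1493 cc"))
print(parse_leading_number("60 litres"))
```

In a pandas workflow the same helpers could be applied column by column, for example with `Series.map`, assuming the columns are read in as strings.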

In [15]:
The Dataset Statistics were as follows:
In [16]:
After this, we can choose the desired features for a detailed analysis.
In [20]:
First we check the price distribution. We use both linear and log scales because of the huge difference in prices.
The plot shows the price distribution of the various cars.
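
To illustrate why a log scale helps here, the following sketch takes the eight ex-showroom prices from the dataset preview and measures how many of them land in the lowest quarter of a linear axis versus a log axis. It is only a small numeric illustration, not the plotting code used in the notebook; `axis_positions` is a hypothetical helper.

```python
import math

# Ex-showroom prices in rupees, taken from the eight sample rows of the dataset
prices = [749192, 481074, 1892000, 2499000, 8503000, 437065, 830127, 1534000]

def axis_positions(values):
    """Map values to positions between 0 and 1 along an axis."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

linear_pos = axis_positions(prices)
log_pos = axis_positions([math.log10(p) for p in prices])

# Count how many cars are squeezed into the lowest quarter of each axis
crowded_linear = sum(pos < 0.25 for pos in linear_pos)
crowded_log = sum(pos < 0.25 for pos in log_pos)
print(crowded_linear, crowded_log)
```

On the linear axis a single very expensive car stretches the range and pushes most cars into one corner, while the log axis spreads them more evenly.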
In [21]:
Here I have plotted a box plot so that we can clearly examine the variance we observed in the prices.

There seem to be a lot of outliers, which points to very different types of cars or, more precisely, very different categories in the market.
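
A box plot conventionally marks as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to the eight prices from the dataset preview. It assumes the common 1.5 × IQR convention and linear-interpolation quartiles; the notebook's plotting library may use slightly different settings.

```python
from statistics import quantiles

# Ex-showroom prices in rupees, taken from the eight sample rows of the dataset
prices = [749192, 481074, 1892000, 2499000, 8503000, 437065, 830127, 1534000]

def iqr_outliers(values, whisker=1.5):
    """Return the values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

outliers = iqr_outliers(prices)
print(outliers)
```

In this small sample only the Porsche Macan S price falls outside the whiskers; on the full dataset the same rule would flag the whole luxury segment.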

In [23]:
The plot here shows the most common cars, sorted by body type.

Conclusion: SUVs, sedans, and hatchbacks seem to be the dominant car types.

Conclusion: The Indian market is favorable for SUVs, sedans, and hatchbacks.

In [25]:
The box plot below shows how the price of a car varies with its body type.

Conclusion: It is clear from the plot that car body type strongly affects the price.

In [27]:
The plot below shows the fuel type of the cars.
In [28]:
Here is a pie chart of the same data.

Conclusion: Most cars run on petrol or diesel rather than other fuels, which is not a good sign for the environment.

This is going to change because electric vehicles have arrived in India.

Now let's see which companies dominate the Indian market. (I say Indian because of the dataset we chose; the same analysis can be applied to any dataset.)

In [29]:
The plot below shows the top car-making companies in India.
Since we used the Indian cars dataset, that is what the graph depicts;
the same can be done with other datasets as well.
In [30]:
Distribution of cars by engine size.

Conclusion: Most cars have an engine size in the 1000 to 2000 cc range. (The more frequently purchased car types have engine sizes in this range.)

In [32]:
Next we checked the horsepower of the cars.
In [33]:
The plot shows the relation between horsepower and price across the different body types.

Conclusions: The horsepower of a car seems to be highly related to its price.

The effect of body type is less clear-cut.

However, hatchbacks seem to be the body type with the least horsepower and the lowest price.

In [34]:
Next we looked into the relation between mileage and price.

Conclusion: Expensive cars tend to have worse mileage.

In [35]:
Checking the overall correlation between the variables.

First we make a Pearson correlation grid.
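
For reference, each cell of such a grid is a Pearson coefficient: the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand on the power, price, and mileage values of the five sampled rows shown later in this notebook. It is a minimal illustration, and five rows are far too few to draw conclusions from.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Values copied from the five sampled rows of the clustered dataframe
power = [126.25, 108.50, 126.25, 98.63, 80.88]
price = [21609, 16220, 16415, 12073, 10614]
mileage = [19.67, 21.13, 23.90, 26.10, 21.01]

r_power_price = pearson(power, price)
r_mileage_price = pearson(mileage, price)
print(round(r_power_price, 2), round(r_mileage_price, 2))
```

With pandas, `DataFrame.corr()` computes the same coefficient for every pair of numeric columns at once, since Pearson is its default method.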
In [36]:
Now we check an extensive scatter plot grid of more numerical variables to investigate the relations in more detail.

There seems to be a lot of multicollinearity among the variables.

In [38]:
I plotted a 3D scatter plot of the main features (price, horsepower, and mileage) to check for obvious clusters.
In [39]:
Clustering the market takes a lot of effort, as the separation between clusters is not that obvious.

It is now clear that we have to look at many dimensions in order to cluster the market, and the more features we explore, the harder it is to cluster the market.

These dimensions affect buyers' decisions, and they are perceived very differently because buyers have very different mental models. In other words, price, horsepower, and mileage are not everything: some buyers want a long-wheelbase car, some want a wider car, and all of these features, and more, strongly affect the buyer's decision.

This means that two cars can have very similar prices and mileage, yet one is a van with lots of space and the other is a four-door sedan. These two cars are perceived as two different categories in the automotive industry, so space (the length, width, and height of the car) can also be a vital factor.

So a three-dimensional representation won't tell us everything,

which is why we turn to clustering to make use of the many different features associated with each car.

Clustering:

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances between cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem.

The type of clustering used here is k-means clustering.

Introduction to the K-Means Algorithm:

The K-means algorithm alternates between assigning each data point to its nearest centroid and recomputing each centroid as the mean of its assigned points, repeating until the centroids stop changing. The number of clusters is assumed to be known in advance and is denoted by the letter 'K' in K-means. It is a flat (non-hierarchical) clustering algorithm, and the result it converges to is a local optimum that can depend on the starting centroids.

In this method, data points are assigned to clusters in such a way that the sum of the squared distances between the data points and their cluster centroid is as small as possible. Lower variation within a cluster means that the data points in that cluster are more similar to one another.
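
The two alternating steps and the quantity being minimized can be sketched in a few lines of plain Python. This is a teaching sketch under simple assumptions (random initial centroids drawn from the data, toy 2D points invented for the example), not the scikit-learn implementation used later in this notebook.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Alternate assignment and update steps until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def inertia(points, labels, centroids):
    """Sum of squared distances of points to their assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Toy data: two well-separated groups (made up for this illustration)
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, inertia(points, labels, centroids))
```

The `inertia` value is the sum of squared distances described above; scikit-learn exposes the same quantity as the `inertia_` attribute of a fitted `KMeans` model.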

Here I take the Toyota Corolla as an example and cluster the cars to support a better competitor analysis. Similarly, we can pick any other cluster and analyze the market and a model's competitors in much the same way.

In [42]:
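# Keep only cars priced below 60,000; copy so later column assignments are safe
df = df[df.price < 60000].copy()
In [43]:
# Numeric (non-object) columns are the features used for clustering
num_cols = [ i for i in df.columns if df[i].dtype != 'object']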
In [44]:
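from sklearn.cluster import KMeans

# Fit k-means with 8 clusters on the numeric columns only
km = KMeans(n_clusters=8, n_init=20, max_iter=400, random_state=0)
clusters = km.fit_predict(df[num_cols])
# Store 1-based cluster labels as a categorical (object) column
df['cluster'] = (clusters + 1).astype('object')
df.sample(5)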
Out[44]:
make model car variant body_type fuel_type fuel_system type drivetrain displacement cylinders mileage power torque fuel_tank height length width doors seats wheelbase airbags price cluster
1126 Hyundai Creta Hyundai Creta 1.6 Crdi Sx (O) SUV Diesel Injection Manual FWD (Front Wheel Drive) 1582 4 19.67 126.25 260 55.0 1630.0 4270.0 1780.0 5 5 2590.0 6 21609 8
1115 Skoda Rapid Skoda Rapid Onyx Mt Diesel Sedan Diesel Injection Manual FWD (Front Wheel Drive) 1498 4 21.13 108.50 250 55.0 1466.0 4413.0 1699.0 4 5 2552.0 2 16220 4
573 Hyundai Verna Hyundai Verna 1.6 Crdi Sx Sedan Diesel Injection Manual FWD (Front Wheel Drive) 1582 4 23.90 126.25 260 45.0 1445.0 4440.0 1729.0 4 5 2600.0 2 16415 4
190 Ford Aspire Ford Aspire 1.5 Tdci Titanium Plus Sedan Diesel Injection Manual FWD (Front Wheel Drive) 1498 4 26.10 98.63 215 40.0 1525.0 3995.0 1704.0 4 5 2490.0 6 12073 5
220 Toyota Glanza Toyota Glanza V Hatchback Petrol Injection Manual FWD (Front Wheel Drive) 1197 4 21.01 80.88 113 37.0 1540.0 3995.0 1745.0 5 5 2520.0 2 10614 1
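
The clustering cell passes the raw numeric columns to KMeans. Because k-means works on squared distances, a feature measured in the tens of thousands (price) can outweigh one measured in tens (mileage). Whether the features were rescaled elsewhere in the notebook is not shown; assuming they were not, the sketch below illustrates the effect on the five sampled rows and how z-score standardization evens it out. The `zscore` helper is written for this example only.

```python
from statistics import mean, pstdev

# Price and mileage of the five sampled rows shown above
price = [21609, 16220, 16415, 12073, 10614]
mileage = [19.67, 21.13, 23.90, 26.10, 21.01]

def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

scaled_price, scaled_mileage = zscore(price), zscore(mileage)

# Squared gap between the first two cars, feature by feature
raw_gaps = ((price[0] - price[1]) ** 2, (mileage[0] - mileage[1]) ** 2)
scaled_gaps = ((scaled_price[0] - scaled_price[1]) ** 2,
               (scaled_mileage[0] - scaled_mileage[1]) ** 2)
print(raw_gaps)
print(scaled_gaps)
```

Before scaling, the price gap is millions of times larger than the mileage gap; after scaling, the two are of comparable size. In scikit-learn the same rescaling is available as `StandardScaler`.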
In [45]:
Now we revisit some of the scatter plots, this time with the clusters added.
In [46]:
Price vs. power

Conclusion: The clusters are strongly driven by price, with clear separation between clusters, but the separation is blurry when it comes to power.

In [47]:
Power vs. mileage

Conclusion: The separation of clusters is still stronger along power than along mileage, which shows almost no separation of clusters.

In [48]:
Engine size vs. fuel tank

I also made an interactive 3D scatter plot of price, power, and mileage that includes the clusters.

In [49]:
3D plot:

In [50]:
Checking the average price of each cluster.

Conclusion: As the earlier scatter plots showed, there is a clear separation of clusters when it comes to price.

Next, we check how many cars fall into each cluster.

In [51]:
Number of cars in each cluster.
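
The average price and the number of cars per cluster are both simple group-wise summaries. The sketch below computes them on the five sampled rows shown earlier, so the numbers are only illustrative and do not reproduce the charts.

```python
from collections import defaultdict

# (cluster, price) pairs of the five sampled rows shown earlier
rows = [(8, 21609), (4, 16220), (4, 16415), (5, 12073), (1, 10614)]

# Collect the prices that belong to each cluster
prices_by_cluster = defaultdict(list)
for cluster, price in rows:
    prices_by_cluster[cluster].append(price)

# Mean price and number of cars per cluster, ordered by cluster label
summary = {
    cluster: {"mean_price": sum(p) / len(p), "count": len(p)}
    for cluster, p in sorted(prices_by_cluster.items())
}
for cluster, stats in summary.items():
    print(cluster, stats)
```

On the full dataframe, a pandas one-liner such as `df.groupby('cluster')['price'].agg(['mean', 'count'])` would give the same two columns.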

Conclusion: Even though the generated clusters are not definitive, they can still be useful.

Finding the potential strategic group

First we find the cluster of the Toyota Corolla (and its variants).

We found that the Corolla falls in cluster 1 and also in cluster 5. We can now search these clusters and check what is interesting about them.
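
The lookup itself is a filter on the model name followed by a filter on the resulting cluster labels. The sketch below shows the idea on the five sampled rows from earlier. Since no Corolla row appears in that sample, it uses the Hyundai Verna as a stand-in; `clusters_of` and `rivals_of` are hypothetical helper names.

```python
# Make, model and cluster of the five sampled rows shown earlier
rows = [
    {"make": "Hyundai", "model": "Creta", "cluster": 8},
    {"make": "Skoda", "model": "Rapid", "cluster": 4},
    {"make": "Hyundai", "model": "Verna", "cluster": 4},
    {"make": "Ford", "model": "Aspire", "cluster": 5},
    {"make": "Toyota", "model": "Glanza", "cluster": 1},
]

def clusters_of(rows, model):
    """Set of cluster labels that contain at least one variant of the model."""
    return {r["cluster"] for r in rows if r["model"] == model}

def rivals_of(rows, model):
    """Rows of other models that share a cluster with the given model."""
    target = clusters_of(rows, model)
    return [r for r in rows if r["cluster"] in target and r["model"] != model]

print(clusters_of(rows, "Verna"))
print([r["model"] for r in rivals_of(rows, "Verna")])
```

A model can land in more than one cluster because its variants differ in price and specification, which is how the Corolla ends up in two clusters.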

First we check a sample of these clusters.

In [55]:
An interactive chart that shows car prices across all variants (with the maximum and minimum value for each car).
In [56]:
Count of each body type in the targeted clusters
(here we have taken the Corolla clusters for our analysis).

There seem to be many SUVs in the Toyota clusters; could that be important?

With clustering, many variables are taken into consideration that are hard to trace by ordinary methods. The clusters generated by the KMeans model can be used to identify the strategic group that forms strong competition to the company's products in the market. They also show the clusters close to this group, which can be taken into consideration in some cases.

Problem with clustering:

As exciting and tempting as it is to use clustering to produce strategic groups, it is worth mentioning that the clustering process itself is somewhat ambiguous, and the contribution of each feature to the clustering can't be easily explained, so the overall interpretability of the model is a challenge.

Conclusion:

Clustering may not be definitive, but it can augment management decisions when used side by side with human intuition to arrive at the right strategic group.

Conclusions from our Data Analysis:

1. There are many outliers, which points to very different types of cars or, more precisely, very different categories in the market.

2. SUVs, sedans, and hatchbacks seem to be the dominant car types.

3. The Indian market is favorable for SUVs, sedans, and hatchbacks.

4. The box plot of price by body type shows that car body type strongly affects the price.

5. Most cars run on petrol or diesel rather than other fuels, which is not a good sign for the environment. This is going to change because electric vehicles have arrived in India.

6. Most cars have engine sizes in the 1000 to 2000 cc range.

7. The horsepower of a car seems to be highly related to its price. Hatchbacks seem to be the body type with the least horsepower and the lowest price.

8. Expensive cars tend to have worse mileage.

9. We have to look at many dimensions in order to cluster the market, and the more features we explore, the harder it is to cluster the market.

10. These dimensions affect buyers' decisions, and they are perceived very differently because buyers have very different mental models. In other words, price, horsepower, and mileage are not everything: some buyers want a long-wheelbase car, some want a wider car, and all of these features, and more, strongly affect the buyer's decision. This means that two cars can have very similar prices and mileage, yet one is a van with lots of space and the other is a four-door sedan. These two cars are perceived as two different categories in the automotive industry, so space (the length, width, and height of the car) can also be a vital factor.