Prince Ajudiya
5 min read · Oct 28, 2021

Practical 12

AIM: Seoul Bike Sharing Demand Prediction

Problem Description

Rental bikes have been introduced in many cities to make getting around more convenient. The bikes need to be available and accessible to the public at the right time, because this reduces waiting time. Keeping the city supplied with a stable number of rental bikes therefore becomes a major concern, and the crucial part is predicting how many bikes are required at each hour.

Data Description

The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.

Attribute Information:

  • Date: year-month-day
  • Rented Bike Count: count of bikes rented at each hour
  • Hour: hour of the day
  • Temperature: temperature in Celsius
  • Humidity: %
  • Windspeed: m/s
  • Visibility: units of 10 m
  • Dew point temperature: Celsius
  • Solar radiation: MJ/m²
  • Rainfall: mm
  • Snowfall: cm
  • Seasons: Winter, Spring, Summer, Autumn
  • Holiday: Holiday / No holiday
  • Functional Day: NoFunc (non-functional hours), Fun (functional hours)
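The yearly, monthly and daily comparisons in the EDA need the Date column split into parts. Here is a minimal sketch of that step with pandas. The three rows are made up for illustration, and the `%Y-%m-%d` format follows the attribute description above, so check it against the actual file before relying on it.

```python
import pandas as pd

# Tiny made-up sample; the real rows would come from the dataset file.
df = pd.DataFrame({
    "Date": ["2017-12-01", "2018-05-06", "2018-07-09"],
    "Rented Bike Count": [254, 1210, 1543],
    "Hour": [0, 17, 18],
})

# Parse the text dates, then split them into the parts used for grouping.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

print(df[["Date", "Year", "Month", "Day"]])
```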

We start with exploratory data analysis (EDA).

Rented Bike Count per Year using barplot

The above plot shows that 2018 has the higher rented bike demand.

Rented Bike Count per Month using barplot

The above plot shows that May, June and July have the highest rented bike demand.

Rented Bike Count per Day using barplot

The above plot shows that days 6, 7 and 9 have the highest rented bike demand.

Rented Bike Count per Hour using barplot

The above plot shows that hours 17, 18 and 19 have the highest rented bike demand.

Rented Bike Count per Seasons using barplot

The above plot shows that Summer, Autumn and Spring have higher rented bike demand than Winter.

Rented Bike Count per Holiday using barplot

The above plot shows that non-holidays have higher rented bike demand than holidays.
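Each of these bar plots boils down to a grouped aggregate of Rented Bike Count. The sketch below shows the idea with `groupby`. The six rows are invented, and using the mean as the bar height is an assumption (a sum per group would be just as plausible).

```python
import pandas as pd

# Invented rows, only to show the grouping mechanics.
df = pd.DataFrame({
    "Hour": [8, 8, 18, 18, 3, 3],
    "Holiday": ["No Holiday", "Holiday", "No Holiday",
                "No Holiday", "Holiday", "No Holiday"],
    "Rented Bike Count": [900, 300, 1500, 1700, 100, 200],
})

# Average demand per hour, busiest hour first.
by_hour = (
    df.groupby("Hour")["Rented Bike Count"].mean().sort_values(ascending=False)
)

# Average demand on holidays versus non-holidays.
by_holiday = df.groupby("Holiday")["Rented Bike Count"].mean()

print(by_hour)
print(by_holiday)
```

Passing either series to a plotting call (for example `by_hour.plot(kind="bar")`) would give a chart of the same shape as the ones described here.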

Distribution plot of Rented Bike Count

The above graph shows that Rented Bike Count is moderately right-skewed. Linear regression tends to work better when the dependent variable is close to normally distributed (strictly speaking, the normality assumption applies to the residuals rather than to the variable itself), so we transform the variable to reduce the skew.

Applying square root to Rented Bike Count to improve skewness

A common rule of thumb is to apply a square root to a moderately right-skewed variable. After applying the square root to Rented Bike Count, we get an almost normal distribution.
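To see the effect of the square root in numbers, one can compare the skewness before and after the transform. This is a sketch on synthetic right-skewed data (a gamma sample standing in for the real counts), not on the actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values standing in for Rented Bike Count.
rng = np.random.default_rng(0)
counts = pd.Series(rng.gamma(shape=2.0, scale=300.0, size=5000))

skew_raw = counts.skew()
sqrt_counts = np.sqrt(counts)
skew_sqrt = sqrt_counts.skew()

print(f"skewness before: {skew_raw:.2f}, after sqrt: {skew_sqrt:.2f}")

# A model trained on the square-root scale predicts square roots,
# so predictions are squared to get back to bike counts.
restored = sqrt_counts ** 2
```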

Correlation graph

There is high correlation between Temperature and Dew_point_temperature.
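A correlation graph like this one is typically drawn from the pairwise correlation matrix. The sketch below lists the feature pairs whose absolute correlation exceeds a threshold. The data is synthetic, the column `Wind_speed` is only an illustrative name, and the 0.8 cutoff is an assumption, not a value taken from the analysis.

```python
import numpy as np
import pandas as pd

# Synthetic features: dew point is built from temperature plus noise.
rng = np.random.default_rng(1)
temp = rng.normal(13, 12, 500)
df = pd.DataFrame({
    "Temperature": temp,
    "Dew_point_temperature": temp - 9 + rng.normal(0, 3, 500),
    "Wind_speed": rng.uniform(0, 7, 500),
})

corr = df.corr()
threshold = 0.8  # assumed cutoff for "high" correlation

# Walk over each unordered pair of columns once.
cols = list(corr.columns)
high = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]

for a, b, value in high:
    print(f"{a} vs {b}: {value:.2f}")
```

When two features are this strongly correlated, a common option is to keep only one of them for the linear models.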

Splitting data into train and test split
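A minimal sketch of the split with scikit-learn's `train_test_split`. The 75/25 ratio and the fixed `random_state` are assumptions made for this example, and the arrays are placeholders for the real feature matrix and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```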

Linear Regression

Fitting model to linear regression
Scatterplot of linear regression
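The fit itself can be sketched as follows. The data is synthetic (an hour effect plus a temperature effect, both invented), so the scores it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data; all coefficients are made up.
rng = np.random.default_rng(0)
n = 1000
hour = rng.integers(0, 24, n)
temp = rng.uniform(-15, 35, n)
y = 15 + 0.5 * temp + 6 * np.sin(hour * np.pi / 12) + rng.normal(0, 2, n)
X = np.column_stack([hour, temp])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2, the share of target variance the model explains.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")

# Predicted values, e.g. for a predicted-vs-actual scatterplot.
y_pred = model.predict(X_test)
```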

Polynomial Regression

Fitting model to Polynomial regression
Scatterplot of polynomial regression
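Polynomial regression is linear regression on expanded features. The sketch below uses a scikit-learn pipeline on synthetic data with a curved temperature effect. Degree 2 is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: demand peaks around 20 degrees and falls off on both sides.
rng = np.random.default_rng(0)
temp = rng.uniform(-15, 35, 1000)
y = 30 - 0.05 * (temp - 20) ** 2 + rng.normal(0, 1, 1000)
X = temp.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Plain linear fit for comparison.
linear = LinearRegression().fit(X_train, y_train)

# Degree-2 expansion followed by the same linear model.
poly = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly.fit(X_train, y_train)

linear_r2 = linear.score(X_test, y_test)
poly_r2 = poly.score(X_test, y_test)
print(f"linear R^2 = {linear_r2:.3f}, polynomial R^2 = {poly_r2:.3f}")
```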

Lasso Regression

Fit lasso regression
Scatterplot of lasso regression
Features importance
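With Lasso, the size of each coefficient can serve as a rough feature importance, because the L1 penalty pushes unhelpful coefficients to exactly zero. This is a sketch on synthetic data. The `alpha=0.5` value, the scaling step and the `Noise` column are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features; "Noise" is unrelated to the target by construction.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "Temperature": rng.uniform(-15, 35, n),
    "Humidity": rng.uniform(0, 100, n),
    "Noise": rng.normal(0, 1, n),
})
y = 20 + 0.5 * X["Temperature"] - 0.1 * X["Humidity"] + rng.normal(0, 2, n)

# Scaling first makes the coefficient sizes comparable across features.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
model.fit(X, y)

coefs = pd.Series(model.named_steps["lasso"].coef_, index=X.columns)
ranking = coefs.abs().sort_values(ascending=False)
print(ranking)
```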

Ridge Regression

Fit Ridge regression
Scatterplot of ridge regression
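Ridge adds an L2 penalty that shrinks the coefficients, which helps when features are strongly correlated, as temperature and dew point temperature are here. Below is a sketch on synthetic, nearly collinear data. `alpha=10.0` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data where dew point is almost a copy of temperature.
rng = np.random.default_rng(0)
n = 500
temp = rng.uniform(-15, 35, n)
dew = temp - 9 + rng.normal(0, 1, n)
X = np.column_stack([temp, dew])
y = 20 + 0.5 * temp + rng.normal(0, 2, n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# The penalty never makes the coefficient vector longer than the OLS one.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
ridge_r2 = ridge.score(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print(f"Ridge R^2 = {ridge_r2:.3f}")
```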

Decision Tree Regressor Algorithm

Fitting the values
Scatterplot of decision tree
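A decision tree can pick up non-linear patterns such as rush-hour peaks without any feature engineering. This sketch uses synthetic data, and the `max_depth=6` limit is an assumed setting to keep the tree from memorising noise.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real data; all coefficients are made up.
rng = np.random.default_rng(0)
n = 1000
hour = rng.integers(0, 24, n)
temp = rng.uniform(-15, 35, n)
y = 15 + 0.5 * temp + 6 * np.sin(hour * np.pi / 12) + rng.normal(0, 2, n)
X = np.column_stack([hour, temp])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

tree = DecisionTreeRegressor(max_depth=6, random_state=0)
tree.fit(X_train, y_train)

train_r2 = tree.score(X_train, y_train)
test_r2 = tree.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

A large gap between the train and test scores would be a sign that the tree is overfitting.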

Random Forest Regressor Algorithm

Fit the regressor with x and y data
Scatterplot of random forest regressor
Feature Importance
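A random forest exposes its feature importances through `feature_importances_`. The sketch below fits one on synthetic data and ranks the features. The hyperparameters and the extra `Noise` column are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features; "Noise" has no relationship with the target.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "Hour": rng.integers(0, 24, n),
    "Temperature": rng.uniform(-15, 35, n),
    "Noise": rng.normal(0, 1, n),
})
y = (
    15
    + 0.5 * X["Temperature"]
    + 6 * np.sin(X["Hour"] * np.pi / 12)
    + rng.normal(0, 2, n)
)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances; they sum to 1 across the features.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```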

XGBoost Algorithm

Fit the XGBoost model with x and y data
Scatterplot of XGBoost
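A sketch of the boosted model through the scikit-learn style interface. The data is synthetic and the hyperparameters are illustrative guesses, not the settings used for the results reported here. If the `xgboost` package is not installed, the example falls back to scikit-learn's `GradientBoostingRegressor`, which is a related but different implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

try:
    # Preferred: the XGBoost regressor with its scikit-learn style API.
    from xgboost import XGBRegressor as Booster
except ImportError:
    # Fallback so the sketch still runs; this is NOT XGBoost itself.
    from sklearn.ensemble import GradientBoostingRegressor as Booster

# Synthetic stand-in for the real data; all coefficients are made up.
rng = np.random.default_rng(0)
n = 1000
hour = rng.integers(0, 24, n)
temp = rng.uniform(-15, 35, n)
y = 15 + 0.5 * temp + 6 * np.sin(hour * np.pi / 12) + rng.normal(0, 2, n)
X = np.column_stack([hour, temp])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed hyperparameters; both regressors accept these names.
booster = Booster(
    n_estimators=200, max_depth=4, learning_rate=0.1, random_state=0
)
booster.fit(X_train, y_train)

train_r2 = booster.score(X_train, y_train)
test_r2 = booster.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```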

Conclusion

  • Demand for rented bikes is high in the year 2018.
  • Demand for rented bikes is high in May, June and July.
  • Demand for rented bikes is high on days 6, 7 and 9.
  • Demand for rented bikes is high at hours 17, 18 and 19.
  • Summer has the highest demand for rented bikes and Winter has the lowest.
  • Non-holidays have higher demand than holidays.
  • Hour and temperature are the features that influence the bike sharing count the most.
  • Linear regression reaches a model score of about 49%, and the independent variables explain about 48% of the variance in the dependent variable, which leaves considerable room for improvement.
  • Compared with all the other algorithms, XGBoost has the lowest mean squared error and mean absolute error, with a model score of up to 99% and an R-squared of 98%, so we conclude that it is the best of the models tried.
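The metrics used to compare the models can be computed with scikit-learn as sketched below. The four values are invented so the arithmetic is easy to follow: the errors are 2, -2, 3 and -3, which gives a mean squared error of 6.5, a mean absolute error of 2.5 and an R-squared of 0.948.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented actual and predicted values, not taken from any model above.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 37.0])

mse = mean_squared_error(y_true, y_pred)   # average of squared errors
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # average of absolute errors
r2 = r2_score(y_true, y_pred)              # share of variance explained

print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}, R^2 = {r2:.3f}")
```

If the models are trained on the square root of the count, it is worth deciding whether to report these metrics on the transformed scale or after squaring the predictions back, since the numbers differ.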

GitHub Link

https://github.com/PrinceAjudiya/Bike-Sharing-Demand-Prediction

LinkedIn

https://www.linkedin.com/in/prince-ajudiya-40289819a/