Practical 12
Aim: Seoul Bike Sharing Demand Prediction
Problem Description
Rental bikes have been introduced in many cities to make urban mobility more comfortable. The bikes must be available and accessible to the public at the right time, because this reduces waiting time. Providing the city with a stable supply of rental bikes therefore becomes a major concern, and the crucial part is predicting the number of bikes required at each hour.
Data Description
The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.
Attribute Information:
- Date: year-month-day
- Rented Bike Count: count of bikes rented at each hour
- Hour: hour of the day
- Temperature: temperature in Celsius
- Humidity: %
- Windspeed: m/s
- Visibility: in units of 10 m
- Dew point temperature: Celsius
- Solar radiation: MJ/m2
- Rainfall: mm
- Snowfall: cm
- Seasons: Winter, Spring, Summer, Autumn
- Holiday: Holiday / No holiday
- Functional Day: NoFunc (non-functional hours), Fun (functional hours)
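The year, month, and day views used in the analysis all come from the Date attribute. A minimal sketch of splitting it, assuming the year-month-day layout listed above is written as `YYYY-MM-DD` (the actual file may use a different layout). `split_date` is a hypothetical helper name, not taken from the project:

```python
from datetime import datetime


def split_date(date_text):
    """Return (year, month, day) from a 'YYYY-MM-DD' string.

    The format string is an assumption based on the attribute list;
    adjust it if the raw file stores dates differently.
    """
    parsed = datetime.strptime(date_text, "%Y-%m-%d")
    return parsed.year, parsed.month, parsed.day


# Illustrative value only, not a row from the dataset.
year, month, day = split_date("2018-06-07")
print(year, month, day)
```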
We start with exploratory data analysis (EDA).
The above plot shows that rented bike demand is higher in 2018.
The above plot shows that rented bike demand is higher in May, June, and July.
The above plot shows that rented bike demand is higher on days 6, 7, and 9.
The above plot shows that rented bike demand is higher at hours 17, 18, and 19.
The above plot shows that rented bike demand is higher in Summer, Autumn, and Spring.
The above plot shows that rented bike demand is higher on non-holidays.
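Each of the comparisons above amounts to averaging the rented bike count within a group (year, month, day, hour, season, or holiday flag). A minimal sketch of that aggregation follows. The rows are made-up illustrative values, and `mean_by_group` is a hypothetical helper rather than the project's code:

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, key, value):
    """Average `value` over all rows that share the same `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: mean(values) for group, values in groups.items()}


# Made-up rows, only to show the shape of the computation.
rows = [
    {"hour": 3, "count": 100},
    {"hour": 8, "count": 900},
    {"hour": 8, "count": 1100},
    {"hour": 18, "count": 1600},
    {"hour": 18, "count": 1800},
]

hourly = mean_by_group(rows, "hour", "count")
busiest_hour = max(hourly, key=hourly.get)
print(hourly, busiest_hour)
```

The same helper works for any of the grouping columns by changing `key`.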
The above graph shows that Rented Bike Count is moderately right-skewed. Linear regression works best when the residuals are approximately normal, and a skewed dependent variable often prevents that, so we transform the variable to bring its distribution closer to normal.
A common rule of thumb is to apply a square root to a moderately right-skewed variable. After applying the square root to Rented Bike Count, the distribution is almost normal.
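The effect of the square root can be checked numerically by comparing skewness before and after the transform. The sketch below uses a made-up right-skewed sample, not the real counts, and computes the population skewness (third central moment divided by the cubed standard deviation) by hand:

```python
import math
from statistics import mean


def skewness(values):
    """Population skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed sample standing in for Rented Bike Count.
counts = [0, 10, 40, 90, 160, 250, 400, 900, 1600, 2500]
root_counts = [math.sqrt(c) for c in counts]

print(skewness(counts), skewness(root_counts))

# A model trained on the square-root scale predicts square roots,
# so predictions must be squared to get back to bike counts.
restored = [r ** 2 for r in root_counts]
```

On this sample the skewness drops after the transform but stays positive, which matches the "almost normal" description rather than a perfectly symmetric result.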
There is a high correlation between Temperature and Dew_point_temperature.
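A strongly correlated pair of features carries largely redundant information, which can make the coefficients of a linear model unstable. The strength of the relationship can be measured with the Pearson correlation coefficient. The sketch below computes it by hand on made-up temperature and dew point values. The numbers are illustrative assumptions, not rows from the dataset:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values in Celsius, chosen so the two series move together.
temperature = [-5.0, 0.0, 8.0, 15.0, 22.0, 30.0]
dew_point = [-12.0, -8.0, 1.0, 7.0, 16.0, 22.0]

r = pearson(temperature, dew_point)
print(round(r, 3))
```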
Linear Regression
Polynomial Regression
Lasso Regression
Ridge Regression
Decision Tree Regressor Algorithm
Random Forest Regressor Algorithm
XGBoost Algorithm
Conclusion
- Demand for rented bikes is high in 2018.
- Demand for rented bikes is high in May, June, and July.
- Demand for rented bikes is high on days 6, 7, and 9.
- Demand for rented bikes is high at hours 17, 18, and 19.
- Summer has the highest demand for rented bikes and Winter has the lowest.
- Non-holidays have higher demand than holidays.
- Hour and temperature are the features that most influence the rented bike count.
- The Linear Regression fit gives a model score of up to 49%, and about 48% of the dependent variable's variance is explained by the independent variables, which leaves room for improvement.
- Compared with all the other algorithms, XGBoost has lower mean squared error and mean absolute error, gives a model score of up to 99% and an R-squared value of 98%, and is therefore concluded to be the optimal model.
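The model comparison above rests on mean squared error, mean absolute error, and R-squared. A minimal sketch of these three metrics under their standard definitions follows. The actual and predicted values are made up for illustration and do not reproduce any figure reported here, and `regression_metrics` is a hypothetical helper:

```python
def regression_metrics(actual, predicted):
    """Return MSE, MAE, and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    # R-squared: share of the target's variance explained by the model.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    r2 = 1 - ss_residual / ss_total
    return {"mse": mse, "mae": mae, "r2": r2}


# Made-up values, only to exercise the formulas.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower MSE and MAE and a higher R-squared indicate a better fit, which is the basis on which the models are ranked.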
GitHub Link
https://github.com/PrinceAjudiya/Bike-Sharing-Demand-Prediction