Prince Ajudiya
3 min read · Sep 9, 2021



Practical-5

AIM: Visual Programming with Orange Tool

This blog covers how to split data into training and testing sets using the Orange tool. We will also learn more about the Test & Score widget and explore the cross-validation method in Orange.

For an introduction to the Orange tool, please check my previous blog. Click here.

Train-Test Split

The train-test split is a technique for evaluating the performance of a machine learning algorithm. It can be used for classification or regression problems and with any supervised learning algorithm. The procedure takes a dataset and divides it into two subsets: one for training the model and one for testing it.
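Outside Orange, the same idea can be sketched in Python with scikit-learn's `train_test_split`. The dataset below is synthetic, purely for illustration — it is not the California Housing data used in the workflow:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset with 614 rows, mirroring the row count
# reported later by Orange's Data Sampler
X, y = make_classification(n_samples=614, n_features=8, random_state=0)

# 80% of the rows go to training, 20% to testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# 491 training rows and 123 test rows (Orange's sampler reports 492/122;
# the two tools round the 80:20 boundary slightly differently)
print(len(X_train), len(X_test))
```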

For the train-test split, I used the workflow below.

Here I am using the California Housing Prices dataset from my previous blog.

After that, I pass the whole dataset into the Data Sampler widget, which partitions the dataset into training and test data.

Data Sampler

As you can see, I split the data in an 80:20 ratio: 80% training data and 20% test data. At the bottom, you can see that 492 data points are used for training and 122 for testing, out of a total of 614 data points.

After splitting the data, I connect the Data Sampler to the Test & Score widget with two links: one for the training data and another for the test data.

For model creation, we use the Random Forest, k-Nearest Neighbors, and Tree algorithms.
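A rough code equivalent of connecting these three learners to Test & Score is fitting each one on the training sample and scoring it on the held-out sample. This scikit-learn sketch uses a synthetic dataset and default hyperparameters as assumptions, not the exact settings of the Orange widgets:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset, split 80:20 as in the workflow
X, y = make_classification(n_samples=614, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# The three learners used in the Orange workflow
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "kNN": KNeighborsClassifier(),
    "Tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)          # train on the 80% sample
    acc = model.score(X_test, y_test)    # accuracy on the 20% sample
    print(f"{name}: {acc:.3f}")
```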

First, we evaluate our models with the “Test on test data” option.

Test on test data

Then we evaluate our models with the “Test on train data” option.

Test on train data

Cross Validation

Cross-validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model. The basic form of cross-validation is k-fold cross-validation.

We can do cross-validation using the Test & Score widget. Cross-validation is applied to the entire dataset.

For Cross-Validation I used the following workflow:

Cross Validation

In the image above, we can see that the number of folds is 5. That means four folds are used for training and one fold for testing. This process runs five times, with each fold serving as the test set exactly once, and the final result is the average of the five results.

Confusion Matrix

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.
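The same table can be computed in code with scikit-learn's `confusion_matrix`, shown here on a synthetic two-class dataset (an assumption for illustration; the Orange workflow uses its own data and models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset, split 80:20
X, y = make_classification(n_samples=614, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are the true classes, columns are the predicted classes;
# the diagonal holds the correctly classified counts.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```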

After that, the Test & Score widget connects to the Confusion Matrix widget, where we can see the results. From the confusion matrix, we can select data points and view them in the Data Table widget.
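Orange's click-a-cell-and-inspect workflow can be approximated in code by filtering the test rows that fall in one confusion-matrix cell, e.g. the false negatives. This is a hedged sketch on the same kind of synthetic data as above, not Orange's actual selection mechanism:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset, split 80:20
X, y = make_classification(n_samples=614, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)
y_pred = RandomForestClassifier(random_state=0).fit(X_train, y_train).predict(X_test)

# "Select" one confusion-matrix cell: rows whose true class is 1
# but which the model predicted as 0 (false negatives)
mask = (y_test == 1) & (y_pred == 0)
selected_rows = X_test[mask]
print(len(selected_rows), "rows selected from that cell")
```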

Confusion matrix

Above, we see the confusion matrix of the Random Forest algorithm.

Github Link

https://github.com/PrinceAjudiya/Visual-Programming-with-Orange-Tool

Conclusion

I hope you can now work with the Orange tool by yourself. I tried to cover as much as I could; now you can explore more on your own.

Do check out more features of the Orange tool here.

LinkedIn

https://www.linkedin.com/in/prince-ajudiya-40289819a/
