Practical Exam

Prince Ajudiya
Nov 18, 2021

Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/00475/

Task-1:

Describe the dataset using the Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Achieve the maximum classification accuracy possible by performing the following methods:
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection
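In Orange, these steps correspond to options in the Preprocess widget, such as imputing missing values and continuizing (encoding) discrete variables. As a tool-independent illustration, here is a minimal plain-Python sketch of mean imputation followed by one-hot encoding. The toy rows and the column names `sector`, `score` and `risk` are hypothetical and are not taken from the audit dataset's actual schema.

```python
from statistics import mean

# Hypothetical toy rows; column names are illustrative only.
rows = [
    {"sector": "Irrigation", "score": 3.89, "risk": 1},
    {"sector": "Agriculture", "score": None, "risk": 0},  # missing value
    {"sector": "Forest", "score": 2.50, "risk": 0},
    {"sector": "Agriculture", "score": 1.99, "risk": 1},
]

def impute_mean(rows, column):
    """Replace missing (None) values in a numeric column with the column mean."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded.append(new)
    return encoded

clean = one_hot(impute_mean(rows, "score"), "sector")
print(clean[1])
```

Imputing before encoding keeps the two steps independent. Mean imputation is only one choice; Orange's imputation options also include alternatives such as dropping rows with missing values.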

Task-2:
Generate a dashboard of the pre-processed dataset from Task-1.
Extract as many data insights as possible by plotting a bar chart, box plot, pie chart and stacked plot using PowerBI dashboard visualization.

1. Provide a screenshot of the data description and explain it briefly.

The goal of the research is to help auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors. The sectors and the number of firms in each are: Irrigation (114), Public Health (77), Buildings and Roads (82), Forest (70), Corporate (47), Animal Husbandry (95), Communication (1), Electrical (4), Land (5), Science and Technology (3), Tourism (1), Fisheries (41), Industries (37) and Agriculture (200).
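The sector counts can be checked with a little arithmetic. This sketch uses only the counts listed above and computes the total and each sector's share, which are the proportions a pie chart of sectors would show.

```python
# Firm counts per sector, as listed in the dataset description.
sector_counts = {
    "Irrigation": 114, "Public Health": 77, "Buildings and Roads": 82,
    "Forest": 70, "Corporate": 47, "Animal Husbandry": 95,
    "Communication": 1, "Electrical": 4, "Land": 5,
    "Science and Technology": 3, "Tourism": 1, "Fisheries": 41,
    "Industries": 37, "Agriculture": 200,
}

total = sum(sector_counts.values())

# Share of each sector, largest first.
shares = sorted(
    ((name, count / total) for name, count in sector_counts.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(total)  # → 777
for name, share in shares[:3]:
    print(f"{name}: {share:.1%}")
```

The listed counts add up to 777 firms. Agriculture alone accounts for about a quarter of them, while Communication and Tourism have a single firm each.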

Many risk factors are examined from various areas, such as past records of the audit office, audit-paras, environmental condition reports, firm reputation summaries, ongoing-issue reports, profit-value records, loss-value records and follow-up reports. After in-depth interviews with the auditors, the important risk factors are evaluated and their probability of existence is calculated from present and past records.

Audit-risk dataset

2. Provide screenshot(s) of the data pre-processing steps showing their significance.

Adding the dataset to the Orange tool
Dataset before data pre-processing
Dataset information
Normalization
Dataset after normalization
Architecture
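Normalization and feature selection can be sketched outside Orange as well. Orange's Preprocess widget offers several normalization schemes (for example, standardization or scaling to an interval), and its feature selection ranks features by scores such as information gain. The settings used here are not stated, so this sketch assumes min-max scaling to [0, 1] and, as a simpler stand-in for Orange's scorers, ranks features by the absolute Pearson correlation with the class. The column names and values are hypothetical.

```python
from math import sqrt

def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def rank_features(columns, target):
    """Rank feature names by |correlation| with the 0/1 class, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy columns and a binary class label.
columns = {
    "f_strong": [1.0, 2.0, 9.0, 10.0],
    "f_weak": [5.0, 3.0, 4.0, 4.5],
    "f_const": [7.0, 7.0, 7.0, 7.0],
}
target = [0, 0, 1, 1]

normalized = {name: min_max(vals) for name, vals in columns.items()}
print(normalized["f_strong"])
print(rank_features(normalized, target))
```

Correlation does not depend on scale, so the ranking is the same before and after normalization. Scaling matters for distance-based learners such as kNN, not for this filter. A constant column scores zero and is an obvious candidate for removal.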

3. Provide a screenshot showing the accuracy before and after pre-processing.

After pre-processing

Test & Score
KNN Confusion Matrix
Tree Confusion Matrix
Logistic Regression Confusion Matrix
Neural Network Confusion Matrix
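Orange's Test & Score widget reports classification accuracy (CA) alongside scores such as F1, precision and recall, and each Confusion Matrix widget shows the counts behind them. This sketch shows how these scores follow from a binary confusion matrix, computed for one target class. The counts are hypothetical placeholders, not the values in the screenshots. Orange can also average scores over classes, which gives different numbers.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the chosen positive (target) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, NOT the results shown in the screenshots.
m = metrics(tp=90, fp=10, fn=5, tn=95)
print(m)
```

Accuracy alone can be misleading when one class dominates. Comparing precision and recall across the KNN, Tree, Logistic Regression and Neural Network matrices shows which model misses risky firms and which raises false alarms.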

Before pre-processing

Test & Score
KNN Confusion Matrix
Tree Confusion Matrix
Logistic Regression Confusion Matrix
Neural Network Confusion Matrix
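Test & Score evaluates each learner with a sampling scheme such as k-fold cross-validation. This is a plain-Python stand-in, not Orange's implementation. It runs a 1-nearest-neighbour classifier under 3-fold cross-validation on contrived data to show why pre-processing can change kNN accuracy so sharply: without scaling, the column with the larger numeric range decides which neighbour is nearest. The data, the interleaved fold scheme and k = 1 are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, folds=3, k=1):
    """Mean kNN accuracy over interleaved folds (row i goes to fold i % folds)."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def min_max_columns(X):
    """Rescale every column of X to [0, 1]; a constant column becomes all zeros."""
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    return [[0.0 if hi == lo else (v - lo) / (hi - lo)
             for v, lo, hi in zip(row, lows, highs)] for row in X]

# Contrived data: column 0 separates the classes; column 1 is a large-range
# distractor arranged so each point's raw nearest neighbour has the other class.
X = [[0.0, 100], [1.0, 105], [0.1, 500], [0.9, 505], [0.2, 900], [0.8, 905]]
y = [0, 1, 0, 1, 0, 1]

print(cross_val_accuracy(X, y))                   # → 0.0
print(cross_val_accuracy(min_max_columns(X), y))  # → 1.0
```

For brevity, the scaling here is fitted on all rows before cross-validation. A stricter evaluation would fit it on each training fold only, so that no information from the test fold leaks into pre-processing.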

4. Provide a screenshot of the PowerBI dashboard with a description.

Load dataset
Stacked bar chart
Pie chart
Stacked column chart
Clustered bar chart
Web view dashboard
Mobile view dashboard
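A stacked bar or column chart places one field on the axis and splits each bar by a second (legend) field. Each segment is an aggregate, such as a count of firms. This minimal sketch shows the grouping behind such a chart in plain Python. The rows and the field names `sector` and `risk` are hypothetical.

```python
from collections import defaultdict

def stacked_counts(rows, category, stack):
    """Count rows per (category, stack) pair: one bar per category, one segment per stack value."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[category]][r[stack]] += 1
    return {cat: dict(segments) for cat, segments in table.items()}

# Hypothetical rows; field names are illustrative only.
rows = [
    {"sector": "Irrigation", "risk": 1},
    {"sector": "Irrigation", "risk": 0},
    {"sector": "Irrigation", "risk": 1},
    {"sector": "Forest", "risk": 0},
]
print(stacked_counts(rows, "sector", "risk"))
```

A clustered bar chart plots the same counts side by side instead of stacking them. A pie chart shows each category's share of the overall total.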
