WajidAyub/Data-Classification-Using-AI

Data Classification Pipeline Using AI

Project Overview

This repository contains a basic supervised machine learning pipeline that learns patterns in labeled tabular data and uses them to classify new samples.

Rather than committing to a single algorithm, the project trains several classification models on the same data and compares them side by side to find the most accurate one for the dataset.

Core Features

  • Data Pipeline: Loads and processes a standard tabular dataset (the Iris dataset from scikit-learn).
  • Data Segregation: Splits the data into an 80% training set and a 20% testing set, so that each model is evaluated on samples it did not see during training.
  • Algorithm Comparison Engine: Trains and evaluates three machine learning algorithms in a single run:
    • Random Forest Classifier
    • Logistic Regression
    • Support Vector Machine (SVM)
  • Automated Evaluation: Dynamically selects the best-performing algorithm based on testing accuracy and generates a full statistical classification report.
  • Visualization: Saves algorithm_comparison.png, a bar chart comparing the testing accuracy of all algorithms, and confusion_matrix.png, a confusion matrix for the top-performing model that shows how its predictions are distributed across the classes.
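
The features above can be sketched roughly as follows. This is a minimal illustration, not the contents of model.py: the variable names, the random_state value, the stratify option, and the model hyperparameters are all assumptions chosen for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris dataset bundled with scikit-learn.
iris = load_iris()
X, y = iris.data, iris.target

# 80/20 train-test split. random_state and stratify are assumptions,
# added here only so the sketch gives repeatable results.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The three candidate algorithms; hyperparameters are illustrative.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

# Train each model and record its accuracy on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")

# Pick the model with the highest testing accuracy and report on it.
best_name = max(accuracies, key=accuracies.get)
best_model = models[best_name]
print(f"Best model: {best_name}")
print(
    classification_report(
        y_test, best_model.predict(X_test), target_names=iris.target_names
    )
)
```

With only 30 test samples, the accuracy gap between models can come down to one or two predictions, so the "best" model may change with a different split.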

Prerequisites

Ensure you have Python installed. The required libraries are listed in requirements.txt.

Install dependencies using:

pip install -r requirements.txt

Running the Model

Execute the main script to train the models, view the comparison output in your terminal, and generate the visual plots:

python model.py

Expected Output

The script prints the dataset features, the sample sizes after splitting, and the training progress. It then displays the accuracy of each of the three models, selects the best one, prints its classification report, and saves the plots in the project directory.

About

An automated machine learning pipeline built to classify tabular data. It trains, evaluates, and compares Random Forest, Logistic Regression, and Support Vector Machine (SVM) models in a single run to find the most accurate model for the dataset.
