Data Classification Pipeline Using AI

Project Overview

This repository contains a basic supervised machine learning pipeline that learns patterns in tabular data and uses them to classify new samples.

To find the best-performing model, the project trains several classification algorithms on the same dataset and compares them side by side, selecting the one with the highest accuracy on the test set.

Core Features

Data Pipeline: Loads and processes a standard tabular dataset (the Iris dataset from scikit-learn).
Data Segregation: Splits the data into an 80% training set and a 20% testing set so that each model is evaluated on samples it did not see during training.
Algorithm Comparison Engine: Trains and evaluates three machine learning algorithms side by side:
- Random Forest Classifier
- Logistic Regression
- Support Vector Machine (SVM)
Automated Evaluation: Selects the best-performing algorithm by testing accuracy and generates a classification report for it.
Visualization: Generates an algorithm_comparison.png bar chart comparing the testing accuracy of all algorithms, and a confusion_matrix.png plot for the top-performing model showing how its predictions are distributed across classes.
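
The features above can be sketched roughly as follows. This is a minimal illustration, not the contents of model.py: the function name compare_models, the random_state value, the stratified split, and the max_iter setting are all assumptions made for the example, and the models otherwise use scikit-learn defaults.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def compare_models(random_state=42):
    """Train three classifiers on Iris and pick the most accurate on the test set.

    Hypothetical helper for illustration; the real script may be organized differently.
    """
    X, y = load_iris(return_X_y=True)

    # 80% training / 20% testing. Stratifying by class is an assumption here;
    # it keeps the class proportions the same in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )

    models = {
        "Random Forest": RandomForestClassifier(random_state=random_state),
        # max_iter is raised (an assumption) so the solver converges without warnings
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "SVM": SVC(),
    }

    # Fit every model on the same training set and score it on the same test set
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))

    # The best model is the one with the highest testing accuracy
    best_name = max(scores, key=scores.get)
    report = classification_report(y_test, models[best_name].predict(X_test))
    return scores, best_name, report


if __name__ == "__main__":
    scores, best_name, report = compare_models()
    for name, accuracy in scores.items():
        print(f"{name}: {accuracy:.3f}")
    print(f"Best model: {best_name}")
    print(report)
```

Because every model is fitted and scored on the same split, the accuracies are directly comparable. With only 30 test samples, ties between models are likely; in this sketch max() resolves a tie in favor of the model listed first.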

Prerequisites

Ensure you have Python installed. The required libraries are listed in requirements.txt.

Install dependencies using:

pip install -r requirements.txt

Running the Model

Execute the main script to train the models, view the comparison output in your terminal, and generate the visual plots:

python model.py

Expected Output

The script prints the dataset features, the sample sizes after splitting, and the training progress. It then displays the accuracy comparison between the three models, selects the best one, prints its classification report, and saves the two plots (algorithm_comparison.png and confusion_matrix.png) in the project directory.

