This repository contains a basic supervised machine learning pipeline that learns patterns in tabular data and classifies new samples.
To find the most accurate model for the dataset, the project trains several classification algorithms and compares their results side by side.
- Data Pipeline: Loads and processes a standard tabular dataset (the Iris dataset from scikit-learn).
- Data Segregation: Automatically splits the data into an 80% training set and a 20% testing set, so that each model is evaluated on samples it did not see during training.
- Algorithm Comparison Engine: Trains and evaluates three distinct machine learning algorithms in the same run:
  - Random Forest Classifier
  - Logistic Regression
  - Support Vector Machine (SVM)
- Automated Evaluation: Dynamically selects the best-performing algorithm based on testing accuracy and generates a full statistical classification report.
- Visualization: Automatically generates an algorithm_comparison.png bar chart to visually compare the testing accuracy of all algorithms, alongside a confusion_matrix.png plot for the top-performing model to visually inspect its predictive distribution.
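
The features listed above can be sketched roughly as follows. This is a minimal illustration and not the contents of model.py: the `random_state` values, the `stratify` argument, `max_iter=1000`, and all variable names are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris dataset bundled with scikit-learn.
iris = load_iris()
X, y = iris.data, iris.target

# 80/20 train-test split. random_state and stratify are assumptions
# made for reproducibility; the real script may split differently.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The three algorithms named in the README, with illustrative settings.
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

# Train each model and record its accuracy on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")

# Pick the model with the highest test accuracy and report on it.
best_name = max(accuracies, key=accuracies.get)
best_model = models[best_name]
print(f"Best model: {best_name}")
print(
    classification_report(
        y_test, best_model.predict(X_test), target_names=iris.target_names
    )
)
```

Because the winner is chosen by test accuracy, that accuracy is a slightly optimistic estimate for the selected model; a separate validation split or cross-validation would be one way to address this if it matters for your use case.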
Ensure you have Python installed. The required libraries are listed in requirements.txt.
Install dependencies using:

    pip install -r requirements.txt

Execute the main script to train the models, view the comparison output in your terminal, and generate the visual plots:

    python model.py

The script prints the dataset features, the sample sizes after splitting, and the training progress. It will then display the accuracy comparison between the three models, select the best one, output its classification report, and save the visual plots in the project directory.
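
As a rough idea of how the two image files could be produced, here is a hedged sketch using matplotlib and scikit-learn. The helper name `save_plots`, its parameters, and the plot styling are hypothetical and are not taken from model.py; only the output filenames come from this README.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def save_plots(accuracies, y_true, y_pred, class_names, out_dir=Path(".")):
    """Hypothetical helper: write the accuracy bar chart and confusion matrix."""
    # Bar chart comparing test accuracy across models.
    fig, ax = plt.subplots()
    ax.bar(list(accuracies.keys()), list(accuracies.values()))
    ax.set_ylabel("Test accuracy")
    ax.set_ylim(0, 1)
    ax.set_title("Algorithm comparison")
    fig.savefig(out_dir / "algorithm_comparison.png", bbox_inches="tight")
    plt.close(fig)

    # Confusion matrix for the best model's predictions.
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots()
    ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names).plot(ax=ax)
    fig.savefig(out_dir / "confusion_matrix.png", bbox_inches="tight")
    plt.close(fig)


# Example usage on Iris; split and model settings are assumptions.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}
predictions = {n: m.fit(X_train, y_train).predict(X_test) for n, m in models.items()}
accuracies = {n: accuracy_score(y_test, p) for n, p in predictions.items()}
best_name = max(accuracies, key=accuracies.get)
save_plots(accuracies, y_test, predictions[best_name], iris.target_names)
```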