Data Preprocessing and Model Tuning Pipeline Techniques
Overview
This document lists the techniques used in the data preprocessing and model tuning pipeline, grouped into key areas such as data cleaning, transformation, splitting, and hyperparameter optimization.
1. Data Cleaning
Missing Data Handling
- Deletion: Remove rows or columns with missing values.
- Imputation:
  - Mean or median for numerical data; mode for categorical or discrete data.
  - Forward-fill or backward-fill for time-series data.
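The options above can be sketched with the Python standard library alone. This is a minimal illustration, not the pipeline's actual implementation: the toy column `values` and the helpers `impute`, `forward_fill`, and `backward_fill` are hypothetical names introduced here, and `None` is assumed to mark a missing entry.

```python
from statistics import mean, median, mode

# Hypothetical toy column; None marks a missing value.
values = [2.0, None, 4.0, 4.0, None, 10.0]

# Deletion: keep only the observed entries.
observed = [v for v in values if v is not None]


def impute(column, fill_value):
    """Replace every None in `column` with `fill_value`."""
    return [fill_value if v is None else v for v in column]


# Imputation with a summary statistic of the observed entries.
mean_filled = impute(values, mean(observed))
median_filled = impute(values, median(observed))
mode_filled = impute(values, mode(observed))


def forward_fill(column):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in column:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def backward_fill(column):
    """Fill each gap with the next observed value; trailing gaps stay None."""
    return forward_fill(column[::-1])[::-1]


print(mean_filled)
print(forward_fill(values))
print(backward_fill(values))
```

In a pandas-based pipeline the same steps would typically map to `dropna`, `fillna`, `ffill`, and `bfill`. Forward- and backward-fill only make sense when the rows are ordered, which is why they are listed for time-series data.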
Outlier Detection and Treatment
- Statistical Approaches:
  - Interquartile Range (IQR) Method.
  - Z-Score Method (Standard Deviation Threshold).
- Visual Techniques:
  - Box Plots.
  - Scatter Plots.
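The two statistical approaches can be sketched as follows. The fence multiplier `k=1.5` and the z-score threshold `3.0` are common conventions assumed here, not values stated in this document, and the function names and sample data are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


def zscore_outliers(data, threshold=3.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(data), pstdev(data)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [x for x in data if abs(x - mu) / sigma > threshold]


# Hypothetical sample: 20 readings between 10 and 13, plus one extreme value.
data = [10, 12, 11, 13, 12, 11] * 3 + [10, 12, 95]

print(iqr_outliers(data))
print(zscore_outliers(data))
```

Both functions only detect outliers; the treatment (removal, capping, or keeping them) is a separate decision. The z-score method is sensitive to sample size, because a single extreme value inflates the standard deviation it is measured against, so in very small samples a threshold of 3 may never trigger.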
Noise Removal
- Textual Data:
  - Stopword Removal.
  - Lemmatization or Stemming.
- Numerical Data:
  - Smoothing with Moving Averages.
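As a rough illustration of two of these items, the sketch below removes stopwords from whitespace-tokenized text and smooths a numeric series with a simple (unweighted) moving average. The `STOPWORDS` set is a deliberately tiny hypothetical list, and both function names are introduced here for illustration. Lemmatization and stemming are not shown because they usually rely on an NLP library.

```python
# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop tokens found in STOPWORDS."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


def moving_average(series, window=3):
    """Simple moving average over full windows only.

    The result has len(series) - window + 1 points.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


print(remove_stopwords("The cat is in the hat"))
print(moving_average([3, 6, 9, 6, 3], window=3))
```

A larger window smooths more aggressively but also flattens genuine short-lived changes, so the window size is a tuning choice rather than a fixed value.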
2. Data Transformation
Scaling Techniques
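- Normalization:
  - Min-Max Scaling: Rescale each feature to the range [0, 1] using its minimum and maximum.
  - MaxAbs Scaling: Divide each feature by its maximum absolute value, which maps values into [-1, 1].
- Standardization:
  - Z-Score Scaling: Transform each feature to have mean 0 and standard deviation 1.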
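The three scaling techniques reduce to short formulas, sketched below for a single feature column. The function names and sample values are hypothetical. Using the population standard deviation (`pstdev`) and returning zeros for a constant column are choices made for this sketch.

```python
from statistics import mean, pstdev


def min_max_scale(xs):
    """Map values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # constant column: avoid division by zero
    return [(x - lo) / (hi - lo) for x in xs]


def max_abs_scale(xs):
    """Map values to [-1, 1]: x / max(|x|). Signs and zeros are preserved."""
    m = max(abs(x) for x in xs)
    if m == 0:
        return [0.0 for _ in xs]
    return [x / m for x in xs]


def z_score_scale(xs):
    """Standardize to mean 0 and standard deviation 1: (x - mean) / std."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


feature = [-2.0, 0.0, 2.0, 6.0]  # hypothetical feature column

print(min_max_scale(feature))
print(max_abs_scale(feature))
print(z_score_scale(feature))
```

In a scikit-learn pipeline these correspond to `MinMaxScaler`, `MaxAbsScaler`, and `StandardScaler` in `sklearn.preprocessing`. MaxAbs scaling does not shift the data, so zero entries stay zero, which is useful for sparse features.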