Data Preprocessing and Model Tuning Pipeline Techniques

Overview

This document lists the techniques used in the data preprocessing and model tuning pipeline. The methods are grouped by area, including data cleaning, transformation, splitting, and hyperparameter optimization.

1. Data Cleaning

Missing Data Handling

Deletion: Remove rows or columns with missing values.
Imputation:
- Mean or Median for numerical data; Mode for categorical or discrete data.
- Forward-fill or Backward-fill for time-series data.
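
The missing-data strategies above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual implementation: it assumes missing values are represented as None, and the helper names (drop_missing_rows, impute, forward_fill, backward_fill) are hypothetical.

```python
from statistics import mean, median, mode


def drop_missing_rows(rows):
    """Deletion: keep only rows that contain no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def impute(values, strategy="mean"):
    """Imputation: replace None with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def backward_fill(values):
    """Fill each gap with the next observed value; trailing gaps stay None."""
    return forward_fill(values[::-1])[::-1]
```

Forward-fill and backward-fill only make sense when the values are ordered (for example by timestamp), which is why they are listed for time-series data.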

Outlier Detection and Treatment

Statistical Approaches:
- Interquartile Range (IQR) Method.
- Z-Score Method (Standard Deviation Threshold).
Visual Techniques:
- Box Plots.
- Scatter Plots.
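
As a rough sketch of the two statistical approaches, the functions below flag outliers using only Python's standard library. The multiplier k = 1.5 and the threshold of 3 standard deviations are common conventions assumed here, not values given in this document, and the function names are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(readings))
print(zscore_outliers(readings))
print(zscore_outliers(readings, threshold=2.5))
```

On a small sample, a single extreme value inflates the standard deviation it is measured against, so the z-score method with a threshold of 3 can miss an outlier that the IQR method catches. The sample data above is made up to illustrate this.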

Noise Removal

Textual Data:
- Stopword Removal.
- Lemmatization or Stemming.
Numerical Data:
- Smoothing with Moving Averages.
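
A minimal sketch of stopword removal and moving-average smoothing follows. The stopword set is a tiny illustrative subset chosen for this example, whitespace tokenization is an assumption, and lemmatization or stemming is omitted because it normally relies on an external NLP library.

```python
# Illustrative subset only; real stopword lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split on whitespace, and drop stopword tokens."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def moving_average(values, window=3):
    """Simple moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

A larger window smooths more aggressively but shortens the output series and blurs short-lived changes.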

2. Data Transformation

Scaling Techniques

Normalization:
- Min-Max Scaling: Rescale values to the range [0, 1] using the feature's minimum and maximum.
- MaxAbs Scaling: Divide by the maximum absolute value, mapping values into [-1, 1].
Standardization:
- Z-Score Scaling: Transform data to have mean = 0 and standard deviation = 1.
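
The three scaling techniques reduce to short formulas, sketched below for a single feature. This assumes the population standard deviation for z-scores and maps constant features to zeros; both are choices made for this example, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Min-Max: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def max_abs_scale(values):
    """MaxAbs: x / max(|x|), giving values in [-1, 1] and preserving sign and zeros."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return [0.0 for _ in values]
    return [v / largest for v in values]


def z_score_scale(values):
    """Z-Score: (x - mean) / std, giving mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

In a train/test setting, the statistics (min, max, mean, standard deviation) would typically be computed on the training data only and then reused to transform the test data.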