CLD6000: Data Preprocessing and Model Tuning Pipeline


Problem Definition and Scope

The CLD6000 project focuses on Natural Language Processing (NLP) and Feature Engineering as part of the Contemporary Problem Analysis. Its core goals and the pipeline stages that support them are outlined below.

Core Goals

  1. Develop and validate a data preprocessing pipeline.
  2. Ensure proper separation of concerns (e.g., preprocessing vs classification tasks).
  3. Maintain a single source of truth in the feature engineering workflow.
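
Goals 2 and 3 can be illustrated with a minimal sketch. Every name below (PreprocessConfig, preprocess, classify, predict) is hypothetical and not taken from the CLD6000 codebase, and the keyword rule in classify is only a stand-in for a real model. The sketch assumes that preprocessing settings live in one immutable object, and that the preprocessing and classification steps never reach into each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PreprocessConfig:
    """Single source of truth for preprocessing settings (assumed fields)."""
    lowercase: bool = True
    min_token_length: int = 2


def preprocess(text: str, config: PreprocessConfig) -> List[str]:
    """Preprocessing concern only: turn raw text into a list of tokens."""
    if config.lowercase:
        text = text.lower()
    # Split on whitespace and drop tokens shorter than the configured minimum.
    return [tok for tok in text.split() if len(tok) >= config.min_token_length]


def classify(tokens: List[str]) -> str:
    """Classification concern only: a toy keyword rule, not a trained model."""
    return "positive" if "good" in tokens else "negative"


# One shared config instance, so every caller sees the same settings.
CONFIG = PreprocessConfig()


def predict(text: str) -> str:
    """Compose the two concerns without mixing their logic."""
    return classify(preprocess(text, CONFIG))
```

Because classify only ever receives tokens, the preprocessing rules can change in one place (PreprocessConfig) without touching the classifier, which is the separation the goals describe.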

Pipeline Breakdown

1. Data Preprocessing

Data Cleaning

Data Normalization

Data Transformation
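
The outline does not yet say what each of the three preprocessing stages does. As a rough sketch for text data, and purely as an assumption, cleaning could strip markup and extra whitespace, normalization could apply Unicode folding and lowercasing, and transformation could produce bag-of-words counts. The function names are hypothetical and only the Python standard library is used.

```python
import re
import unicodedata
from collections import Counter


def clean(text):
    """Data cleaning (assumed): drop HTML-like tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    """Data normalization (assumed): Unicode NFKC folding, then lowercasing."""
    return unicodedata.normalize("NFKC", text).lower()


def transform(text):
    """Data transformation (assumed): tokenize and count tokens (bag of words)."""
    tokens = re.findall(r"[a-z0-9]+", text)
    return Counter(tokens)


def run_preprocessing(text):
    """Run the three stages in the order the outline lists them."""
    return transform(normalize(clean(text)))
```

Keeping each stage as its own function means any one of them can be replaced or tested in isolation, for example swapping the bag-of-words step for a different feature representation.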