# 📉 Chapter 12 — Data Reduction
This chapter explores the need for Data Reduction, its strategies, and key techniques like Wavelet Transformation, Principal Component Analysis (PCA), and Attribute Subset Selection.
## 🔁 1. Overview of Data Reduction
Data Reduction aims to obtain a reduced representation of a dataset that is smaller in volume but produces almost the same analytical results.
```mermaid
graph TD
A[Raw Large Dataset] --> B[Reduction Techniques]
B --> C[Reduced Dataset]
C --> D[Efficient Analysis]
```

### Why Data Reduction?
- To reduce storage and computation costs
- To improve algorithm efficiency
- To eliminate redundant or irrelevant features
- To enhance interpretability without losing essential information
## 🧩 2. Strategies for Data Reduction
| Strategy | Description | Example Techniques |
|---|---|---|
| Data Cube Aggregation | Summarize data at multiple granularities | Roll-up operations in OLAP |
| Dimensionality Reduction | Reduce number of attributes | PCA, Feature selection |
| Numerosity Reduction | Replace data with smaller models | Clustering, Regression models |
| Data Compression | Encode data efficiently | Wavelet transform, Huffman coding |
```mermaid
graph TD
A[Data Reduction] --> B[Dimensionality Reduction]
A --> C[Numerosity Reduction]
A --> D[Data Compression]
```

## 🌊 3. Wavelet Transformation
### Definition
The Wavelet Transform decomposes data into components of different frequency sub-bands — offering both time and frequency information.
Unlike the Fourier transform, which uses only global sine and cosine waves, wavelets use basis functions that are localized in time.
### 3.1 Mathematical Concept
A signal \( f(t) \) can be represented as

$$ f(t) = \sum_{j,k} c_{j,k} \, \psi_{j,k}(t) $$

where

- \( \psi_{j,k}(t) \) are the wavelet basis functions at scale \( j \) and translation \( k \)
- \( c_{j,k} \) are the wavelet coefficients.
### 3.2 Example — Haar Wavelet Transformation
Original data: [4, 6, 10, 12]

Pairwise averages and differences:

- Level 1 averages: [(4+6)/2, (10+12)/2] = [5, 11]
- Level 1 differences: [(4−6)/2, (10−12)/2] = [−1, −1]

Repeat on the averages:

- Level 2 average: (5+11)/2 = 8
- Level 2 difference: (5−11)/2 = −3

Wavelet coefficients: [8, −3, −1, −1]
```mermaid
graph TD
A[Raw Data: 4,6,10,12] --> B[Compute Averages & Differences]
B --> C["Level 1: (5,11), (-1,-1)"]
C --> D["Level 2: (8,-3)"]
D --> E["Coefficients: 8, -3, -1, -1"]
```

Wavelet coefficients provide a compressed representation while retaining the key trends in the data.
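To make the procedure concrete, here is a minimal Python sketch of the averaging/differencing Haar scheme used above (the helper `haar_transform` is our own name; note that libraries such as PyWavelets use the orthonormal 1/√2 convention, so their coefficients differ from these by a constant scale factor):

```python
def haar_transform(data):
    """One full Haar decomposition; input length must be a power of two."""
    coeffs = []
    current = list(data)
    while len(current) > 1:
        # Pairwise averages and differences, as in the worked example
        averages = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        differences = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        coeffs = differences + coeffs  # finer detail coefficients go last
        current = averages             # repeat on the averages
    return current + coeffs            # overall average comes first

print(haar_transform([4, 6, 10, 12]))  # [8.0, -3.0, -1.0, -1.0]
```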
## 🧮 4. Principal Component Analysis (PCA)
### 4.1 Purpose
PCA reduces dimensionality by transforming correlated features into a new set of uncorrelated variables (principal components) that capture maximum variance.
```mermaid
graph TD
A[Correlated Features X1, X2, X3] --> B[Covariance Matrix]
B --> C[Eigen Decomposition]
C --> D[Principal Components Z1, Z2, ...]
```

### 4.2 Mathematical Steps
Given a data matrix \( X \) (n samples × p features):

1. Standardize the data: \( x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \), giving the standardized matrix \( X' \).
2. Compute the covariance matrix of the standardized data: \( C = \frac{1}{n-1} X'^T X' \).
3. Find the eigenvalues and eigenvectors: \( C v_i = \lambda_i v_i \).
4. Sort the eigenvalues in descending order and select the top \( k \) components.
5. Project the data: \( Z = X' V_k \), where \( V_k \) contains the eigenvectors of the top \( k \) eigenvalues.
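These steps translate almost line-for-line into NumPy. The sketch below is a from-scratch illustration (the helper `pca` is our own, not a library API):

```python
import numpy as np

def pca(X, k):
    """Project X (n samples x p features) onto its top-k principal components."""
    # Step 1: standardize each feature (zero mean, unit variance)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: covariance matrix of the standardized data
    C = (X_std.T @ X_std) / (len(X_std) - 1)
    # Step 3: eigen decomposition (eigh suits symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 4: sort eigenvalues in descending order, keep the top-k eigenvectors
    order = np.argsort(eigvals)[::-1]
    V_k = eigvecs[:, order[:k]]
    # Step 5: project the standardized data
    return X_std @ V_k

Z = pca(np.random.rand(100, 5), k=2)  # 100 samples reduced to 2 components
print(Z.shape)                        # (100, 2)
```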
### 4.3 Example (by hand)
Data (2D):
| X1 | X2 |
|---|---|
| 2.5 | 2.4 |
| 0.5 | 0.7 |
| 2.2 | 2.9 |
| 1.9 | 2.2 |
| 3.1 | 3.0 |
| 2.3 | 2.7 |
| 2.0 | 1.6 |
| 1.0 | 1.1 |
| 1.5 | 1.6 |
| 1.1 | 0.9 |
Covariance matrix (of the mean-centered data):

$$ C = \begin{bmatrix} 0.6166 & 0.6154 \\ 0.6154 & 0.7166 \end{bmatrix} $$
Eigenvalues: 1.284, 0.049

→ The first principal component captures about 96% of the total variance.
→ A single dimension (Z1) is therefore enough to represent the data.
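These numbers are easy to verify with NumPy (a quick sketch; the array literal simply reproduces the table above):

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

C = np.cov(X.T)                        # covariance of the mean-centered data
eigvals, eigvecs = np.linalg.eigh(C)
print(C)                               # ~[[0.6166, 0.6154], [0.6154, 0.7166]]
print(sorted(eigvals, reverse=True))   # ~[1.284, 0.049]
print(max(eigvals) / eigvals.sum())    # ~0.96 -> PC1 captures ~96% of variance
```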
## 🧠 5. Attribute Subset Selection
### Definition
Select the most relevant attributes to preserve predictive power while reducing redundancy.
### 5.1 Methods
| Method | Description | Example |
|---|---|---|
| Filter | Use statistical tests independent of model | Correlation, Chi‑square |
| Wrapper | Evaluate subsets using model performance | Forward/backward selection |
| Embedded | Feature selection occurs during training | LASSO, Decision tree importance |
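As a quick illustration of the filter approach, here is a minimal scikit-learn sketch using `SelectKBest` with the chi-square test on the Iris dataset (chi2 requires non-negative feature values, which the Iris measurements satisfy):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
selector = SelectKBest(chi2, k=2)          # keep the 2 highest-scoring features
X_reduced = selector.fit_transform(X, y)
print(selector.get_support(indices=True))  # indices of the selected columns
print(X_reduced.shape)                     # (150, 2)
```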
```mermaid
graph TD
A[All Attributes] --> B[Filter Method]
A --> C[Wrapper Method]
A --> D[Embedded Method]
B & C & D --> E[Selected Subset]
```

### 5.2 Example — Backward Elimination
- Start with all features: F1, F2, F3, F4
- Remove the feature that contributes least to model accuracy
- Repeat until accuracy no longer improves (see the sketch below)
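A minimal scikit-learn sketch of backward elimination, assuming a synthetic classification dataset and logistic regression as the scoring model (here `n_features_to_select` fixes the stopping point, rather than stopping when accuracy plateaus):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)
model = LogisticRegression(max_iter=1000)
# direction="backward" starts from all 8 features and greedily removes
# the one whose removal hurts cross-validated accuracy the least
selector = SequentialFeatureSelector(model, n_features_to_select=3,
                                     direction="backward", cv=5)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the surviving features
```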
## 🧩 6. Comparison of Reduction Techniques
| Technique | Purpose | Output | Example Tool |
|---|---|---|---|
| Wavelet Transform | Compress numeric data | Coefficients | MATLAB, PyWavelets |
| PCA | Reduce correlated features | Principal components | scikit‑learn |
| Attribute Subset Selection | Choose best predictors | Selected columns | Weka, Python sklearn |
## 📘 7. Practice Questions
- Explain how wavelet transformation achieves data compression.
- Derive the PCA transformation mathematically.
- Distinguish between numerosity reduction and dimensionality reduction.
- Compare filter, wrapper, and embedded feature selection methods.
- Compute the first principal component for a 2D dataset manually.