Feature Engineering in Machine Learning

Feature engineering means transforming raw data into a feature vector. Expect to spend significant time doing feature engineering

Mapping numeric values to Machine Learning Features

Mapping numeric values to features is easy. Integer and floating-point data don't need a special encoding because they can be multiplied by a numeric weight. As suggested in Figure 2, converting the raw integer value 6 to the feature value 6.0 is trivial:

Machine learning Numeric Features

Mapping categorical values to Machine learning Features

Categorical features have a discrete set of possible values. For example, there might be a feature called street_name with options that include:

{'Charleston Road', 'North Shoreline Boulevard', 'Shorebird Way', 'Rengstorff Avenue'}

Since models cannot multiply strings by the learned weights, we use feature engineering to convert strings to numeric values.

One quick approach is to map our street names to numbers:

  • map Charleston Road to 0
  • map North Shoreline Boulevard to 1
  • map Shorebird Way to 2
  • map Rengstorff Avenue to 3
  • map everything else (OOV) to 4

However, this approach has two disadvantages:

  1. Our model assigns θ (weights) to features. We'll be learning a single weight that applies to all streets. For example, if we learn a weight of 6 for street_name, then we will multiply it by 0 for Charleston Road, by 1 for North Shoreline Boulevard, 2 for Shorebird Way and so on. It is unlikely that there is a linear adjustment of price based on the street name
  2. We aren't accounting for cases where street_name may take multiple values. For example, many houses are located at the corner of two streets, and there's no way to encode that information in the street_name value if it contains a single index

So, to overcome these limitations, we can instead create a binary vector for each categorical feature in our model that represents values as follows:

  • For values that apply to the example, set corresponding vector elements to 1.
  • Set all other elements to 0.

The length of this vector is equal to the number of elements in the vocabulary. This representation is called a one-hot encoding when a single value is 1, and a multi-hot encoding when multiple values are 1

Machine learning String Features

This approach effectively creates a Boolean variable for every feature value (e.g., street name). Here, if a house is on Shorebird Way then the binary value is 1 only for Shorebird Way. Thus, the model uses only the weight for Shorebird Way

One-hot encoding extends to numeric data that you do not want to directly multiply by a weight, such as a postal code

Similarly, if a house is at the corner of two streets, then two binary values are set to 1, and the model uses both their respective weights

Sparse Representation of Features

Suppose that you had 1,000,000 different street names in your data set that you wanted to include as values for street_name. Explicitly creating a binary vector of 1,000,000 elements where only 1 or 2 elements are true is a very inefficient representation in terms of both storage and computation time when processing these vectors. In this situation, a common approach is to use a sparse representation in which only nonzero values are stored. In sparse representations, an independent model weight is still learned for each feature value, as described above