CatBoost Regression: Break It Down For Me | by Shreya Rao | Sep, 2023

A complete (and illustrated) breakdown of the inner workings of CatBoost

CatBoost, short for Categorical Boosting, is a powerful machine learning algorithm that excels at handling categorical features and producing accurate predictions. Traditionally, dealing with categorical data is pretty tricky, requiring one-hot encoding, label encoding, or some other preprocessing technique that can distort the data’s inherent structure. To tackle this issue, CatBoost employs its own built-in encoding system called Ordered Target Encoding.

Let’s see how CatBoost works in practice by building a model to predict how someone might rate the book Murder, She Texted based on their average book rating on Goodreads and their favorite genre.

We asked 6 people to rate Murder, She Texted and collected the other relevant information about them.

This is our current training dataset, which we’ll use to train (duh) the model.

Step 1: Shuffle the Dataset and Encode the Categorical Data Using Ordered Target Encoding

The way we preprocess categorical data is central to the CatBoost algorithm. In this case, we only have one categorical column: Favorite Genre. This column is encoded (aka converted to a numeric value), and the way it’s done varies depending on whether it’s a Regression or Classification problem. Since we’re dealing with a Regression problem (because the variable we want to predict, Murder, She Texted Rating, is continuous), we follow the steps below.

1 - Shuffle the dataset:

2 - Put the continuous target variable into discrete buckets: Since we have very little data here, we’ll create 2 buckets of the same size to categorize the target. (Learn more about how to create buckets here.)

We put the 3 smallest values of Murder, She Texted Rating in bucket 0 and the rest in bucket 1.
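The shuffle and bucketing steps can be sketched in a few lines of plain Python. This is a minimal illustration, not CatBoost’s internal implementation: the six rows, the column names, the seed, and the helper names `shuffle_rows` and `equal_size_buckets` are all made up for this example, since the actual dataset values are not reproduced in the text.

```python
import random

# Hypothetical stand-in for the 6-person training set (values are invented).
rows = [
    {"favorite_genre": "Mystery", "avg_goodreads_rating": 3.9, "rating": 4.5},
    {"favorite_genre": "Romance", "avg_goodreads_rating": 4.1, "rating": 2.0},
    {"favorite_genre": "Mystery", "avg_goodreads_rating": 3.2, "rating": 3.5},
    {"favorite_genre": "Sci-Fi", "avg_goodreads_rating": 4.4, "rating": 1.5},
    {"favorite_genre": "Romance", "avg_goodreads_rating": 3.6, "rating": 4.0},
    {"favorite_genre": "Sci-Fi", "avg_goodreads_rating": 2.8, "rating": 3.0},
]


def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of the rows (the seed is fixed only for repeatability)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def equal_size_buckets(values, n_buckets=2):
    """Give each value a bucket id so every bucket holds (nearly) the same count.

    Values are ranked from smallest to largest; the lowest ranks go to bucket 0.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * n_buckets // len(values)
    return buckets


# Step 1: shuffle. Step 2: bucket the continuous target.
shuffled = shuffle_rows(rows)
targets = [row["rating"] for row in shuffled]
for row, bucket in zip(shuffled, equal_size_buckets(targets)):
    row["rating_bucket"] = bucket

for row in shuffled:
    print(row["favorite_genre"], row["rating"], row["rating_bucket"])
```

With 6 rows and 2 buckets, the three lowest ratings land in bucket 0 and the three highest in bucket 1, whatever order the shuffle produces.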