# Can a Decision Tree Be Used for Regression Problems?


A regression tree is a decision tree used to predict continuous-valued outputs. In decision trees for classification, splits are chosen on the basis of entropy and information gain. Entropy, however, is defined over class probabilities, so it cannot be used when the target is a continuous variable; regression trees use the mean square error instead.

Mean square error (MSE) tells us how much, on average, our predictions deviate from the actual target values.

In the above figure, **Y** is the actual value and **Y_hat** is the predicted value.
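For reference, the standard definition over $n$ samples is $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$. The short sketch below computes it directly so the formula can be checked by hand; the function name `mean_square_error` is hypothetical and is not a library API.

```python
def mean_square_error(y, y_hat):
    """MSE = (1/n) * sum((Y_i - Y_hat_i)^2) over n samples (illustrative helper)."""
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    # Square each residual so positive and negative errors do not cancel out
    return sum((actual - predicted) ** 2 for actual, predicted in zip(y, y_hat)) / len(y)


# Residuals are 0.5, 0.0 and -2.0, so MSE = (0.25 + 0 + 4) / 3
print(mean_square_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```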

In the regression tree algorithm, we try to reduce the mean square error at each node rather than the entropy. For each independent variable, the algorithm iterates over the candidate split points, computes the MSE of the two resulting child nodes (weighted by the number of samples in each), and keeps the split with the lowest MSE. That split becomes the *top contender* for that independent variable. The top contenders of all the independent variables are then compared, and the variable (with its split point) that minimizes the MSE is chosen for the node.
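The split search described above can be sketched from scratch as follows. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: each node is assumed to predict the mean of its targets, candidate thresholds are taken as midpoints between sorted unique feature values, and the helper names (`node_mse`, `best_split_for_feature`, `best_split`) and the toy data are hypothetical.

```python
import numpy as np


def node_mse(y):
    """MSE of a node that predicts the mean of its targets (assumption)."""
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0


def best_split_for_feature(x, y):
    """Return (threshold, weighted child MSE) of the best split on one feature,
    i.e. that feature's 'top contender'."""
    values = np.unique(x)  # sorted unique values
    best = (None, np.inf)
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left, right = y[x <= threshold], y[x > threshold]
        # Weight each child's MSE by the share of samples it receives
        score = (len(left) * node_mse(left) + len(right) * node_mse(right)) / len(y)
        if score < best[1]:
            best = (threshold, score)
    return best


def best_split(X, y):
    """Compare the top contenders of all features and keep the lowest-MSE one."""
    contenders = [best_split_for_feature(X[:, j], y) for j in range(X.shape[1])]
    j = int(np.argmin([score for _, score in contenders]))
    return j, contenders[j][0], contenders[j][1]


# Toy data: the target jumps between rows 2 and 3, which feature 0 separates cleanly
X_demo = np.array([[1, 5], [2, 3], [3, 8], [4, 1]], dtype=float)
y_demo = np.array([1.0, 1.2, 3.0, 3.1])
feature, threshold, score = best_split(X_demo, y_demo)
print(feature, threshold, score)
```

Here feature 0 split at 2.5 separates {1.0, 1.2} from {3.0, 3.1}, giving a weighted MSE of 0.00625, far lower than any split on feature 1. A full tree would apply `best_split` recursively to each child until a stopping rule (such as a maximum depth or a minimum leaf size) is met.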

## Python Code for Creating Regression Tree

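```python
# Import DecisionTreeRegressor from sklearn.tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes

# Illustrative data (assumption: the article does not show its dataset)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

# Instantiate dt; a float min_samples_leaf is a fraction, so each leaf
# must contain at least 13% of the training samples
dt = DecisionTreeRegressor(max_depth=8,
                           min_samples_leaf=0.13,
                           random_state=3)

# Fit dt to the training set
dt.fit(X_train, y_train)

# Predict test set targets
y_pred = dt.predict(X_test)

# Compute mse
mse = MSE(y_test, y_pred)

# Compute rmse
rmse = mse**(1/2)

# Print rmse
print('Regression Tree test set RMSE: {:.2f}'.format(rmse))
```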

## Conclusion

The regression tree algorithm is useful in many areas where the relationship between the variables is non-linear. However, regression trees are prone to overfitting: the model fits the training data too closely and fails to generalize to new data. One way to prevent this with regression trees is to specify the minimum number of records (rows) that each leaf must contain, which is what the `min_samples_leaf` parameter controls in scikit-learn.
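The effect of a minimum leaf size can be illustrated with a small comparison. This sketch assumes synthetic data (a noisy sine curve, not from the article) and an arbitrary leaf size of 20; it contrasts a fully grown tree, which can memorize the training rows, with one whose leaves must each hold at least 20 rows.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic, illustrative data: a noisy sine curve (assumption, not from the article)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)


def rmse(model, X, y):
    """Root mean square error of a fitted model on (X, y)."""
    return mean_squared_error(y, model.predict(X)) ** 0.5


# Fully grown tree: leaves may shrink to a single training row
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# Constrained tree: every leaf must contain at least 20 training rows
pruned = DecisionTreeRegressor(min_samples_leaf=20, random_state=0).fit(X_train, y_train)

for name, model in [("deep", deep), ("min_samples_leaf=20", pruned)]:
    print(f"{name}: train RMSE {rmse(model, X_train, y_train):.3f}, "
          f"test RMSE {rmse(model, X_test, y_test):.3f}")
```

The fully grown tree reaches a training RMSE of essentially zero because each leaf holds one row, which is the "fits the existing data too perfectly" behaviour described above. The constrained tree averages over at least 20 rows per leaf, so it cannot chase individual noisy points; on data like this it usually generalizes better, though the exact numbers depend on the data and the split.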