# 1. Should you use one-hot encoding?

I do not know what you mean by vector. xgboost treats every input feature as numerical, with support for missing values and sparsity. The decision is up to the user.

So if you want ordered variables, you can transform them into numerical levels (say, age). Or, if you prefer to treat a variable as categorical, do one-hot encoding.
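The two options can be sketched with pandas. This is a minimal illustration, not xgboost's own API: the column names, the `age_order` mapping, and the toy data are all assumptions made up for the example.

```python
import pandas as pd

# Toy data (hypothetical): one ordered variable, one unordered variable.
df = pd.DataFrame({
    "age_group": ["young", "middle", "old", "middle"],
    "job": ["teacher", "engineer", "teacher", "doctor"],
})

# Ordered variable: map each level to an integer that preserves the order.
age_order = {"young": 0, "middle": 1, "old": 2}
df["age_level"] = df["age_group"].map(age_order)

# Unordered variable: one 0/1 indicator column per category.
job_onehot = pd.get_dummies(df["job"], prefix="job", dtype=int)

# Final all-numerical feature matrix, which is what xgboost expects.
features = pd.concat([df[["age_level"]], job_onehot], axis=1)
print(features)
```

The resulting `features` frame contains only numerical columns, so it can be passed on to a tree learner as-is.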

One-hot encoding can be helpful when the number of categories is small (on the order of 10 to 100). In such cases, one-hot encoding can discover interesting interactions like (gender = male) AND (job = teacher).

Ordering the categories instead makes such interactions harder to discover (it takes two splits on job). That said, there is indeed no unified way of handling categorical features in trees, and what trees are really good at is ordered continuous features anyway.
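The "two splits" remark can be checked with a small pure-Python sketch. The integer codes below are an arbitrary, assumed ordering, and `single_split_isolates` is a hypothetical helper written for this illustration; it is not how xgboost searches for splits internally.

```python
jobs = ["doctor", "teacher", "engineer", "teacher", "doctor"]
codes = {"doctor": 0, "teacher": 1, "engineer": 2}  # arbitrary order (assumption)
x = [codes[j] for j in jobs]
is_teacher = [j == "teacher" for j in jobs]


def single_split_isolates(values, target):
    """Return True if one threshold split separates exactly the target rows."""
    levels = sorted(set(values))
    # Candidate thresholds lie midway between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    for t in thresholds:
        left = [v < t for v in values]
        right = [not side for side in left]
        if left == target or right == target:
            return True
    return False


# Integer-coded job: "teacher" sits between two other codes,
# so no single threshold isolates it.
ordinal_one_split = single_split_isolates(x, is_teacher)

# One-hot column job_teacher: a single split at 0.5 is enough.
onehot_teacher = [int(flag) for flag in is_teacher]
onehot_one_split = single_split_isolates(onehot_teacher, is_teacher)

# With integer codes, two thresholds together do isolate "teacher".
two_splits = [0.5 < v < 1.5 for v in x]

print(ordinal_one_split, onehot_one_split, two_splits == is_teacher)
```

A tree spends one level of depth per split, so the integer-coded version needs more depth before it can combine the job condition with another feature such as gender.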

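• 1. For categorical variables whose categories are ordered, such as age, treating them as numerical variables is fine. For categorical variables without an ordering, one-hot encoding is recommended. However, one-hot encoding increases memory usage and training time.
• 2. One-hot encoding is recommended when the number of categories is small (tqchen suggests roughly the range [10, 100]).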
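One way to apply these two rules is to pick the encoding from the cardinality of each column. The sketch below is only an illustration: `encode_column` and its `max_levels` cutoff are hypothetical (the default of 100 follows the upper end of tqchen's range), and falling back to integer codes for high-cardinality columns is an assumption of this example, not a recommendation from the source.

```python
import pandas as pd


def encode_column(series, max_levels=100):
    """Hypothetical helper: one-hot for few categories, integer codes otherwise."""
    if series.nunique() <= max_levels:
        # Few categories: one indicator column each, at the cost of extra memory.
        return pd.get_dummies(series, prefix=series.name, dtype=int)
    # Many categories: a single integer-coded column keeps the matrix narrow.
    codes = series.astype("category").cat.codes
    return codes.to_frame(name=f"{series.name}_code")


small = pd.Series(["a", "b", "c", "a"], name="job")
large = pd.Series([f"id_{i}" for i in range(500)], name="user")

small_encoded = encode_column(small)   # 3 categories -> 3 indicator columns
large_encoded = encode_column(large)   # 500 categories -> 1 code column

print(small_encoded.shape, large_encoded.shape)
```

The column counts make the memory trade-off concrete: one-hot encoding the 500-level column would add 500 columns, while the coded version adds one.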

# 2. The one-hot consistency problem

