# 1. Should you one-hot encode?

I do not know what you mean by vector. XGBoost treats every input feature as numerical, with support for missing values and sparsity. The decision is up to the user.

So if you want ordered variables, you can transform them into numerical levels (age, say). Or, if you prefer to treat a feature as a categorical variable, one-hot encode it.

One-hot encoding can be helpful when the number of categories is small (on the order of 10 to 100). In such cases, one-hot encoding can discover interesting interactions like (gender=male) AND (job=teacher).

Ordering the categories instead makes such interactions harder to discover (isolating one job needs two splits on job). That said, there is no unified way of handling categorical features in trees, and what trees are really good at is ordered continuous features anyway.
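To make the split-counting argument concrete, here is a minimal sketch. It does not use XGBoost; `split` is a hypothetical stand-in for a single tree split, and the ordinal coding doctor=0, teacher=1, engineer=2 is an assumption made up for illustration. With one-hot columns, the rule (gender=male) AND (job=teacher) takes one split per indicator. With an ordinal job column, bracketing the middle level takes two splits on job.

```python
def split(value, threshold):
    """Hypothetical stand-in for one tree split: True when value < threshold."""
    return value < threshold


def male_teacher_onehot(gender_male, job_teacher):
    # Two splits in total: one on each 0/1 indicator column.
    return (not split(gender_male, 0.5)) and (not split(job_teacher, 0.5))


def male_teacher_ordinal(gender_male, job_level):
    # Three splits in total: one on gender, two on job to bracket level 1.
    # Assumed ordinal coding: doctor=0, teacher=1, engineer=2.
    return (
        (not split(gender_male, 0.5))
        and (not split(job_level, 0.5))
        and split(job_level, 1.5)
    )


# Both rules pick out the same rows; the ordinal version just needs a deeper path.
onehot_hit = male_teacher_onehot(gender_male=1, job_teacher=1)
ordinal_hit = male_teacher_ordinal(gender_male=1, job_level=1)
ordinal_miss = male_teacher_ordinal(gender_male=1, job_level=2)
```

If the category of interest happened to sit at either end of the ordinal coding, one split on job would be enough, so the extra cost depends on the arbitrary order chosen.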

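• 1. For categorical variables whose categories are ordered, such as age, treating them as numerical variables is fine. For categorical variables with no natural order, one-hot encoding is recommended, although it increases memory use and training time.
• 2. One-hot encoding is recommended when the number of categories is small (tqchen gives a range of [10, 100]).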
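The two encodings summarized above can be sketched in plain Python. The helper names `ordinal_encode` and `one_hot_encode` and the toy data are assumptions for illustration, not part of XGBoost; in practice a library encoder would usually do this job.

```python
def ordinal_encode(values, order):
    """Map ordered categories to integer levels, following `order`."""
    level = {cat: i for i, cat in enumerate(order)}
    return [level[v] for v in values]


def one_hot_encode(values, categories=None):
    """Return (categories, rows), where each row is a list of 0/1 indicators."""
    if categories is None:
        categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)  # one column per category
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Toy data (hypothetical): an ordered feature and an unordered one.
age_group = ["young", "old", "middle", "young"]
job = ["teacher", "engineer", "doctor", "teacher"]

age_levels = ordinal_encode(age_group, order=["young", "middle", "old"])
job_categories, job_onehot = one_hot_encode(job)
```

The one-hot output has one column per distinct category, which is where the extra memory and training time come from as the category count grows.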

# 2. The consistency problem with one-hot encoding

## References
