DSSMFM: Combining user and item feature interactions for recommendation systems

Considerable effort has gone into improving machine learning algorithms for recommendation systems by applying domain knowledge of the data. Methods that discover the relationships among features automatically can make a model more effective and robust. This paper proposes a new model, DSSMFM, which combines user and item feature interactions to improve the performance of recommendation systems. In this model, the data are divided into user features and item features, each represented by one-hot vectors. The model is pre-trained with FM, which yields implicit vectors for both user and item features. These implicit vectors serve as the input of DSSM, and training the DSSM part of the model maximizes the cosine similarity between the user attribute vectors and the item attribute vectors. In experiments on the dataset of the ICME 2019 Short Video Understanding and Recommendation Challenge, the model improves on some of the baseline results.


Introduction
Features play an essential role in many recommendation systems, and good features bring significant benefits to both the model and the algorithm. However, using the original features directly rarely gives the best results. Feature engineering aims to obtain better training data by applying domain knowledge, but traditional feature engineering has obvious shortcomings. First, in large-scale industrial scenarios, features are numerous and complicated, which makes manual feature extraction infeasible. Second, manual feature extraction introduces prior assumptions, which to some extent obscure the latent relationships between features. Third, the feature engineering has to be redone for each new data set.
Traditional machine learning is a pipeline of multiple independent modules that includes complicated feature engineering, whereas some deep learning models are "end-to-end": the network learns on its own, without human intervention. In the field of recommendation systems, it is therefore of great significance to learn complicated and selective feature interactions [1] with a DNN. A deep neural network is good at representing high-order features, but it is weak in interpretability and in learning low-order feature crosses, which limits system performance. We therefore seek an approach that needs no laborious manual feature engineering and that combines the interactive vector representation of low-order features, which suits large and sparse data, with high-order feature representation and strong interpretability.
DSSM [2] maps a query and a document into a space of common dimension; using a DNN structure and maximizing the cosine similarity between the query and document vectors yields a model whose representations are semantically meaningful.
The FM [3] algorithm is a machine learning algorithm based on matrix factorization that aims to model feature combinations in large-scale sparse data. In addition, FM learns the cross-correlations of low-order features automatically.
In summary, this paper proposes a neural network-based model, DSSMFM (Deep Structured Semantic Model & Factorization Machine), which learns feature interactions through a distinct vectorization scheme. The latent relationships between user and item are expressed through cosine similarity constraints.

Embedding layer
In recommendation systems, the data are usually sparse and high-dimensional, and the spatial-temporal relationships among features are weak. It is therefore difficult to apply a DNN directly to such data sets. In other words, the high-dimensional sparse vectors must first be converted into low-dimensional dense ones; the latter are updated continually during training, which strengthens the relationships among features. For example, if the input data is [userid = uid02, authorid = aid01, ..., itemid = iid02], one-hot coding [4] represents it as a high-dimensional sparse vector with one one-hot block per field. The output of the embedding layer is the concatenation of the field embeddings, $e = [e_1, e_2, \dots, e_m]$, where $m$ is the number of feature fields and $e_i \in \mathbb{R}^D$ is the embedding of field $i$.
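The conversion from sparse one-hot input to dense field embeddings can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabularies, the embedding size D = 4, and the randomly initialized tables are all assumptions chosen to mirror the example record.

```python
import numpy as np

# Hypothetical vocabularies for three fields (assumed, mirrors the example record).
FIELD_VOCABS = {
    "userid": ["uid01", "uid02", "uid03"],
    "authorid": ["aid01", "aid02"],
    "itemid": ["iid01", "iid02", "iid03", "iid04"],
}
D = 4  # embedding size per field (assumed)

def one_hot(record):
    """Concatenate one one-hot block per field into a single sparse vector."""
    blocks = []
    for field, vocab in FIELD_VOCABS.items():
        block = np.zeros(len(vocab))
        block[vocab.index(record[field])] = 1.0
        blocks.append(block)
    return np.concatenate(blocks)

def embed(record, tables):
    """Look up one D-dimensional row per field and concatenate them."""
    rows = [tables[field][vocab.index(record[field])]
            for field, vocab in FIELD_VOCABS.items()]
    return np.concatenate(rows)

rng = np.random.default_rng(0)
# One randomly initialized embedding table per field (stand-in for learned weights).
tables = {f: rng.normal(size=(len(v), D)) for f, v in FIELD_VOCABS.items()}

record = {"userid": "uid02", "authorid": "aid01", "itemid": "iid02"}
x = one_hot(record)       # sparse vector of length 3 + 2 + 4
e = embed(record, tables)  # dense vector of length m * D
```

Looking up a row of a field's table is equivalent to multiplying that field's one-hot block by the table, which is why the embedding layer can be implemented as an index operation rather than a matrix product.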

Features split
Briefly, the DSSM model describes the relationship between a query and a document. We use it to model the relationship between user and item: the features of the data are split into user features and item features, which serve as the inputs to DSSM. The goal is to describe the mapping between user vectors and item vectors. After training, similar user attribute vectors and similar item attribute vectors gather into clusters in the multi-dimensional space, and the resulting user clusters and item clusters are drawn closer together under the constraint of maximizing cosine similarity, as illustrated in Figures 2a and 2b.
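A feature split of this kind might look like the sketch below. The assignment of fields to the user side or the item side is hypothetical, since the exact split is not listed here, and encoding each feature as a "field=value" token is likewise an illustrative convention.

```python
# Hypothetical assignment of fields to the two sides (assumed, not from the paper).
USER_FIELDS = {"userid", "user_city"}
ITEM_FIELDS = {"itemid", "authorid", "item_city"}

def split_features(record):
    """Return (user_tokens, item_tokens) for one record.

    Each feature becomes a "field=value" token; fields that belong to
    neither side are ignored in this sketch.
    """
    user_tokens = [f"{k}={v}" for k, v in record.items() if k in USER_FIELDS]
    item_tokens = [f"{k}={v}" for k, v in record.items() if k in ITEM_FIELDS]
    return user_tokens, item_tokens

record = {"userid": "uid02", "authorid": "aid01", "itemid": "iid02"}
user_tokens, item_tokens = split_features(record)
```

The two token lists then play the roles that query and document play in the original DSSM setting.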

The FM component
The model has two main parts. The first is the Factorization Machine (FM), which models feature crosses as the inner products of the features' implicit vectors.
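$$\hat{y}(x) = w_0 + \sum_{i=1}^{N} w_i x_i + \sum_{i=1}^{N} \sum_{j=i+1}^{N} \langle v_i, v_j \rangle x_i x_j$$
where $k$ is the dimension of the implicit vectors $v_i \in \mathbb{R}^k$, $N$ is the number of features, $w_0$ is the global offset, $w_i$ is the weight of feature $i$, and $\hat{w}_{i,j} := \langle v_i, v_j \rangle$ is the weight of the correlation between feature $i$ and feature $j$.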
In FM, the user and item features together form the training input, so the implicit vectors of the user and item features are learned with all features taken into account. Each feature is given a bias weight and a $k$-dimensional vector, and the interaction between two features is modeled as the inner product of their vectors, $\langle v_i, v_j \rangle$. Trained on one-hot encoded input, FM learns a well-structured representation of the data in the latent space, which benefits the subsequent model. In addition, the implicit vectors obtained by FM pre-training give each feature a position in that space relative to the other features.
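The second-order FM score described above can be sketched in NumPy as follows. The function and variable names are illustrative, and this is a scoring sketch only, not the xlearn training procedure. The pairwise term uses the well-known reformulation from the FM paper [3], which reduces the cost from O(kN^2) to O(kN); a naive double loop is included for comparison.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM score for one input vector.

    x: (N,) feature vector, w0: scalar offset, w: (N,) feature weights,
    V: (N, k) matrix whose row i is the implicit vector v_i.
    """
    linear = w0 + w @ x
    xv = x @ V                    # (k,) sum_i v_i * x_i
    x2v2 = (x ** 2) @ (V ** 2)    # (k,) sum_i v_i^2 * x_i^2
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same score with an explicit loop over all pairs i < j."""
    score = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            score += (V[i] @ V[j]) * x[i] * x[j]
    return score

rng = np.random.default_rng(0)
N, k = 9, 4                       # assumed sizes for the demo
x = rng.integers(0, 2, size=N).astype(float)
w0, w, V = 0.1, rng.normal(size=N), rng.normal(size=(N, k))
fast, slow = fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V)
```

After training, the rows of V are the implicit vectors that the DSSM part loads.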

The DSSM component
The second part is the Deep Structured Semantic Model (DSSM). It has a DNN on each side, one for the user and one for the item. Both DNNs are initialized with the implicit feature vectors obtained from FM pre-training in the first part. Besides the feature vectors belonging to the user side, the user-side DNN also loads the overall feature vectors. The item-side DNN is trained in the same way as the user-side DNN.
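Following DSSM [2], each side computes
$$l_1 = W_1 x, \qquad l_i = f(W_i l_{i-1} + b_i), \quad i = 2, \dots, n-1, \qquad y = f(W_n l_{n-1} + b_n)$$
$$R(Q, D) = \cos(y_Q, y_D) = \frac{y_Q^{\top} y_D}{\lVert y_Q \rVert \, \lVert y_D \rVert}$$
where $x$ is the input vector, $y$ is the output vector, $l_i$ denotes hidden layer $i$, $W_i$ is the parameter matrix of layer $i$, $b_i$ is the offset of layer $i$, and $f$ is the activation function. $y_Q$ and $y_D$ denote the user-side and item-side output vectors, and $R(Q, D)$ is the relevance score.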
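A two-tower forward pass with a cosine relevance score can be sketched as below. The layer sizes, the tanh activation, the use of a bias in every layer, and the random weights are assumptions for illustration; in the model described here the inputs would be built from the FM implicit vectors.

```python
import numpy as np

def tower(x, weights, biases):
    """Forward pass of one side: h <- tanh(W h + b) for each layer."""
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)
    return h

def relevance(y_q, y_d, eps=1e-12):
    """Cosine similarity R(Q, D) between the two output vectors."""
    denom = np.linalg.norm(y_q) * np.linalg.norm(y_d) + eps
    return float(y_q @ y_d / denom)

def make_tower(rng, sizes):
    """Random weights for layers sizes[0] -> sizes[1] -> ... (demo only)."""
    weights = [rng.normal(scale=0.1, size=(out_dim, in_dim))
               for in_dim, out_dim in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(out_dim) for out_dim in sizes[1:]]
    return weights, biases

rng = np.random.default_rng(0)
user_w, user_b = make_tower(rng, [12, 8, 4])   # user-side DNN (assumed sizes)
item_w, item_b = make_tower(rng, [20, 8, 4])   # item-side DNN (assumed sizes)

y_q = tower(rng.normal(size=12), user_w, user_b)
y_d = tower(rng.normal(size=20), item_w, item_b)
score = relevance(y_q, y_d)
```

The two sides may take inputs of different sizes, but they must end in the same output dimension so that the cosine between them is defined.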

The combination component
The architecture of the whole model is shown in Figure 3. First, the data are one-hot encoded and fed to FM for pre-training, which yields an implicit vector for each feature. Then the two sides of DSSM load the user features plus overall features and the item features plus overall features, respectively, and are trained to maximize cosine similarity, from which the final results are obtained.
XDEEPFM: xDeepFM [9], a compressed interaction model that learns explicit and implicit high-order feature interactions.
Metric. To evaluate the performance of each CTR model, we use the mean squared error (MSE) and the area under the ROC curve (AUC). AUC is a widely used measure of CTR prediction performance.
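For reference, both metrics can be computed in a few lines of plain Python. This is a sketch for small inputs rather than the evaluation code used in the experiments: the AUC here is the pairwise form, the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def mse(y_true, y_score):
    """Mean squared error between labels and predicted scores."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)

def auc(y_true, y_score):
    """Pairwise AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Each pair contributes 1 for a win, 0.5 for a tie, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
print(mse(labels, scores))
```

The pairwise form costs O(P x N) comparisons, so for large test sets a rank-based implementation is the more practical choice.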

Performance comparison
Each feature value is treated as a word, and a user word list and an item word list are built in preparation for the subsequent DSSM training. The data are fed to FM (xlearn) for pre-training, which yields the hidden vectors of all features. The dimension of each feature hidden vector is 64. DSSM loads the feature hidden vectors and takes the user attribute features plus overall features and the item attribute features plus overall features as the inputs of its two sides for training. In this experiment, the learning rate of the deep neural network is 1 × 10^-6, the dropout rate is 0.1, the batch size is 10000, and the regularization coefficient is 0.001. The experimental results are shown in Table 2.

Summary
This paper proposed a neural network-based model, DSSMFM. DSSMFM has two distinctive properties. First, it reduces manual feature engineering and retains interpretability. Second, it effectively learns continually updated high-order feature interactions on top of the representation of low-order feature interactions, and it learns the mapping between user and item by maximizing the cosine similarity between the user attribute and item attribute feature vectors. The experimental results showed that our model performed better than the traditional model. In future work, more effort should go into describing the feature representation on both sides, adjusting the network structure, strengthening the model's ability to learn from individual features, and pursuing better performance through collaborative training.