- Recommender System
- Collaborative Filtering
- Matrix Completion
- Maximum Margin Matrix Factorization
- SVD and Beyond
- Probabilistic Matrix Factorization
- Poisson Factorization
- Collaborative Less-is-More Filtering(CliMF)
- Matrix Factorization for Implicit Feedback
- Discrete Collaborative Filtering
- Recommendation with Implicit Information
- Inductive Matrix Completion
- Beyond Matrix Completion
- Factorization Machines(FM)
- Deep Learning for Recommender System
- Restricted Boltzmann Machines for Collaborative Filtering
- AutoRec for Collaborative Filtering
- Deep crossing
- Neural collaborative filtering
- Collaborative deep learning for RecSys
- Wide & Deep Model
- Deep FM
- Neural Factorization Machines
- Attentional Factorization Machines
- xDeepFM
- RepeatNet
- Deep Matrix Factorization
- Deep Matching Models for Recommendation
- Embedding methods for RecSys
- Graph-based RecSys
- Feature Interaction Selection in RecSys
- Ensemble Methods for Recommender System
- Tree-based Deep Model for Recommender Systems
- Context-aware Recommendations
- Sequential Recommender Systems
- Top-N recommendation
- Explainable Recommendations
- Social Recommendation
- Knowledge Graph and Recommender System
- Reinforcement Learning and Recommender System
- Adversarial Learning for Recommender Systems
- Health Recommender Systems
- Resource on RecSys
- Preference Learning
- Computational Advertising
Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. RSs are primarily directed towards individuals who lack sufficient personal experience or competence to evaluate the potentially overwhelming number of alternative items that a Web site, for example, may offer.
Xavier Amatriain discusses the traditional definition and its data mining core.
Traditional definition: a recommender system estimates a utility function that automatically predicts how much a user will like an item.
User interest is implicitly reflected in interaction history, demographics, and contexts, which can be regarded as a typical example of data mining.
A recommender system should match a context to a collection of information objects.
Some methods for this are called Deep Matching Models for Recommendation.
It is an application of machine learning, which takes the form representation + evaluation + optimization.
We will focus on the representation and evaluation.
Evolution of the Recommender Problem:
- Rating
- Ranking
- Page Optimization
- Context-aware Recommendations
The main families of recommendation approaches are Collaborative Filtering (CF), Content-Based Filtering (CBF), Demographic Filtering (DF), Knowledge-Based Filtering (KBF), and Hybrid Recommendation Systems.
Evaluation of Recommendation System
The evaluation of machine learning algorithms depends on the task.
Evaluating a recommender system can be framed in terms of standard machine learning tasks such as regression and classification.
The following methods only take mathematical convenience into consideration; the Gini index, coverage rate, and other more realistic factors are not discussed.
There are 3 kinds of collaborative filtering: user-based, item-based and model-based collaborative filtering.
The user-based methods are based on the similarities of users. If user
The item-based methods are based on the similarity of items. If one person added a brush to shopping-list, it is reasonable to recommend some toothpaste to him or her. And we can explain that you bought item
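The item-based idea can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity between item columns and a toy interaction matrix; the function name `item_based_scores` and the data are hypothetical and only serve the brush/toothpaste example.

```python
import numpy as np

def item_based_scores(R):
    """Score every (user, item) pair from item-item cosine similarity.

    R is a users x items matrix where 0 means "no interaction".
    """
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                      # avoid division by zero for unseen items
    S = (R.T @ R) / np.outer(norms, norms)       # item-item cosine similarity
    np.fill_diagonal(S, 0.0)                     # an item should not recommend itself
    scores = R @ S                               # weight items by similarity to the user's items
    scores[R > 0] = -np.inf                      # do not re-recommend items already seen
    return scores

# Toy data (assumed): columns are [brush, toothpaste, novel]
R = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
scores = item_based_scores(R)
best_for_last_user = int(np.argmax(scores[3]))   # index of the top suggestion for the last user
```

The last user only has the brush, and toothpaste co-occurs with the brush for other users, so toothpaste gets the highest score for that user.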
Matrix completion is to complete the matrix
Note that the rank of a matrix is not easy or robust to compute.
We can apply customized PPA to matrix completion problem
$$ \min_{Z} \|Z\|_{\ast} \quad \text{s.t.} \quad Z_{\Omega} = X_{\Omega} $$
We let
- Producing $Y^{k+1}$ by $$Y^{k+1}=\arg\max_{Y} \left\{L([2Z^k-Z^{k-1}],Y)-\frac{s}{2}\|Y-Y^k\|_F^2\right\};$$
- Producing $Z^{k+1}$ by $$Z^{k+1}=\arg\min_{Z} \left\{L(Z,Y^{k+1}) + \frac{r}{2}\|Z-Z^k\|_F^2\right\}.$$
Rahul Mazumder, Trevor Hastie, Robert Tibshirani reformulate it as the following:
$$ \min_{Z} f_{\lambda}(Z)=\frac{1}{2}\|P_{\Omega}(Z-X)\|_F^2 + \lambda \|Z\|_{\ast} $$
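A standard way to minimize this nuclear-norm regularized objective is the Soft-Impute iteration: fill the unobserved entries with the current estimate, then soft-threshold the singular values. The sketch below assumes a dense NumPy matrix and a boolean observation mask; the helper names, the synthetic rank-2 data, and the stopping rule are illustrative assumptions.

```python
import numpy as np

def soft_impute(X, mask, lam, n_iters=500, tol=1e-7):
    """Approximately minimize 0.5*||P_Omega(Z - X)||_F^2 + lam*||Z||_*."""
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iters):
        filled = np.where(mask, X, Z)                  # observed entries from X, the rest from Z
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt    # soft-threshold the singular values
        done = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if done:
            break
    return Z

def objective(Z, X, mask, lam):
    """Value of f_lambda(Z) on the observed entries."""
    residual = np.where(mask, Z - X, 0.0)
    return 0.5 * np.sum(residual ** 2) + lam * np.linalg.svd(Z, compute_uv=False).sum()

rng = np.random.default_rng(0)
X_true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # rank-2 ground truth (assumed)
mask = rng.random(X_true.shape) < 0.6                          # roughly 60% of entries observed
Z_hat = soft_impute(X_true, mask, lam=0.5)
```

Each iteration costs one SVD, which is fine for small matrices; large problems usually rely on truncated or sparse-plus-low-rank SVD routines instead.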
A novel approach to collaborative prediction is presented, using low-norm instead of low-rank factorizations. The approach is inspired by, and has strong connections to, large-margin linear discrimination. We show how to learn low-norm factorizations by solving a semi-definite program, and present generalization error bounds based on analyzing the Rademacher complexity of low-norm factorizations.
Consider the soft-margin learning, where we minimize a trade-off between the trace norm of
And it can be rewritten as a semi-definite optimization problem (SDP):
$$
\begin{aligned}
\min_{A, B} \quad & \frac{1}{2}(tr(A)+tr(B))+c\sum_{(ui)\in O}\xi_{ui}, \\
s.t. \quad & \begin{bmatrix} A & X \\ X^T & B \end{bmatrix} \succeq 0, \quad Z_{ui}X_{ui}\geq 1- \xi_{ui}, \quad \xi_{ui}\geq 0, \ \forall ui\in O
\end{aligned}
$$
This technique is also called nonnegative matrix factorization.
ordinal ratings
If we have collected user
And we can predict the score
$$ C(P,Q) = \sum_{(u,i):Observed}(r_{u,i}-\hat{r}_{u,i})^{2}=\sum_{(u,i):Observed}(r_{u,i}-\sum_f p_{u,f}q_{i,f})^{2}, \qquad \arg\min_{P_u, Q_i} C(P, Q). $$
Additionally, we can add a regularization term to the cost function to avoid over-fitting.
It is called the regularized singular value decomposition or Regularized SVD.
Funk-SVD considers the user's preferences or bias. It predicts the scores by $$ \hat{r}_{u,i} = \mu + b_u + b_i + \left< P_u, Q_i \right> $$ where $\mu, b_u, b_i$ are the global mean, the user bias, and the item bias, respectively. And the cost function is defined as $$ \min\sum_{(u,i): Observed}(r_{u,i} - \hat{r}_{u,i})^2 + \lambda (\|P_u\|^2+\|Q_i\|^2+|b_i|^2+|b_u|^2). $$
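The biased model above is usually fitted with stochastic gradient descent over the observed ratings. The following is a minimal sketch under assumed hyperparameters (learning rate, $\lambda$, number of factors) and a hypothetical toy rating list; it is not tuned for real data.

```python
import numpy as np

def train_biased_mf(ratings, n_users, n_items, k=2, lr=0.02, lam=0.02, epochs=300, seed=0):
    """SGD on sum (r - r_hat)^2 + lam * (|P_u|^2 + |Q_i|^2 + b_u^2 + b_i^2)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(n_users, k))
    Q = 0.1 * rng.normal(size=(n_items, k))
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    mu = np.mean([r for _, _, r in ratings])           # global mean rating
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
            # both factor updates use the values from before this step
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - lam * P[u]),
                          Q[i] + lr * (err * P[u] - lam * Q[i]))
    return mu, b_u, b_i, P, Q

def predict(model, u, i):
    mu, b_u, b_i, P, Q = model
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]

# Toy (user, item, rating) triples, purely illustrative
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
model = train_biased_mf(ratings, n_users=3, n_items=3)
train_rmse = np.sqrt(np.mean([(r - predict(model, u, i)) ** 2 for u, i, r in ratings]))
```

The constant factor 2 from the squared error is absorbed into the learning rate, which is a common convention.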
SVD++ predicts the scores by
$$\hat{r}_{u,i} = \mu + b_u + b_i + \left(P_u + |N(u)|^{-0.5}\sum_{j\in N(u)} y_j\right) Q_i^{T}$$
where
- $\mu + b_u + b_i$ is the baseline prediction;
- $\left<P_u, Q_i\right>$ is the SVD of the rating matrix;
- $\left<|N(u)|^{-0.5}\sum_{j\in N(u)} y_j, Q_i\right>$ is the implicit feedback, where $N(u)$ is user $u$'s item set and $y_j$ is the implicit feedback of item $j$.
We learn the values of involved parameters by minimizing the regularized squared error function.
In linear regression, the least squares method is equivalent to maximum likelihood estimation when the errors follow a zero-mean Gaussian distribution.
Regularized SVD |
Probabilistic model |
So that we can reformulate the optimization problem as maximum likelihood estimation.
- Latent Factor Models for Web Recommender Systems
Sometimes the user information we can collect is implicit, such as clicks on some items.
In CLiMF, the model parameters are learned by directly maximizing the Mean Reciprocal Rank (MRR).
Its objective function is $$ F(U,V)=\sum_{i=1}^{M}\sum_{j=1}^{N} Y_{ij} \left[\ln g(U_{i}^{T}V_{j})+\sum_{k=1}^{N}\ln (1 - Y_{ik}\, g(U_{i}^{T}V_{k}-U_{i}^{T}V_{j}))\right] -\frac{\lambda}{2}(\|U\|^2 + \|V\|^2) $$
| Numbers | Factors | Others |
|---|---|---|
| $M$: the number of users | $U_i$: latent factor vector for user $i$ | $Y_{ij}$: binary relevance score |
| $N$: the number of items | $V_j$: latent factor vector for item $j$ | $g$: logistic function |
We use stochastic gradient ascent to maximize the objective function.
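To make the objective concrete, here is a small sketch that evaluates $F(U,V)$ directly, using the relevance of item $k$ ($Y_{ik}$) inside the inner logarithm as in the CLiMF paper. The function name and the toy matrices are assumptions; a real implementation would also compute the gradients used by stochastic gradient ascent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def climf_objective(U, V, Y, lam):
    """Evaluate the CLiMF objective F(U, V) for binary relevance Y (users x items)."""
    total = 0.0
    for i in range(U.shape[0]):
        f = V @ U[i]                                   # f[j] = U_i^T V_j for every item j
        for j in np.flatnonzero(Y[i]):                 # only relevant items contribute
            total += np.log(sigmoid(f[j]))
            total += np.sum(np.log(1.0 - Y[i] * sigmoid(f - f[j])))
    return total - 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

# One user, three items, only item 0 is relevant (toy data)
Y = np.array([[1, 0, 0]])
U = np.array([[1.0, 0.0]])
V_good = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]])    # scores the relevant item highest
V_bad = np.array([[-2.0, 0.0], [0.0, 0.0], [0.0, 0.0]])    # scores the relevant item lowest
```

With equal norms for `V_good` and `V_bad`, the objective is larger when the relevant item is ranked first, which is the behaviour the gradient ascent step exploits.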
Another advantage of collaborative filtering or matrix completion is that even the element of matrix is binary or implicit information such as
WRMF is simply a modification of this loss function:
$$ {C(P,Q)}_{WRMF} = \sum_{(u,i):Observed}c_{u,i}(I_{u,i} - \sum_f p_{u,f}q_{i,f})^{2} + \underbrace{\lambda_u\|P_u\|^2 + \lambda_i\|Q_i\|^2}_{\text{regularization terms}}. $$
We make the assumption that if a user has interacted at all with an item, then
WRMF does not make the assumption that a user who has not interacted with an item does not like the item. WRMF does assume that that user has a negative preference towards that item, but we can choose how confident we are in that assumption through the confidence hyperparameter.
Alternating least squares (ALS) solves this optimization problem by alternately fixing one factor matrix and setting the gradient with respect to the other to zero, which gives an analytic update at each step.
- Faster Implicit Matrix Factorization
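A dense, unoptimized sketch of ALS for WRMF follows. It assumes the common confidence form $c_{u,i} = 1 + \alpha r_{u,i}$, a single regularization weight, and a sum over all user-item pairs (unobserved pairs get preference 0 with low confidence); these choices and the function name `wrmf_als` are assumptions for illustration.

```python
import numpy as np

def wrmf_als(R, k=2, alpha=10.0, lam=0.1, n_iters=15, seed=0):
    """Alternating least squares for weighted regularized matrix factorization."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    I = (R > 0).astype(float)          # binary preference I_{u,i}
    C = 1.0 + alpha * R                # confidence c_{u,i} (assumed linear form)
    P = 0.1 * rng.normal(size=(n_users, k))
    Q = 0.1 * rng.normal(size=(n_items, k))
    reg = lam * np.eye(k)

    def solve_side(fixed, C_rows, I_rows):
        out = np.empty((C_rows.shape[0], k))
        for r in range(C_rows.shape[0]):
            W = C_rows[r][:, None] * fixed             # rows of `fixed` scaled by confidence
            # zero-gradient condition: (F^T C F + lam I) x = F^T C I_row
            out[r] = np.linalg.solve(fixed.T @ W + reg, W.T @ I_rows[r])
        return out

    for _ in range(n_iters):
        P = solve_side(Q, C, I)        # update every user with items fixed
        Q = solve_side(P, C.T, I.T)    # update every item with users fixed
    return P, Q

# Toy interaction counts with two user/item blocks (illustrative only)
R = np.array([[3, 1, 0, 0],
              [2, 4, 0, 0],
              [0, 0, 5, 1],
              [0, 0, 2, 3]], dtype=float)
P, Q = wrmf_als(R)
scores = P @ Q.T
```

Production implementations avoid the dense per-row products by precomputing $Q^TQ$ and only correcting for the observed entries.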
One possible improvement of this cost function is that we may design more appropriate loss function other than the squared error function.
Inductive Matrix Completion (IMC) is an algorithm for recommender systems with side-information of users and items. The IMC formulation incorporates features associated with rows (users) and columns (items) in matrix completion, so that it enables predictions for users or items that were not seen during training, and for which only features are known but no dyadic information (such as ratings or linkages).
IMC assumes that the associations matrix is generated by applying feature vectors associated with its rows as well as columns to a low-rank matrix.
The inputs
$$\min_{W, H} \sum_{(i,j)\in \Omega}\ell(P_{ij}, x_i^T W H^T y_j) + \frac{\lambda}{2}(\| W \|_F^2+\| H \|_F^2)$$
The loss function
- Inductive Matrix Completion for Recommender Systems with Side-Information
- Inductive Matrix Completion for Predicting Gene-Disease Associations
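The IMC score $x_i^T W H^T y_j$ only needs feature vectors, which is what makes predictions for unseen users or items possible. The sketch below assumes a squared loss for $\ell$ and random toy features; the function names are hypothetical.

```python
import numpy as np

def imc_scores(X, Y, W, H):
    """Scores x_i^T W H^T y_j for all rows of X (user features) and Y (item features)."""
    return X @ W @ H.T @ Y.T

def imc_objective(P, mask, X, Y, W, H, lam):
    """Squared-loss IMC objective on the observed entries (squared loss is an assumption)."""
    residual = np.where(mask, P - imc_scores(X, Y, W, H), 0.0)
    return np.sum(residual ** 2) + 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))        # 4 users with 3 features each
Y = rng.normal(size=(5, 2))        # 5 items with 2 features each
W = rng.normal(size=(3, 2))        # maps user features to a rank-2 latent space
H = rng.normal(size=(2, 2))        # maps item features to the same latent space
P = imc_scores(X, Y, W, H)         # noiseless associations generated by the model
mask = rng.random(P.shape) < 0.7   # observed entries

x_new = rng.normal(size=(1, 3))    # a user never seen during training
new_user_scores = imc_scores(x_new, Y, W, H)
```

Because the parameters act on features rather than on user or item indices, `new_user_scores` is available without retraining.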
There are 2 common techniques in recommender systems:
- The goal of matrix factorization techniques in RS is to determine a low-rank approximation of the user-item rating matrix by decomposing it into a product of (user and item) matrices of lower dimensionality (latent factors).
- The idea of ensemble methods is to combine multiple alternative machine learning models to obtain more accurate predictions.
There are 2 disadvantages of Matrix Completion:
- $Postdiction \neq prediction$
- Need initial post data
- Predict poorly on a random set of items the user has not rated.
- Repeated recommendation of purchased items
- The evaluation method of Netflix Prize is misleading. RMSE(regression) vs Rank-based measures(sorting)
- Quality factors beyond accuracy
- Introduce why we use the quality factors:
- Novelty, diversity and unexpectedness(How to recommend new things to users exactly)
- Depend on context and different problems
- Interact with users: conversational recommender systems
- Example of context and interaction:To Be Continued: Helping you find shows to continue watching on Netflix(search the “context”)
- Manipulation resistance
- Recommendation is optimal to sellers not users - transparency and explanation strategy (nearly a moral problem).
From Algorithms to Systems
Beyond the computer science perspective.
Putting the user back in the loop.
Toward a more comprehensive characterization of the recommendation task.
Collaborative filtering has become a key tool in recommender systems. The Netflix competition was instrumental in this context to further development of scalable tools. At its heart lies the minimization of the Root Mean Squares Error (RMSE) which helps to decide upon the quality of a recommender system. Moreover, minimizing the RMSE comes with desirable guarantees of statistical consistency. In this talk I make the case that RMSE minimization is a poor choice for a number of reasons: firstly, review scores are anything but Gaussian distributed, often exhibiting asymmetry and bimodality in their scores. Secondly, in a retrieval setting accuracy matters primarily for the top rated items. Finally, such ratings are highly context dependent and should only be considered in interaction with a user. I will show how this can be accomplished easily by relatively minor changes to existing systems.
- https://www.researchgate.net/project/Proactive-Recommendation-Delivery
- Beyond Matrix Completion of the traditional Recommender System
- Recommender systems: beyond matrix completion
- Notes of "Recommender Systems - Beyond Matrix Completion"
- Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions
The matrix completion methods used in recommender systems, such as regularized SVD, are linear combinations of some features, and they only take user-user interactions and item-item similarities into account.
Factorization Machines (FM) are inspired by previous factorization models.
They represent each feature as an embedding vector and model the second-order feature interactions:
$$
\begin{aligned}
\hat{y}(x) &= w_0 + \sum_{i=1}^{n} w_i x_i+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left<v_i, v_j\right> x_i x_j \\
&= \underbrace{w_0 + \left<w, x\right>}_{\text{First-order: Linear Regression}} + \underbrace{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left<v_i, v_j\right> x_i x_j}_{\text{Second-order: pair-wise interactions between features}}
\end{aligned}
$$
where the model parameters that have to be estimated are $$ w_0 \in \mathbb{R}, w\in\mathbb{R}^n, V\in\mathbb{R}^{n\times k}. $$
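The pairwise term looks quadratic in $n$, but it can be evaluated in $O(kn)$ through the identity $\sum_{i<j}\left<v_i, v_j\right>x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_i v_{i,f}x_i\right)^2 - \sum_i v_{i,f}^2 x_i^2\right]$. The sketch below compares the naive and the fast evaluation on random parameters; the function names and the data are illustrative assumptions.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Direct O(k n^2) evaluation of the FM model equation."""
    n = len(x)
    pairwise = sum((V[i] @ V[j]) * x[i] * x[j]
                   for i in range(n - 1) for j in range(i + 1, n))
    return w0 + w @ x + pairwise

def fm_predict_fast(x, w0, w, V):
    """O(k n) evaluation using the sum-of-squares identity."""
    s = V.T @ x                         # per-factor sum_i v_{i,f} x_i
    s_sq = (V ** 2).T @ (x ** 2)        # per-factor sum_i v_{i,f}^2 x_i^2
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s_sq)

rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)
w0, w, V = 0.5, rng.normal(size=n), rng.normal(size=(n, k))
naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
```

For sparse inputs only the nonzero $x_i$ contribute, so the cost drops further to the number of active features times $k$.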
The linear regression term is the first-order part; the pairwise interactions between features form the second-order part.
However, why do we call it a factorization machine? Where is the factorization?
If $[W]_{ij}=w_{ij}= \left<v_i, v_j\right>$,
In order to reduce the computation complexity, the second order part
The next step is to find the optimal parameters of the model using the numerical optimization methods.
Optimality of model parameters is usually defined with a loss function
Factorization machines are a generic framework which allows to mimic many factorization models simply by feature engineering. In this way, they combine the high predictive accuracy of factorization models with the flexibility of feature engineering. Unfortunately, factorization machines involve a non-convex optimization problem and are thus subject to bad local minima. In this paper, we propose a convex formulation of factorization machines based on the nuclear norm. Our formulation imposes fewer restrictions on the learned model and is thus more general than the original formulation. To solve the corresponding optimization problem, we present an efficient globally-convergent two-block coordinate descent algorithm. Empirically, we demonstrate that our approach achieves comparable or better predictive accuracy than the original factorization machines on 4 recommendation tasks and scales to datasets with 10 million samples.
And the objective function to optimize is the regularized empirical loss function or structured empirical loss function:
$$\sum_{i}\ell(\hat{y}(x_i), y_i)+\frac{\alpha}{2}\|w\|_2^2+\beta\|Z\|_{\ast}$$
Deep learning is powerful in processing visual and text information, so it helps to find the interests of users; examples include Deep Interest Network, xDeepFM and more.
Deep learning models for recommender systems can be traced back to the restricted Boltzmann machine. Deep learning models are powerful information extractors, and deep learning is now popular in recommender system libraries such as Spotlight.
What role does deep learning play in recommender systems? On the one hand, deep learning helps to match users and items based on the history of their interactions, as in deep matching and deep collaborative learning.
Mathematically, it is a function that evaluates how likely the user would be to interact with the items in some context:
- https://www.kdd.org/kdd2016/papers/files/adf0975-shanA.pdf
This model leverages the flexibility and non-linearity of neural networks to replace dot products of matrix factorization, aiming at enhancing the model expressiveness. Specifically, this model is structured with two subnetworks, generalized matrix factorization (GMF) and MLP, and models the interactions from two pathways instead of simple inner products. The outputs of these two networks are concatenated for the final prediction score calculation. Unlike the rating prediction task in AutoRec, this model generates a ranked recommendation list for each user based on the implicit feedback. We will use the personalized ranking loss introduced in the last section to train this model.
- Neural Collaborative Filtering
- https://d2l.ai/chapter_recommender-systems/neumf.html
- https://github.com/hexiangnan/neural_collaborative_filtering
- Neural Collaborative Filtering vs. Matrix Factorization Revisited
- Outer Product-based Neural Collaborative Filtering
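The two-pathway structure described above can be sketched as a forward pass in NumPy. This is only an untrained illustration: the weights are random, the MLP has a single hidden layer, and the parameter names (`P_gmf`, `W1`, `h`, ...) are hypothetical; training with a ranking loss is omitted.

```python
import numpy as np

def init_neumf(n_users, n_items, k=4, hidden=8, seed=0):
    """Random parameters for a tiny NeuMF-style model (separate GMF and MLP embeddings)."""
    rng = np.random.default_rng(seed)
    return {
        "P_gmf": 0.1 * rng.normal(size=(n_users, k)),
        "Q_gmf": 0.1 * rng.normal(size=(n_items, k)),
        "P_mlp": 0.1 * rng.normal(size=(n_users, k)),
        "Q_mlp": 0.1 * rng.normal(size=(n_items, k)),
        "W1": 0.1 * rng.normal(size=(2 * k, hidden)),
        "b1": np.zeros(hidden),
        "h": 0.1 * rng.normal(size=k + hidden),
    }

def neumf_scores(params, user):
    """Interaction scores of one user against every item."""
    n_items = params["Q_gmf"].shape[0]
    gmf = params["P_gmf"][user] * params["Q_gmf"]                  # element-wise product pathway
    mlp_in = np.hstack([np.tile(params["P_mlp"][user], (n_items, 1)), params["Q_mlp"]])
    mlp = np.maximum(mlp_in @ params["W1"] + params["b1"], 0.0)    # one ReLU hidden layer
    fused = np.hstack([gmf, mlp])                                  # concatenate both pathways
    return 1.0 / (1.0 + np.exp(-(fused @ params["h"])))            # sigmoid output

params = init_neumf(n_users=5, n_items=7)
scores = neumf_scores(params, user=2)
ranking = np.argsort(-scores)          # items ordered from highest to lowest score
```

Keeping separate embeddings for the GMF and MLP pathways lets each pathway learn its own representation before the outputs are fused.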
Given part of the ratings in
Stacked denoising autoencoders (SDAE) are feedforward neural networks for learning representations (encodings) of the input data by learning to predict the clean input itself in the output.
Using the Bayesian SDAE as a component, the generative process of CDL is defined as follows:

- For each layer ${l}$ of the SDAE network,
    - For each column ${n}$ of the weight matrix $W_l$, draw $$W_{l;\ast n} \sim \mathcal{N}(0,\lambda_w^{-1} I_{K_l}).$$
    - Draw the bias vector $$b_l \sim \mathcal{N}(0,\lambda_w^{-1} I_{K_l}).$$
    - For each row ${j}$ of $X_l$, draw $$X_{l;j\ast}\sim \mathcal{N}(\sigma(X_{l-1;j\ast}W_l + b_l), \lambda_s^{-1} I_{K_l}).$$
- For each item ${j}$,
    - Draw a clean input $$X_{c;j\ast}\sim \mathcal{N}(X_{L, j\ast}, \lambda_n^{-1} I_{K_l}).$$
    - Draw a latent item offset vector $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_{K_l})$ and then set the latent item vector to be: $$v_j=\epsilon_j+X^T_{\frac{L}{2}, j\ast}.$$
- Draw a latent user vector for each user ${i}$: $$u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_{K_l}).$$
- Draw a rating $R_{ij}$ for each user-item pair $(i, j)$: $$R_{ij}\sim \mathcal{N}(u_i^T v_j, C_{ij}^{-1}).$$
Here $\lambda_w, \lambda_s, \lambda_n, \lambda_u$and
And the joint log-likelihood of these parameters is
$$
\begin{aligned}
L = & -\frac{\lambda_u}{2}\sum_{i} \|u_i\|_2^2-\frac{\lambda_w}{2}\sum_{l} \left[\|W_l\|_F^2+\|b_l\|_2^2\right] \\
& -\frac{\lambda_v}{2}\sum_{j} \|v_j - X^T_{\frac{L}{2},j\ast}\|_2^2-\frac{\lambda_n}{2}\sum_{j} \|X_{c;j\ast}-X_{L;j\ast}\|_2^2 \\
& -\frac{\lambda_s}{2}\sum_{l}\sum_{j} \|\sigma(X_{l-1;j\ast}W_l + b_l)-X_{l;j\ast}\|_2^2 -\sum_{ij} \frac{C_{ij}}{2}(R_{ij}-u_i^Tv_j)^2
\end{aligned}
$$
It is not easy to prove that it converges.
The output of this model is
$$P(Y=1|x) = \sigma(W_{wide}^T[x,\phi(x)] + W_{deep}^T \alpha^{(l_f)}+b)$$
where the wide part deals with the categorical features such as user demographics and the deep part deals with continuous features.
$$\hat{y} = w_0 + \left<w, x\right> + f(x)$$
where the first and second terms are the linear regression part, similar to that of FM, which models the global bias of the data and the weights of the features. The third term $f(x)$ is a multi-layered feedforward neural network.
The Bi-Interaction layer, including Bi-Interaction pooling, is an innovation in artificial neural networks.
- Deep Matrix Factorization Models for Recommender Systems
- Deep Matrix Factorization for Recommender Systems with Missing Data not at Random
It is essential for the recommender system to find the item which matches the users' demand. Its difference from web search is that a recommender system provides item information even if the users' demands or general interests are not provided. It sounds like a modern crystal ball that reads your mind.
In A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems the authors propose to extract rich features from user’s browsing and search histories to model user’s interests. The underlying assumption is that, users’ historical online activities reflect a lot about user’s background and preference, and therefore provide a precise insight of what items and topics users might be interested in.
Its training data set and the test data are $\{(\mathrm{X}_i, y_i, r_i)\mid i =1, 2, \cdots, n\}$ and $(\mathrm{X}_{n+1}, y_{n+1})$, respectively.
Matching Model is trained using the training data set: a class of `matching functions’
The data is assumed to be generated according to the distributions
In fact, the inputs x and y can be instances (IDs), feature vectors, and structured objects, and thus the task can be carried out at instance level, feature level, and structure level.
| Framework of Matching |
|---|
| Output: MLP |
| Aggregation: Pooling, Concatenation |
| Interaction: Matrix, Tensor |
| Representation: MLP, CNN, LSTM |
| Input: ID Vectors |
Sometimes, the matching model and the ranking model are combined and trained together with a pairwise loss. Deep matching models take the ID vectors and features together as the input to a deep neural network to train the matching scores; examples include Deep Matrix Factorization, AutoRec, Collaborative Denoising Auto-Encoder, Deep User and Image Feature, Attentive Collaborative Filtering, and Collaborative Knowledge Base Embedding.
semantic-based matching models
Many well-established recommender systems are based on representation learning in Euclidean space.
In these models, matching functions such as the Euclidean distance or inner product are typically used for computing similarity scores between user and item embeddings.
Hyperbolic Recommender Systems investigate the notion of learning user and item representations in hyperbolic space.
Given a user
Hyperbolic Bayesian Personalized Ranking (HyperBPR) leverages BPR pairwise learning to minimize the pairwise ranking loss between the positive and negative items.
Given a user
The parameters of our model are learned by using RSGD
Generally, feature interactions matter in recommender system.
- https://www.cse.msu.edu/~zhaoxi35/
- https://sites.google.com/view/kdd20-marketplace-autorecsys/
Recommendation can be cast as a regression or classification task, so ensemble methods apply to it: BellKor's Pragmatic Chaos used a blended solution to win the prize.
In fact, its essence is bagging or blending, a parallel ensemble strategy that avoids over-fitting by reducing the variance.
In this section the focus is boosting, which reduces the error and boosts the performance of a weak learner.
There are two common methods to construct a stronger learner from a weaker learner: (1) reweight the samples and learn from the errors (AdaBoost); (2) train another learner to approximate the error (gradient boosting).
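The second strategy can be illustrated with squared-error gradient boosting, where each round fits a weak learner to the current residuals. The sketch below uses one-dimensional decision stumps and a synthetic target; the function names, learning rate, and data are assumptions chosen for brevity.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold regressor for the residual under squared error."""
    best = None
    for t in np.unique(x)[:-1]:                    # exclude the max so both sides are non-empty
        left = x <= t
        lv, rv = residual[left].mean(), residual[~left].mean()
        err = np.sum((residual - np.where(left, lv, rv)) ** 2)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Each round fits a stump to the residual (the negative gradient of squared loss)."""
    base = y.mean()
    stumps = []
    pred = np.full_like(y, base, dtype=float)
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        stumps.append(stump)
        pred = pred + lr * stump(x)
    return lambda z: base + lr * sum(s(z) for s in stumps)

x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)                          # synthetic target
model = gradient_boost(x, y)
mse_boost = float(np.mean((y - model(x)) ** 2))
mse_mean = float(np.mean((y - y.mean()) ** 2))     # error of the constant baseline
```

The small learning rate shrinks each stump's contribution, which trades more rounds for a smoother fit.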
Until now, we consider the recommendation task as a regression prediction process, which is really common in machine learning. The boosting or stacking methods may help us to enhance these methods.
A key to achieving highly competitive results on the Netflix data is usage of sophisticated blending schemes, which combine the multiple individual predictors into a single final solution. This significant component was managed by our colleagues at the Big Chaos team. Still, we were producing a few blended solutions, which were later incorporated as individual predictors in the final blend. Our blending techniques were applied to three distinct sets of predictors. First is a set of 454 predictors, which represent all predictors of the BellKor’s Pragmatic Chaos team for which we have matching Probe and Qualifying results. Second, is a set of 75 predictors, which the BigChaos team picked out of the 454 predictors by forward selection. Finally, a set of 24 BellKor predictors for which we had matching Probe and Qualifying results. from Netflix Prize.
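A very simple form of blending is a linear combination of the individual predictors whose weights are fitted on a held-out probe set. The sketch below uses ridge-regularized least squares on synthetic predictors; it is not the blending scheme used by the winning team, and all names and noise levels are assumptions.

```python
import numpy as np

def fit_blend(probe_preds, probe_targets, ridge=1e-3):
    """Least-squares blending weights (plus intercept) fitted on a probe set."""
    A = np.column_stack([probe_preds, np.ones(len(probe_targets))])
    reg = ridge * np.eye(A.shape[1])
    return np.linalg.solve(A.T @ A + reg, A.T @ probe_targets)

def blend(preds, weights):
    """Apply the blending weights to a matrix of predictor outputs."""
    return np.column_stack([preds, np.ones(preds.shape[0])]) @ weights

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rng = np.random.default_rng(0)
truth = rng.uniform(1, 5, size=500)                # synthetic "true" ratings
preds = np.column_stack([truth + rng.normal(scale=s, size=500) for s in (0.9, 1.0, 1.1)])
weights = fit_blend(preds, truth)
blend_rmse = rmse(blend(preds, weights), truth)
best_single_rmse = min(rmse(preds[:, j], truth) for j in range(preds.shape[1]))
```

Here the blend is evaluated on the same data it was fitted on; in practice the weights are fitted on the probe set and judged on a separate qualifying set.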
By indexing items in a tree hierarchy and training a user-node preference prediction model satisfying a max-heap like property in the tree, TDM provides logarithmic computational complexity w.r.t. the corpus size, enabling the use of arbitrary advanced models in candidate retrieval and recommendation.
Our purpose, in this paper, is to develop a method to jointly learn the index structure and user preference prediction model
Recommendation problem is basically to retrieve a set of most relevant or preferred items for each user request from the entire corpus
In the practice of large-scale recommendation, the algorithm design should strike a balance between accuracy and efficiency.
The above methods include 2 stages/models: (1) find the preference of the users based on history or other information; (2) retrieve some items according to the predicted preferences.
TDM uses a tree hierarchy to organize items, and each leaf node in the tree corresponds to an item. Like a max-heap, TDM assumes that each user-node preference is the largest one among the preferences of all of the node's children.
The main idea is to predict user interests from coarse to fine by traversing tree nodes in a top-down fashion and making decisions for each user-node pair.
Each item in the corpus is first assigned to a leaf node of a tree hierarchy; a top-down beam search strategy is then carried out level by level.
TDM uses a tree as index and creatively proposes a max-heap like probability formulation on the tree, where the user preference for each non-leaf node
where TDM turns the recommendation task into a hierarchical retrieval problem
By a top-down retrieval process, the candidate items are selected gradually from coarse to detailed.
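The coarse-to-fine retrieval can be sketched as a level-by-level beam search. In the sketch below the tree is a plain dictionary and `score` is a hypothetical lookup table standing in for the learned user-node preference model; the node names and preference values are made up and chosen to satisfy the max-heap like property.

```python
def beam_search(children, root, score, beam_size, top_k):
    """Top-down beam search; `children` maps a node to its child nodes (leaves have none)."""
    frontier = [root]
    leaves = []
    while frontier:
        candidates = []
        for node in frontier:
            kids = children.get(node, [])
            if not kids:
                leaves.append(node)            # reached an item
            else:
                candidates.extend(kids)
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_size]      # keep only the best nodes of this level
    leaves.sort(key=score, reverse=True)
    return leaves[:top_k]

# Toy tree: root -> {A, B}, A -> {item1, item2}, B -> {item3, item4}
children = {"root": ["A", "B"], "A": ["item1", "item2"], "B": ["item3", "item4"]}
# Hypothetical user-node preferences; each parent equals the max over its children
preference = {"root": 0.9, "A": 0.9, "B": 0.6,
              "item1": 0.9, "item2": 0.2, "item3": 0.6, "item4": 0.1}
top2 = beam_search(children, "root", preference.get, beam_size=2, top_k=2)
top1 = beam_search(children, "root", preference.get, beam_size=1, top_k=1)
```

Only `beam_size` nodes are scored per level, so the number of model evaluations grows with the tree depth rather than with the corpus size.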
According to the retrieval process, the recommendation accuracy of TDM is determined by the quality of the user preference model
Denote max-heap property
the user preference probability of all
where we sum up the negative logarithm of predicted user-node preference probability on all the positive training samples and their ancestor user-node pairs as the global empirical loss.
- https://github.com/DeepGraphLearning/RecommenderSystems
- https://github.com/DeepGraphLearning
- https://jian-tang.com/
- Learning Tree-based Deep Model for Recommender Systems
- Joint Optimization of Tree-based Index and Deep Model for Recommender Systems
- Improving Top-N Recommendation with Heterogeneous Loss
- https://blog.csdn.net/lthirdonel/article/details/80021282
- Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data
Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also intuitive explanations of the results for users or system designers, which can help improve the system transparency, persuasiveness, trustworthiness, and effectiveness, etc.
Providing personalized explanations for recommendations can help users to understand the underlying insight of the recommendation results, which is helpful to the effectiveness, transparency, persuasiveness and trustworthiness of recommender systems. Current explainable recommendation models mostly generate textual explanations based on pre-defined sentence templates. However, the expressiveness power of template-based explanation sentences are limited to the pre-defined expressions, and manually defining the expressions require significant human efforts
Social Recommender Systems (SRSs) aim to alleviate information overload over social media users by presenting the most attractive and relevant content, often using personalization techniques adapted for the specific user. SRSs also aim at increasing adoption, engagement, and participation of new and existing users of social media sites. In addition to recommending content to consume, new types of recommendations emerge within social media, such as of people and communities to connect to, to follow, or to join.
User-item/user-user interactions are usually in the form of graph/network structure. What is more, the graph is dynamic, and we need to apply to new nodes without model retraining.
Trust-aware recommenders are based on the following assumptions:
- users have similar tastes to other users they trust;
- trust is transitive and propagates to indirect neighbors in the social network.
| Traditional Approaches | Beyond Traditional Methods |
|---|---|
| Collaborative Filtering | Tensor Factorization & Factorization Machines |
| Content-Based Recommendation | Social Recommendations |
| Item-based Recommendation | Learning to rank |
| Hybrid Approaches | MAB Explore/Exploit |
- https://www.cs.ubc.ca/~rng/
- http://homepages.inf.ed.ac.uk/ckiw/
- http://groups.csail.mit.edu/medg/people/psz/home/Pete_MEDG_site/Home.html
- MIT CSAIL Clinical Decision Making Group
- Which Doctor to Trust: A Recommender System for Identifying the Right Doctors
- Recommendation of Doctors and Medicines Using Review Mining
The recommender system is the core component of the social network named HealthNet (HN). The recommendation algorithm first computes similarities among patients, and then generates a ranked list of doctors and hospitals suitable for a given patient profile, by exploiting health data shared by the community. Accordingly, the HN user can find her most similar patients, look how they cured their diseases, and receive suggestions for solving her problem.
HN is implemented as a standard social network where users are patients.
The first interaction with the system is the registration step.
Then, the patient can enter personal health data: conditions, treatments (e.g., drugs, dosages, side effects, surgeries), health indicators (e.g., blood pressure, body weight, laboratory analysis, etc.), consulted doctors, hospitalizations.
In this way, HN centralizes individual health data and allows a simple and organized access to them.
The Recommender System is the core component of HN.
It exploits patient profiles for suggesting other similar patients, doctors, hospitals (the list of suggested patients, doctors and hospitals can be further filtered by position and disease).
The similarity between two patients relies on semantic matching; the matching between the conditions exploits the HN disease hierarchy.
More formally, the similarity score between two
patients is computed as follows:
$$s(p, p^{\prime}) =
\alpha\frac{\sum_{i=1}^{k}\sum_{j=1}^{n}s_c(p_{c_i}, p^{\prime}_{c_j})}{kn}
+ (1-\alpha)\frac{\sum_{i=1}^{z}\sum_{j=1}^{r}s_t(p_{t_i}, p^{\prime}_{t_j})}{zr}$$
where $k$ (respectively $n$) is the number of conditions $p$ (respectively $p^{\prime}$) is affected by,
$p_c$ is a condition of the patient $p$,
$z$ (respectively $r$) is the number of treatments for $p$ (respectively $p^{\prime}$),
$p_t$ is a treatment for the patient $p$.
They are computed as follows:
$$s_c(p_{c_i}, p^{\prime}_{c_j}) =
\begin{cases}
\log\frac{p_{c_i}}{p^{\prime}_{c_j}}, &\text{if $c_i=c_j$}\\
\frac{1}{sp(c_i, c_j)}, &\text{otherwise}
\end{cases},
\qquad
s_t(p_{t_i}, p^{\prime}_{t_j}) =
\begin{cases}
1, &\text{if $t_i=t_j$}\\
0, &\text{otherwise}
\end{cases}. $$
There are different perspectives of patient-doctor matchmaking system:
- From patients’ perspectives, such systems should provide recommendations and safeguard against poor recommendations in order to be trustworthy.
- From the perspective of healthcare professionals, these systems need to provide suitable recommendations based on their domain knowledge and experience.
- More generally, insurance companies and healthcare institutes are interested in improving recommendation rates through research and reaping the potential benefits of these recommendation systems.
The features include demographic data, behavioral data, ICD-9, interaction, the number of visits to the doctor.
A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care performs hybrid matrix factorization (MF) and recommends each patient a list of family doctors according to the level of information available about them.
We achieve this by learning latent representations for patients and doctors from their interactions and metadata.
Given the different level of information available to us about different patients, five use cases are proposed to make doctor recommendations in different scenarios.
The patient-doctor interaction matrix
MF learns $\mathbf{p}_i$ and $\mathbf{q}_j$, such that the predicted score for unobserved entries $\hat{y}_{ij}$ is given by the inner product of latent patient and doctor representations:
$$\hat{y}_{ij}=g(i,j\mid \mathbf{p}_i, \mathbf{q}_j)=g(\mathbf{p}_i\cdot \mathbf{q}_j)=\frac{1}{1+\exp(-\left<\mathbf{p}_i,\mathbf{q}_j\right>)}.$$
The task is then formulated as learning-to-rank using the Weighted Approximate-Rank Pairwise (WARP) loss.
For each observed interaction $\hat{y}_{ij}$, WARP samples a negative doctor $d$ and computes the difference between predicted $\hat{y}_{ij}$ and
We can model the trust
- https://www.aau.at/en/ainf/research-groups/infsys/team/dietmar-jannach/
- https://xamat.github.io/
- http://presnick.people.si.umich.edu/
- https://www.stern.nyu.edu/faculty/bio/alexander-tuzhilin
- http://people.stern.nyu.edu/atuzhili/
- https://www.researchgate.net/profile/Markus_Zanker
- https://cseweb.ucsd.edu/~jmcauley/datasets.html
- Deep Learning based Recommender System: A Survey and New Perspectives
Roughly speaking, preference learning is about methods for learning preference models from explicit or implicit preference information, typically used for predicting the preferences of an individual or a group of individuals. Approaches relevant to this area range from learning special types of preference models, such as lexicographic orders, over "learning to rank" for information retrieval to collaborative filtering techniques for recommender systems.
- Fast Active Exploration for Link-Based Preference Learning using Gaussian Processes
Preference learning has been studied for several decades and has drawn increasing attention in recent years due to its importance in web applications, such as ad serving, search, and electronic commerce. In all of these applications, we observe (often discrete) choices that reflect relative preferences among several items, e.g. products, songs, web pages or documents. Moreover, observations are in many cases censored. Hence, the goal is to reconstruct the overall model of preferences by, for example, learning a general ordering function based on the partially observed decisions. Choice models try to predict the specific choices individuals (or groups of individuals) make when offered a possibly very large number of alternatives. Traditionally, they are concerned with the decision process of individuals and have been studied independently in machine learning, data and web mining, econometrics, and psychology. However, these diverse communities have had few interactions in the past. One goal of this workshop is to foster interdisciplinary exchange, by encouraging abstraction of the underlying problem (and solution) characteristics.
Advertising, recommendation, and search are the 3 foundation stones of the e-economy.
Online advertising has grown over the past decade to over 26 billion dollars in recorded revenue in 2010. The revenues generated are based on different pricing models that can be fundamentally grouped into two types: cost per (thousand) impressions (CPM) and cost per action (CPA), where an action can be a click, signing up with the advertiser, a sale, or any other measurable outcome. A web publisher generating revenues by selling advertising space on its site can offer either a CPM or CPA contract. We analyze the conditions under which the two parties agree on each contract type, accounting for the relative risk experienced by each party.
The information technology industry relies heavily on online advertising; examples include Google, Facebook, and Alibaba. Advertising is nothing but information, and it is not usually accepted gladly. In fact, it is more difficult than recommendation because less is known about the context where the advertisement is placed.
Hongliang Jie shares 3 challenges of computational advertising in Etsy, which will be the titles of the following subsections.
When the feature vector
Practical Lessons from Predicting Clicks on Ads at Facebook or the blog
use the GBRT to select proper features and LR to map these features into the interval
- Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction
