Recommender Systems
Data Model Design
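1. Core entities
- User: a person who uses the recommender system
- Item: something that can be recommended, such as a movie, product, or article
- Rating: a user's score for an item (stored on the RATED relationship)
- Tag: a label attached to an item
- Category: the category an item belongs to
- Action: a user's interaction with an item, such as viewing, clicking, or purchasing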
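2. Relationship types
- RATED: a user rated an item
- PURCHASED: a user purchased an item
- VIEWED: a user viewed an item
- CLICKED: a user clicked an item
- SAVED: a user saved (favorited) an item
- BELONGS_TO: an item belongs to a category
- HAS_TAG: an item carries a tag
- SIMILAR_TO: two items are similar to each other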
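3. Property design
User node properties:
- id: unique identifier
- name: user name
- age: age
- gender: gender
- location: location
- registration_date: registration date
Item node properties:
- id: unique identifier
- name: item name
- description: description
- price: price
- release_date: release date
- average_rating: average rating
Rating (RATED) relationship properties:
- score: rating value
- timestamp: time of the rating
Action relationship properties (for example VIEWED, CLICKED, PURCHASED):
- timestamp: time of the action
- duration: how long the action lasted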
4. Data model example
cypher
// The patterns below illustrate the shape of the data; they are not standalone statements
// User node
(:User {id: 1, name: 'John', age: 30, gender: 'Male', location: 'New York'})
// Item node
(:Item {id: 1, name: 'The Matrix', description: 'A sci-fi action film', price: 19.99, release_date: '1999-03-31', average_rating: 4.5})
// Category node
(:Category {id: 1, name: 'Sci-Fi'})
// Tag node
(:Tag {id: 1, name: 'Action'})
// Relationships
(:User {id: 1})-[:RATED {score: 5, timestamp: '2023-01-01'}]->(:Item {id: 1})
(:User {id: 1})-[:PURCHASED {timestamp: '2023-01-01'}]->(:Item {id: 1})
(:Item {id: 1})-[:BELONGS_TO]->(:Category {id: 1})
(:Item {id: 1})-[:HAS_TAG]->(:Tag {id: 1})
(:Item {id: 1})-[:SIMILAR_TO {score: 0.9}]->(:Item {id: 2})
Collaborative Filtering
1. User-based collaborative filtering
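As a plain-Python illustration of the similarity measure that the query in this subsection relies on, 1 / (1 + the sum of absolute rating differences over co-rated items), the sketch below works on in-memory dictionaries instead of a graph. The function name `user_similarity`, the `min_common` parameter, and the sample ratings are assumptions made for this example and are not part of Neo4j.

```python
def user_similarity(ratings_a, ratings_b, min_common=4):
    """Return 1 / (1 + sum of absolute rating differences) over co-rated items.

    ratings_a and ratings_b map item id -> score. Returns None when the two
    users share fewer than `min_common` items (the query requires more than 3).
    """
    common = ratings_a.keys() & ratings_b.keys()
    if len(common) < min_common:
        return None
    difference = sum(abs(ratings_a[item] - ratings_b[item]) for item in common)
    return 1.0 / (1.0 + difference)


# Hypothetical sample data: item id -> score.
alice = {"m1": 5, "m2": 4, "m3": 3, "m4": 5}
bob = {"m1": 5, "m2": 4, "m3": 3, "m4": 5, "m5": 2}
carol = {"m1": 1, "m2": 2, "m3": 5, "m4": 3}

print(user_similarity(alice, bob))    # identical scores on the shared items
print(user_similarity(alice, carol))  # diverging tastes give a lower score
```

A measure of this shape rewards agreement but also penalizes users simply for having many items in common, because the differences are summed rather than averaged; dividing by the number of shared items is one possible variation.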
cypher
// Find users whose scores on co-rated items are close to the target user's
MATCH (u:User {id: 1})-[r1:RATED]->(i:Item)<-[r2:RATED]-(other:User)
WHERE u <> other
WITH u, other, count(i) AS common_items, sum(abs(r1.score - r2.score)) AS rating_difference
WHERE common_items > 3
WITH u, other, 1.0 / (1.0 + rating_difference) AS similarity
ORDER BY similarity DESC
LIMIT 10
// Find items the similar users rated above 4 that the target user has not rated
MATCH (other)-[r:RATED]->(recommended:Item)
WHERE r.score > 4 AND NOT (u)-[:RATED]->(recommended)
RETURN recommended.name, count(*) AS recommendation_count
ORDER BY recommendation_count DESC
LIMIT 10
2. Item-based collaborative filtering
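The query in this subsection assumes that SIMILAR_TO relationships with a score already exist, as in the data model example; the text does not say how those scores are produced. One common choice is cosine similarity between the rating vectors of two items. The sketch below shows that calculation on in-memory data. The function name and the sample ratings are hypothetical, and cosine similarity is an assumption here, not necessarily the method the original system uses.

```python
from math import sqrt


def cosine_similarity(item_a, item_b):
    """Cosine similarity between two items' rating vectors.

    item_a and item_b map user id -> score; a user who did not rate an item
    counts as 0 for that item.
    """
    common = item_a.keys() & item_b.keys()
    if not common:
        return 0.0
    dot = sum(item_a[user] * item_b[user] for user in common)
    norm_a = sqrt(sum(score * score for score in item_a.values()))
    norm_b = sqrt(sum(score * score for score in item_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample data: item -> {user id: score}.
ratings_by_item = {
    "a": {"u1": 5, "u2": 4, "u3": 1},
    "b": {"u1": 5, "u2": 5},
    "c": {"u3": 5},
}

for left, right in [("a", "b"), ("a", "c"), ("b", "c")]:
    score = cosine_similarity(ratings_by_item[left], ratings_by_item[right])
    print(left, right, round(score, 3))
```

Scores computed this way could be written back to the graph as the `score` property of SIMILAR_TO relationships in a periodic batch job.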
cypher
// Find items similar to the ones the target user rated above 4
MATCH (u:User {id: 1})-[r:RATED]->(i:Item)-[s:SIMILAR_TO]->(similar:Item)
WHERE r.score > 4 AND NOT (u)-[:RATED]->(similar)
RETURN similar.name, avg(s.score) AS similarity_score
ORDER BY similarity_score DESC
LIMIT 10
3. Matrix factorization
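The query in this subsection scores user-item pairs with Adamic-Adar from the Graph Data Science library. For comparison, here is a minimal sketch of matrix factorization itself: each user and item gets a small vector of latent factors, and the vectors are adjusted so that their dot product approximates the known ratings. All names, hyperparameters, and sample ratings are assumptions for illustration, and ratings are visited in a fixed order (no shuffling) to keep the run reproducible.

```python
import random
from math import sqrt


def predict(user_factors, item_factors):
    """Predicted rating: dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(user_factors, item_factors))


def rmse(ratings, p, q):
    """Root mean squared error over (user, item, score) triples."""
    total = sum((score - predict(p[u], q[i])) ** 2 for u, i, score in ratings)
    return sqrt(total / len(ratings))


def factorize(ratings, n_factors=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn user factors p and item factors q with gradient steps on
    squared error plus L2 regularization."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(0.0, 0.5) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(0.0, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, score in ratings:
            err = score - predict(p[u], q[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


# Hypothetical sample ratings: (user, item, score).
sample_ratings = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 4), ("u2", "c", 1),
    ("u3", "b", 2), ("u3", "c", 5),
]

p, q = factorize(sample_ratings)
print("training RMSE:", rmse(sample_ratings, p, q))
print("predicted score for u1 on unrated item c:", predict(p["u1"], q["c"]))
```

In practice the known ratings would be exported from the RATED relationships, and the predicted scores for unrated pairs would drive the recommendation list.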
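cypher
// Uses the Neo4j Graph Data Science (GDS) library.
// Note: Adamic-Adar, used below, is a link-prediction heuristic rather than matrix factorization.
// Procedure names depend on the GDS version (gds.graph.create was renamed gds.graph.project in GDS 2.x),
// and Adamic-Adar is documented as the function gds.alpha.linkprediction.adamicAdar(node1, node2, config),
// so verify the streaming call below against your release before relying on it.
// Step 1: project users, items, and ratings into an in-memory graph (run as its own statement)
CALL gds.graph.create('ratingGraph', ['User', 'Item'], {
  RATED: {
    type: 'RATED',
    properties: 'score',
    orientation: 'UNDIRECTED'
  }
})
// Step 2: score candidate user-item pairs and keep the ones the user has not rated
CALL gds.alpha.linkprediction.adamicAdar.stream('ratingGraph', {
  sourceNodeFilter: 'User',
  targetNodeFilter: 'Item',
  relationshipTypes: ['RATED']
})
YIELD sourceNodeId, targetNodeId, score
MATCH (u:User) WHERE id(u) = sourceNodeId
MATCH (i:Item) WHERE id(i) = targetNodeId
  AND NOT (u)-[:RATED]->(i)
RETURN u.name, i.name, score
ORDER BY score DESC
LIMIT 10
// Step 3: release the in-memory graph (run as its own statement)
CALL gds.graph.drop('ratingGraph')
Path-Based Recommendation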
1. Path finding
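To make the idea concrete before the query, the sketch below counts User -> Item <- User -> Item paths on a small in-memory structure: the more distinct paths lead from the target user to an unrated item, the higher that item ranks. The function name and the sample data are hypothetical.

```python
from collections import Counter


def path_based_recommendations(rated, target, limit=10):
    """Count User -> Item <- User -> Item paths from `target` to items it
    has not rated. `rated` maps user id -> set of rated item ids."""
    seen = rated[target]
    counts = Counter()
    for shared in seen:
        for other, items in rated.items():
            if other == target or shared not in items:
                continue
            # Each unrated item of a co-rating user closes one path.
            for candidate in items - seen:
                counts[candidate] += 1
    return counts.most_common(limit)


# Hypothetical sample data.
rated = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
}

print(path_based_recommendations(rated, "u1"))
```

Here item `c` is reached through two shared items (`a` and `b`, both via `u2`) while `d` is reached through one, so `c` ranks first.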
cypher
// Follow User -> Item <- User -> Item paths to items the target user has not rated
MATCH path = (u:User {id: 1})-[:RATED]->(i:Item)<-[:RATED]-(other:User)-[:RATED]->(recommended:Item)
WHERE NOT (u)-[:RATED]->(recommended)
RETURN recommended.name, count(path) AS path_count
ORDER BY path_count DESC
LIMIT 10
2. Weighted paths
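cypher
// Weighted-path recommendation: only follow ratings above 3
MATCH path = (u:User {id: 1})-[r1:RATED]->(i:Item)<-[r2:RATED]-(other:User)-[r3:RATED]->(recommended:Item)
WHERE r1.score > 3 AND r2.score > 3 AND r3.score > 3
  AND NOT (u)-[:RATED]->(recommended)
// length(path) is always 3 for this fixed pattern, so every path contributes 1/3
WITH recommended, sum(1.0 / length(path)) AS score
RETURN recommended.name, score
ORDER BY score DESC
LIMIT 10
3. Multi-step paths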
cypher
// Multi-step path recommendation
// RATED, PURCHASED, and VIEWED all point from User to Item, so the path is
// traversed in either direction; otherwise no path could continue past the first item
MATCH path = (u:User {id: 1})-[:RATED|PURCHASED|VIEWED*2..3]-(recommended:Item)
WHERE NOT (u)-[:RATED]->(recommended)
RETURN recommended.name, count(path) AS path_count
ORDER BY path_count DESC
LIMIT 10
Real-Time Recommendation
1. Based on recent behavior
cypher
// Recommend from the user's five most recent views
MATCH (u:User {id: 1})-[v:VIEWED]->(i:Item)
WITH u, i, v
ORDER BY v.timestamp DESC
LIMIT 5
MATCH (i)-[s:SIMILAR_TO]->(recommended:Item)
WHERE NOT (u)-[:VIEWED]->(recommended)
  AND NOT (u)-[:PURCHASED]->(recommended)
RETURN recommended.name, avg(s.score) AS similarity_score
ORDER BY similarity_score DESC
LIMIT 10
2. Context-based
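The time-slot rule that the query in this subsection encodes in its WHERE clause can be stated as a small function, which is convenient for unit-testing the slot boundaries outside the database. The function name is hypothetical; the category names and hour ranges are taken from the query.

```python
from datetime import datetime


def category_for_hour(hour):
    """Map an hour of the day (0-23) to the category name used by the query."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if 6 <= hour < 12:
        return "Breakfast"
    if 12 <= hour < 18:
        return "Lunch"
    if hour >= 18:
        return "Dinner"
    return "Snack"  # 0 <= hour < 6


# Category for the current local time.
print(category_for_hour(datetime.now().hour))
```

The result could be passed to the query as a parameter, which keeps the time-slot logic in one place instead of repeating it in every query.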
cypher
// Recommend by time-of-day context
MATCH (u:User {id: 1})
WITH u, datetime().hour AS current_hour
// Pick the category that matches the current time slot
MATCH (c:Category)
WHERE (current_hour >= 6 AND current_hour < 12 AND c.name = 'Breakfast') OR
  (current_hour >= 12 AND current_hour < 18 AND c.name = 'Lunch') OR
  (current_hour >= 18 AND current_hour < 24 AND c.name = 'Dinner') OR
  (current_hour >= 0 AND current_hour < 6 AND c.name = 'Snack')
MATCH (c)<-[:BELONGS_TO]-(recommended:Item)
WHERE NOT (u)-[:PURCHASED]->(recommended)
RETURN recommended.name, c.name AS category
ORDER BY recommended.average_rating DESC
LIMIT 10
3. Incremental updates
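cypher
// Incremental update: refresh recommendations when the user produces new behavior
MATCH (u:User {id: 1})-[:RATED {score: 5}]->(i:Item)
// Boost the similarity score on the relationships leaving each top-rated item
MATCH (i)-[s:SIMILAR_TO]->(other:Item)
SET s.score = s.score + 0.1
// A WITH is required between the update and the next MATCH
WITH DISTINCT u
// Rebuild the user's recommendation list
MATCH (u)-[:RATED]->(:Item)-[:SIMILAR_TO]->(recommended:Item)
WHERE NOT (u)-[:RATED]->(recommended)
WITH recommended, count(*) AS score
ORDER BY score DESC
LIMIT 10
RETURN recommended.name, score
Hybrid Recommendation Strategies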
1. Weighted hybrid
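The merging step of a weighted hybrid is easy to see in isolation: each recommender contributes a fixed weight to every item it proposes, and an item's final score is the sum of the weights from the recommenders that proposed it. The sketch below uses the 0.4 / 0.4 / 0.2 weights from the query in this subsection; the function name and the candidate lists are hypothetical.

```python
from collections import defaultdict


def weighted_hybrid(candidate_lists, limit=10):
    """Combine (weight, items) pairs into one ranked list of (item, score).

    An item proposed by several recommenders accumulates their weights.
    Ties are broken alphabetically so the output is deterministic.
    """
    totals = defaultdict(float)
    for weight, items in candidate_lists:
        for item in set(items):
            totals[item] += weight
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Hypothetical candidate lists from three recommenders.
user_based = ["A", "B"]
item_based = ["B", "C"]
content_based = ["C", "D"]

result = weighted_hybrid([(0.4, user_based), (0.4, item_based), (0.2, content_based)])
for item, score in result:
    print(item, round(score, 2))
```

Item `B` ranks first because both collaborative-filtering recommenders proposed it, which is the behavior the summed weights are meant to produce.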
cypher
// User-based collaborative filtering
MATCH (u:User {id: 1})-[:RATED]->(i:Item)<-[:RATED]-(other:User)
WHERE u <> other
WITH u, other, count(i) AS common_items
WHERE common_items > 3
MATCH (other)-[r1:RATED]->(rec1:Item)
WHERE r1.score > 4 AND NOT (u)-[:RATED]->(rec1)
WITH u, collect(DISTINCT rec1) AS user_based_recs
// Item-based collaborative filtering
MATCH (u)-[r2:RATED]->(:Item)-[:SIMILAR_TO]->(rec2:Item)
WHERE r2.score > 4 AND NOT (u)-[:RATED]->(rec2)
WITH u, user_based_recs, collect(DISTINCT rec2) AS item_based_recs
// Content-based recommendation via shared tags
MATCH (u)-[r3:RATED]->(:Item)-[:HAS_TAG]->(:Tag)<-[:HAS_TAG]-(rec3:Item)
WHERE r3.score > 4 AND NOT (u)-[:RATED]->(rec3)
WITH u, user_based_recs, item_based_recs, collect(DISTINCT rec3) AS content_based_recs
// Merge the three lists, weighting each source
WITH u,
  [rec IN user_based_recs | {item: rec, score: 0.4}] +
  [rec IN item_based_recs | {item: rec, score: 0.4}] +
  [rec IN content_based_recs | {item: rec, score: 0.2}] AS all_recs
// Sum the weights per item (which also deduplicates), then rank
UNWIND all_recs AS rec
WITH rec.item AS item, sum(rec.score) AS total_score
ORDER BY total_score DESC
LIMIT 10
RETURN item.name, total_score
2. Feature fusion
cypher
// Extract user features (not used by the simple overlap score below)
MATCH (u:User {id: 1})
WITH u, u.age AS age, u.gender AS gender, u.location AS location
// Extract the categories and tags of each item the user has not rated
MATCH (i:Item)
WHERE NOT (u)-[:RATED]->(i)
WITH u, i, age, gender, location,
  [(i)-[:BELONGS_TO]->(c:Category) | c.name] AS categories,
  [(i)-[:HAS_TAG]->(t:Tag) | t.name] AS tags
// Compare each candidate with every item the user rated above 4
MATCH (u)-[r:RATED]->(liked:Item)
WHERE r.score > 4
WITH u, i, categories, tags,
  [(liked)-[:BELONGS_TO]->(c:Category) | c.name] AS liked_categories,
  [(liked)-[:HAS_TAG]->(t:Tag) | t.name] AS liked_tags
// Count shared categories and shared tags
WITH u, i,
  size([c IN categories WHERE c IN liked_categories]) AS category_similarity,
  size([t IN tags WHERE t IN liked_tags]) AS tag_similarity
// Sum the overlap across all liked items so each candidate appears once
WITH i, sum(category_similarity + tag_similarity) AS total_similarity
ORDER BY total_similarity DESC
LIMIT 10
RETURN i.name, total_similarity
3. Ensemble learning
cypher
// Run several recommendation algorithms, then combine their results
// Algorithm 1: collaborative filtering
MATCH (u:User {id: 1})-[:RATED]->(:Item)<-[:RATED]-(other:User)-[r1:RATED]->(rec1:Item)
WHERE r1.score > 4 AND NOT (u)-[:RATED]->(rec1)
WITH u, collect(DISTINCT rec1) AS cf_recs
// Algorithm 2: content filtering via shared tags
MATCH (u)-[r2:RATED]->(:Item)-[:HAS_TAG]->(:Tag)<-[:HAS_TAG]-(rec2:Item)
WHERE r2.score > 4 AND NOT (u)-[:RATED]->(rec2)
WITH u, cf_recs, collect(DISTINCT rec2) AS content_recs
// Algorithm 3: popularity (the 10 unrated items with the highest average rating)
MATCH (rec3:Item)
WHERE NOT (u)-[:RATED]->(rec3)
WITH u, cf_recs, content_recs, rec3
ORDER BY rec3.average_rating DESC
LIMIT 10
WITH u, cf_recs, content_recs, collect(rec3) AS popular_recs
// Combine the results, weighting each algorithm
WITH u,
  [rec IN cf_recs | {item: rec, score: 0.5}] +
  [rec IN content_recs | {item: rec, score: 0.3}] +
  [rec IN popular_recs | {item: rec, score: 0.2}] AS all_recs
// Sum the weights per item (which also deduplicates), then rank
UNWIND all_recs AS rec
WITH rec.item AS item, sum(rec.score) AS total_score
ORDER BY total_score DESC
LIMIT 10
RETURN item.name, total_score
Evaluating Recommender Systems
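1. Evaluation metrics
- Precision: the fraction of recommended items that are relevant
- Recall: the fraction of relevant items that were recommended
- F1 score: the harmonic mean of precision and recall
- Mean Average Precision (MAP): the mean, across users, of average precision, which averages the precision at each rank where a relevant item appears
- NDCG: Normalized Discounted Cumulative Gain, which rewards placing relevant items near the top of the list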
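The metrics listed above can be computed for a single user's ranked recommendation list as in the following sketch. It assumes binary relevance (an item is either relevant or not), and for average precision it divides by the total number of relevant items, which is one common convention among several. Function names and sample data are illustrative.

```python
from math import log2


def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F1) for one user's recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(recommended, relevant):
    """Sum of precision@k over the ranks k that hold a relevant item,
    divided by the number of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(per_user):
    """MAP: mean of average_precision over (recommended, relevant) pairs."""
    return sum(average_precision(rec, rel) for rec, rel in per_user) / len(per_user)


def ndcg(recommended, relevant):
    """NDCG with binary relevance: DCG divided by the best achievable DCG."""
    dcg = sum(1.0 / log2(rank + 1)
              for rank, item in enumerate(recommended, start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical example: a ranked list of four items, three relevant items.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

print(precision_recall_f1(recommended, relevant))
print(average_precision(recommended, relevant))
print(ndcg(recommended, relevant))
```

Precision, recall, and F1 ignore the order of the list, while average precision and NDCG both reward ranking relevant items earlier.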
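2. Evaluation methods
- Offline evaluation: evaluate against historical data
- Online evaluation: A/B testing
- User feedback: collect feedback from users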
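For the offline method listed above, the usual first step is to hold out part of each user's historical ratings as a test set. The sketch below draws one random number per rating, so every rating lands in exactly one of the two sets; drawing separately for each set would let a rating appear in both or in neither. The function name, the fixed seed, and the sample data are assumptions for illustration.

```python
import random


def split_ratings(ratings, train_fraction=0.8, seed=42):
    """Split a user's ratings into (train, test) with one random draw each."""
    rng = random.Random(seed)
    train, test = [], []
    for rating in ratings:
        if rng.random() < train_fraction:
            train.append(rating)
        else:
            test.append(rating)
    return train, test


# Hypothetical ratings for one user: (item id, score).
sample = [("m%d" % n, n % 5 + 1) for n in range(20)]

train, test = split_ratings(sample)
print(len(train), "training ratings,", len(test), "test ratings")
```

Because the split is random, the 80/20 proportion holds only approximately for any single user; a time-based split (train on older ratings, test on newer ones) is a common alternative when timestamps are available.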
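3. Evaluation example
cypher
// Offline evaluation
// Attach one random draw to each rating, then split into training and test sets
MATCH (u:User)-[r:RATED]->(i:Item)
WITH u, collect({item: i, rating: r.score, draw: rand()}) AS ratings
WHERE size(ratings) > 5
// Roughly 80% of the ratings go to the training set and 20% to the test set
WITH u, ratings,
  [x IN ratings WHERE x.draw < 0.8] AS train_ratings,
  [x IN ratings WHERE x.draw >= 0.8] AS test_ratings
// Train the recommendation model
// ...
// Generate recommendations (these elided steps must bind recommended_items, a list of Item nodes per user)
// ...
// Evaluate the recommendations
WITH u, test_ratings, recommended_items
WITH u,
  [t IN test_ratings WHERE t.rating >= 4] AS relevant_items,
  recommended_items
WITH u,
  size([r IN recommended_items WHERE r IN [t IN relevant_items | t.item]]) AS true_positives,
  size(recommended_items) AS total_recommended,
  size(relevant_items) AS total_relevant
// Skip users with no recommendations or no relevant test items to avoid dividing by zero
WHERE total_recommended > 0 AND total_relevant > 0
// Compute precision and recall per user
WITH u,
  true_positives * 1.0 / total_recommended AS precision,
  true_positives * 1.0 / total_relevant AS recall
// Average precision and recall across users
RETURN avg(precision) AS avg_precision, avg(recall) AS avg_recall
Case Studies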
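1. Movie recommendation system
Features:
- User management: user profiles, viewing history
- Movie management: movie details, ratings
- Recommendations: personalized recommendations, similar-movie recommendations
- Social features: friend recommendations, sharing what you watched
Tech stack:
- Frontend: React/Vue.js
- Backend: Node.js/Express
- Database: Neo4j
- Cache: Redis
- Search engine: Elasticsearch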
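2. E-commerce recommendation system
Features:
- User management: user profiles, purchase history
- Product management: product details, categories
- Recommendations: personalized recommendations, related-product recommendations
- Cart management: shopping cart, checkout
Tech stack:
- Frontend: React
- Backend: Spring Boot
- Database: Neo4j
- Cache: Redis
- Message queue: Kafka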
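Summary
Recommendation is one of Neo4j's major use cases. Because a graph database stores users, items, and their interactions as directly traversable relationships, collaborative filtering, path-based recommendation, real-time recommendation, and hybrid strategies can all be expressed as graph queries. This article covered the data model design, several recommendation algorithms, evaluation methods, and two case studies. In practice, choose the algorithms and technical approach that fit your business requirements and data characteristics in order to build an efficient, accurate recommender system.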