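Recommendation Systems

Data Model Design

1. Core Entities

  • User: a person who uses the recommendation system
  • Item: something that can be recommended, such as a movie, product, or article
  • Rating: the score a user gives an item (modeled as the RATED relationship)
  • Tag: a label attached to an item
  • Category: the category an item belongs to
  • Action: a user's interaction with an item, such as viewing, clicking, or purchasing (modeled as relationships)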

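2. Relationship Types

  • RATED: a user rated an item
  • PURCHASED: a user purchased an item
  • VIEWED: a user viewed an item
  • CLICKED: a user clicked an item
  • SAVED: a user saved (favorited) an item
  • BELONGS_TO: an item belongs to a category
  • HAS_TAG: an item carries a tag
  • SIMILAR_TO: two items are similar to each other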

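3. Property Design

  • User node properties

    • id: unique identifier
    • name: user name
    • age: age
    • gender: gender
    • location: location
    • registration_date: registration date
  • Item node properties

    • id: unique identifier
    • name: item name
    • description: description
    • price: price
    • release_date: release date
    • average_rating: average rating
  • RATED relationship properties

    • score: rating value
    • timestamp: time of the rating
  • Action relationship properties (PURCHASED, VIEWED, CLICKED, SAVED)

    • timestamp: time of the action
    • duration: how long the action lasted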

4. Data Model Example

cypher
// User node
CREATE (u:User {id: 1, name: 'John', age: 30, gender: 'Male', location: 'New York'})

// Item nodes (item 2 is a bare placeholder so that SIMILAR_TO has a target)
CREATE (i1:Item {id: 1, name: 'The Matrix', description: 'A sci-fi action film', price: 19.99, release_date: '1999-03-31', average_rating: 4.5})
CREATE (i2:Item {id: 2})

// Category node
CREATE (c:Category {id: 1, name: 'Sci-Fi'})

// Tag node
CREATE (t:Tag {id: 1, name: 'Action'})

// Relationships
CREATE (u)-[:RATED {score: 5, timestamp: '2023-01-01'}]->(i1)
CREATE (u)-[:PURCHASED {timestamp: '2023-01-01'}]->(i1)
CREATE (i1)-[:BELONGS_TO]->(c)
CREATE (i1)-[:HAS_TAG]->(t)
CREATE (i1)-[:SIMILAR_TO {score: 0.9}]->(i2)

Collaborative Filtering

1. User-Based Collaborative Filtering

cypher
// Find users whose ratings on shared items are close to the target user's.
// The rating lives on the RATED relationship, so compare r1.score with r2.score.
MATCH (u:User {id: 1})-[r1:RATED]->(i:Item)<-[r2:RATED]-(other:User)
WHERE u <> other
WITH u, other, count(i) AS common_items, sum(abs(r1.score - r2.score)) AS rating_difference
WHERE common_items > 3
WITH u, other, 1.0 / (1.0 + rating_difference) AS similarity
ORDER BY similarity DESC
LIMIT 10

// Recommend items those users rated above 4 that the target user has not rated
MATCH (other)-[r:RATED]->(recommended:Item)
WHERE r.score > 4 AND NOT (u)-[:RATED]->(recommended)
RETURN recommended.name, count(*) AS recommendation_count
ORDER BY recommendation_count DESC
LIMIT 10
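
To make the arithmetic concrete, the same idea can be sketched in plain Python. This is a minimal illustration, not part of the Neo4j workflow: the `ratings` dictionary, the function names, and the sample data are all hypothetical. Similarity is 1 / (1 + sum of absolute rating differences) over co-rated items, and a user only counts as a neighbor with more than three items in common.

```python
from collections import Counter

# Hypothetical in-memory ratings: user id -> {item name: score}.
ratings = {
    1: {"A": 5, "B": 4, "C": 3, "D": 5},
    2: {"A": 5, "B": 4, "C": 3, "D": 4, "E": 5},
    3: {"A": 1, "B": 2, "C": 5, "D": 1, "F": 5},
}

def similar_users(target, min_common=3, top_n=10):
    """Rank other users by 1 / (1 + sum of absolute rating differences)."""
    scored = []
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        common = ratings[target].keys() & other_ratings.keys()
        if len(common) <= min_common:  # mirrors `common_items > 3`
            continue
        diff = sum(abs(ratings[target][i] - other_ratings[i]) for i in common)
        scored.append((other, 1.0 / (1.0 + diff)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

def recommend(target, top_n=10):
    """Count how many similar users rated each unseen item above 4."""
    counts = Counter()
    for other, _similarity in similar_users(target):
        for item, score in ratings[other].items():
            if score > 4 and item not in ratings[target]:
                counts[item] += 1
    return counts.most_common(top_n)

print(similar_users(1))
print(recommend(1))
```

Note that the recommendation step only counts votes; weighting each vote by the neighbor's similarity would be a natural refinement.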

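2. Item-Based Collaborative Filtering

cypher
// Find items similar to the ones the target user rated above 4.
// The similarity value is stored as `score` on the SIMILAR_TO relationship.
MATCH (u:User {id: 1})-[r:RATED]->(i:Item)-[s:SIMILAR_TO]->(similar:Item)
WHERE r.score > 4 AND NOT (u)-[:RATED]->(similar)
RETURN similar.name, avg(s.score) AS similarity_score
ORDER BY similarity_score DESC
LIMIT 10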
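
The averaging step can be sketched in Python as follows. This assumes hypothetical in-memory stand-ins for the graph: `user_ratings` for the user's RATED relationships and `similar_to` for SIMILAR_TO relationships with their scores. The names and data are illustrative only.

```python
# Hypothetical data: the user's ratings and SIMILAR_TO edges with scores.
user_ratings = {"The Matrix": 5, "Inception": 5, "Titanic": 2}
similar_to = {
    "The Matrix": {"Blade Runner": 0.9, "Inception": 0.8},
    "Inception": {"Blade Runner": 0.7, "Interstellar": 0.6},
    "Titanic": {"The Notebook": 0.9},
}

def item_based(user_ratings, similar_to, min_score=4, top_n=10):
    """Average the similarity of each unrated candidate to the user's liked items."""
    sims = {}
    for liked, score in user_ratings.items():
        if score <= min_score:  # only follow items rated above 4
            continue
        for candidate, sim in similar_to.get(liked, {}).items():
            if candidate in user_ratings:  # skip items the user already rated
                continue
            sims.setdefault(candidate, []).append(sim)
    ranked = [(name, sum(values) / len(values)) for name, values in sims.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)[:top_n]

print(item_based(user_ratings, similar_to))
```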

3. Matrix Factorization

cypher
// Note: this example is not matrix factorization in the strict sense.
// As a stand-in it uses the Neo4j Graph Data Science (GDS) Node Similarity
// algorithm (Jaccard overlap of rated items) to find similar users.
// Procedure names assume GDS 2.x.
CALL gds.graph.project('ratingGraph', ['User', 'Item'], {
  RATED: {
    type: 'RATED',
    properties: 'score'
  }
});

CALL gds.nodeSimilarity.stream('ratingGraph', {topK: 10})
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS u, gds.util.asNode(node2) AS other, similarity
// Recommend items that similar users rated above 4 and u has not rated
MATCH (other)-[r:RATED]->(i:Item)
WHERE r.score > 4 AND NOT (u)-[:RATED]->(i)
RETURN u.name, i.name, sum(similarity) AS score
ORDER BY score DESC
LIMIT 10;

CALL gds.graph.drop('ratingGraph');
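
For comparison, matrix factorization itself can be sketched outside the database. The following is a minimal pure-Python sketch trained with stochastic gradient descent, assuming a tiny hypothetical list of (user, item, score) triples. The function names, hyperparameters, and data are illustrative assumptions and are not part of Neo4j or GDS.

```python
import math
import random

# Hypothetical (user, item, score) triples standing in for RATED relationships.
RATINGS = [
    (1, "A", 5), (1, "B", 4),
    (2, "A", 5), (2, "B", 4), (2, "C", 1),
    (3, "A", 1), (3, "B", 2), (3, "C", 5),
]

def predict(p, q, user, item):
    """Predicted score is the dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))

def factorize(ratings, n_factors=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user factors p and item factors q with stochastic gradient descent."""
    rng = random.Random(seed)
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u, _, _ in ratings}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _, i, _ in ratings}
    for _ in range(epochs):
        for user, item, score in ratings:
            err = score - predict(p, q, user, item)
            for k in range(n_factors):
                pu, qi = p[user][k], q[item][k]
                # Gradient step on squared error with L2 regularization
                p[user][k] += lr * (err * qi - reg * pu)
                q[item][k] += lr * (err * pu - reg * qi)
    return p, q

def rmse(ratings, p, q):
    """Root mean squared error of the predictions on the given ratings."""
    squared = [(score - predict(p, q, u, i)) ** 2 for u, i, score in ratings]
    return math.sqrt(sum(squared) / len(squared))

p, q = factorize(RATINGS)
print("training RMSE:", rmse(RATINGS, p, q))
print("predicted score of user 1 for item C:", predict(p, q, 1, "C"))
```

The learned vectors can then be used to score any user and item pair that has no rating yet, which is the part the graph queries above do not provide.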

Path-Based Recommendation

1. Path Finding

cypher
// Follow paths from the target user through co-raters to items the user has not rated
MATCH path = (u:User {id: 1})-[:RATED]->(i:Item)<-[:RATED]-(other:User)-[:RATED]->(recommended:Item)
WHERE NOT (u)-[:RATED]->(recommended)
RETURN recommended.name, count(path) AS path_count
ORDER BY path_count DESC
LIMIT 10
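
What `count(path)` measures can be illustrated with a small Python sketch. It assumes a hypothetical `rated` dictionary in place of the RATED relationships; each (shared item, other user, candidate) triple corresponds to one matched path.

```python
from collections import Counter

# Hypothetical RATED edges: user id -> set of rated item names.
rated = {
    1: {"A", "B"},
    2: {"A", "B", "C"},
    3: {"B", "C", "D"},
}

def path_counts(target):
    """Count user -> item <- other user -> candidate paths per candidate item."""
    counts = Counter()
    for item in rated[target]:
        for other, other_items in rated.items():
            if other == target or item not in other_items:
                continue
            # Every item the other user rated and the target has not is one path
            for candidate in other_items - rated[target]:
                counts[candidate] += 1
    return counts

print(path_counts(1).most_common(10))
```

An item reachable through many shared items and many co-raters accumulates a higher count, which is why the query sorts by `path_count`.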

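2. Weighted Paths

cypher
// Weighted-path recommendation: only follow ratings above 3
MATCH path = (u:User {id: 1})-[r1:RATED]->(i:Item)<-[r2:RATED]-(other:User)-[r3:RATED]->(recommended:Item)
WHERE r1.score > 3 AND r2.score > 3 AND r3.score > 3
  AND NOT (u)-[:RATED]->(recommended)
// Each path contributes 1 / length. Every path matched here has length 3,
// so the weight only differentiates paths once variable-length patterns are used.
WITH recommended, sum(1.0 / length(path)) AS score
RETURN recommended.name, score
ORDER BY score DESC
LIMIT 10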

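3. Multi-Step Paths

cypher
// Multi-step path recommendation over RATED, PURCHASED, and VIEWED relationships.
// The pattern is undirected because these relationships all point from users to
// items, so a path that reaches another item must traverse one of them backwards.
MATCH path = (u:User {id: 1})-[:RATED|PURCHASED|VIEWED*2..3]-(recommended:Item)
WHERE NOT (u)-[:RATED]->(recommended)
RETURN recommended.name, count(path) AS path_count
ORDER BY path_count DESC
LIMIT 10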

Real-Time Recommendation

1. Based on Recent Behavior

cypher
// Recommend from the user's five most recent views.
// The timestamp is a property of the VIEWED relationship, not of the item.
MATCH (u:User {id: 1})-[v:VIEWED]->(i:Item)
WITH u, i, v
ORDER BY v.timestamp DESC
LIMIT 5

MATCH (i)-[s:SIMILAR_TO]->(recommended:Item)
WHERE NOT (u)-[:VIEWED]->(recommended)
  AND NOT (u)-[:PURCHASED]->(recommended)
RETURN recommended.name, avg(s.score) AS similarity_score
ORDER BY similarity_score DESC
LIMIT 10
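
The effect of restricting the query to recent views can be sketched in Python. The data structures below are hypothetical stand-ins for VIEWED, PURCHASED, and SIMILAR_TO relationships, and `n_recent` is set to 2 only to keep the example small.

```python
# Hypothetical data: (item, ISO timestamp) views, purchases, and similarity edges.
views = [
    ("A", "2023-01-01T10:00:00"),
    ("B", "2023-01-02T09:00:00"),
    ("C", "2023-01-03T08:00:00"),
]
purchased = {"Z"}
similar_to = {
    "A": {"X": 0.9},
    "B": {"Y": 0.8, "C": 0.7},
    "C": {"Y": 0.6, "Z": 0.5},
}

def recommend_from_recent(views, similar_to, purchased, n_recent=5, top_n=10):
    """Recommend items similar to the most recently viewed ones."""
    # ISO 8601 strings sort chronologically, so a plain string sort is enough here
    recent = [item for item, _ in sorted(views, key=lambda v: v[1], reverse=True)[:n_recent]]
    viewed = {item for item, _ in views}
    sims = {}
    for item in recent:
        for candidate, sim in similar_to.get(item, {}).items():
            if candidate in viewed or candidate in purchased:
                continue
            sims.setdefault(candidate, []).append(sim)
    ranked = [(name, sum(values) / len(values)) for name, values in sims.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)[:top_n]

print(recommend_from_recent(views, similar_to, purchased, n_recent=2))
```

With `n_recent=2`, item X never appears because it is only similar to the oldest view, which is the behavior the `ORDER BY ... LIMIT 5` step is meant to produce.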

2. Based on Context

cypher
// Recommend based on time-of-day context.
// datetime() uses the server's default time zone (UTC unless configured otherwise).
MATCH (u:User {id: 1})
WITH u, datetime().hour AS current_hour

// Pick the category that matches the current time slot
MATCH (c:Category)
WHERE (current_hour >= 6 AND current_hour < 12 AND c.name = 'Breakfast') OR
      (current_hour >= 12 AND current_hour < 18 AND c.name = 'Lunch') OR
      (current_hour >= 18 AND current_hour < 24 AND c.name = 'Dinner') OR
      (current_hour >= 0 AND current_hour < 6 AND c.name = 'Snack')

MATCH (c)<-[:BELONGS_TO]-(recommended:Item)
WHERE NOT (u)-[:PURCHASED]->(recommended)
RETURN recommended.name, c.name AS category
ORDER BY recommended.average_rating DESC
LIMIT 10
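
The time-slot mapping is easier to test when it is isolated. A minimal Python sketch of the same rule follows; the function name is hypothetical, and the category names are the ones used in the query.

```python
from datetime import datetime

def category_for_hour(hour):
    """Map an hour of the day (0-23) to the category used in the query above."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if 6 <= hour < 12:
        return "Breakfast"
    if 12 <= hour < 18:
        return "Lunch"
    if 18 <= hour < 24:
        return "Dinner"
    return "Snack"  # 0 <= hour < 6

# The current local hour selects the category to recommend from
print(category_for_hour(datetime.now().hour))
```

Computing the category in application code and passing it to the query as a parameter is one way to keep the Cypher shorter, at the cost of splitting the rule across two places.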

3. Incremental Updates

cypher
// Incremental update: refresh recommendations when the user produces a new action
MATCH (u:User {id: 1})-[:RATED {score: 5}]->(i:Item)

// Boost the similarity score stored on the item's SIMILAR_TO relationships
MATCH (i)-[s:SIMILAR_TO]->(:Item)
SET s.score = s.score + 0.1

// Rebuild the user's recommendation list (a WITH is required after SET)
WITH DISTINCT u
MATCH (u)-[:RATED]->(:Item)-[:SIMILAR_TO]->(recommended:Item)
WHERE NOT (u)-[:RATED]->(recommended)
WITH recommended, count(*) AS score
ORDER BY score DESC
LIMIT 10
RETURN recommended.name, score

Hybrid Recommendation Strategies

1. Weighted Hybrid

cypher
// User-based collaborative filtering
MATCH (u:User {id: 1})-[:RATED]->(i:Item)<-[:RATED]-(other:User)
WHERE u <> other
WITH u, other, count(i) AS common_items
WHERE common_items > 3
MATCH (other)-[r1:RATED]->(rec1:Item)
WHERE r1.score > 4 AND NOT (u)-[:RATED]->(rec1)
WITH u, collect(DISTINCT rec1) AS user_based_recs

// Item-based collaborative filtering
MATCH (u)-[r2:RATED]->(:Item)-[:SIMILAR_TO]->(rec2:Item)
WHERE r2.score > 4 AND NOT (u)-[:RATED]->(rec2)
WITH u, user_based_recs, collect(DISTINCT rec2) AS item_based_recs

// Content-based recommendation
MATCH (u)-[r3:RATED]->(:Item)-[:HAS_TAG]->(:Tag)<-[:HAS_TAG]-(rec3:Item)
WHERE r3.score > 4 AND NOT (u)-[:RATED]->(rec3)
WITH u, user_based_recs, item_based_recs, collect(DISTINCT rec3) AS content_based_recs

// Merge the three lists, giving each strategy a fixed weight
WITH u,
     [rec IN user_based_recs | {item: rec, score: 0.4}] +
     [rec IN item_based_recs | {item: rec, score: 0.4}] +
     [rec IN content_based_recs | {item: rec, score: 0.2}] AS all_recs

// Deduplicate by summing the weights per item, then sort
UNWIND all_recs AS rec
WITH rec.item AS item, sum(rec.score) AS total_score
ORDER BY total_score DESC
LIMIT 10
RETURN item.name, total_score
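
The merge step boils down to summing a fixed weight for every strategy that proposes an item. A minimal Python sketch, assuming hypothetical candidate lists and the 0.4 / 0.4 / 0.2 weights from the query:

```python
from collections import defaultdict

# Weights per strategy, taken from the query above.
WEIGHTS = {"user_based": 0.4, "item_based": 0.4, "content_based": 0.2}

def blend(recs_by_strategy, weights=WEIGHTS, top_n=10):
    """Sum the weight of every strategy that recommends an item, then rank."""
    totals = defaultdict(float)
    for strategy, items in recs_by_strategy.items():
        for item in set(items):  # each strategy votes at most once per item
            totals[item] += weights[strategy]
    # Sort by score descending, breaking ties by name for a stable order
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]

# Hypothetical output of the three strategies
recs = {
    "user_based": ["A", "B"],
    "item_based": ["B", "C"],
    "content_based": ["A", "C", "D"],
}
print(blend(recs))
```

An item proposed by several strategies ranks above one proposed by a single strategy, which is the main effect of this kind of blending.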

2. Feature Fusion

cypher
// Extract user features (carried along here but not used in the score below)
MATCH (u:User {id: 1})
WITH u, u.age AS age, u.gender AS gender, u.location AS location

// Extract the features of candidate items the user has not rated
MATCH (i:Item)
WHERE NOT (u)-[:RATED]->(i)
WITH u, i, age, gender, location,
     [(i)-[:BELONGS_TO]->(c:Category) | c.name] AS categories,
     [(i)-[:HAS_TAG]->(t:Tag) | t.name] AS tags

// Extract the features of items the user rated above 4
MATCH (u)-[r:RATED]->(liked:Item)
WHERE r.score > 4
WITH u, i, age, gender, location, categories, tags,
     [(liked)-[:BELONGS_TO]->(c:Category) | c.name] AS liked_categories,
     [(liked)-[:HAS_TAG]->(t:Tag) | t.name] AS liked_tags

// Count shared categories and tags for each (candidate, liked item) pair
WITH u, i,
     size([c IN categories WHERE c IN liked_categories]) AS category_similarity,
     size([t IN tags WHERE t IN liked_tags]) AS tag_similarity

// Sum the overlap across all liked items to get one score per candidate
WITH i, sum(category_similarity + tag_similarity) AS total_similarity
ORDER BY total_similarity DESC
LIMIT 10
RETURN i.name, total_similarity
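
The overlap score can be written down compactly in Python. This sketch assumes each item is a hypothetical dictionary of category and tag sets, and sums the overlap across all liked items so that each candidate gets a single score.

```python
# Hypothetical feature sets; in the graph these come from BELONGS_TO and HAS_TAG.
liked_items = [
    {"categories": {"Sci-Fi"}, "tags": {"Action", "Cyberpunk"}},
    {"categories": {"Sci-Fi", "Thriller"}, "tags": {"Heist"}},
]
candidates = {
    "Blade Runner": {"categories": {"Sci-Fi"}, "tags": {"Cyberpunk"}},
    "The Notebook": {"categories": {"Romance"}, "tags": {"Drama"}},
}

def overlap_score(candidate, liked_items):
    """Count categories and tags the candidate shares with each liked item."""
    return sum(
        len(candidate["categories"] & liked["categories"])
        + len(candidate["tags"] & liked["tags"])
        for liked in liked_items
    )

def rank_candidates(candidates, liked_items, top_n=10):
    """Score every candidate and return the highest-scoring ones first."""
    scored = [(name, overlap_score(features, liked_items))
              for name, features in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

print(rank_candidates(candidates, liked_items))
```

A raw count favors items with many tags; dividing by the size of the union (Jaccard similarity) is a common alternative when tag counts vary widely.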

3. Ensemble Learning

cypher
// Run several recommendation algorithms, then combine their results

// Algorithm 1: collaborative filtering
MATCH (u:User {id: 1})-[:RATED]->(i:Item)<-[:RATED]-(other:User)-[r1:RATED]->(rec1:Item)
WHERE r1.score > 4 AND NOT (u)-[:RATED]->(rec1)
WITH u, collect(DISTINCT rec1) AS cf_recs

// Algorithm 2: content filtering
MATCH (u)-[r2:RATED]->(:Item)-[:HAS_TAG]->(:Tag)<-[:HAS_TAG]-(rec2:Item)
WHERE r2.score > 4 AND NOT (u)-[:RATED]->(rec2)
WITH u, cf_recs, collect(DISTINCT rec2) AS content_recs

// Algorithm 3: popularity (top 10 unrated items by average rating)
MATCH (rec3:Item)
WHERE NOT (u)-[:RATED]->(rec3)
WITH u, cf_recs, content_recs, rec3
ORDER BY rec3.average_rating DESC
LIMIT 10
WITH u, cf_recs, content_recs, collect(rec3) AS popular_recs

// Combine the results, giving each algorithm a fixed weight
WITH u,
     [rec IN cf_recs | {item: rec, score: 0.5}] +
     [rec IN content_recs | {item: rec, score: 0.3}] +
     [rec IN popular_recs | {item: rec, score: 0.2}] AS all_recs

// Deduplicate by summing the weights per item, then sort
UNWIND all_recs AS rec
WITH rec.item AS item, sum(rec.score) AS total_score
ORDER BY total_score DESC
LIMIT 10
RETURN item.name, total_score

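Evaluating a Recommendation System

1. Evaluation Metrics

  • Precision: the fraction of recommended items that are relevant
  • Recall: the fraction of relevant items that were recommended
  • F1 score: the harmonic mean of precision and recall
  • Mean Average Precision (MAP): the mean over users of average precision, which rewards placing relevant items near the top of the list
  • NDCG: normalized discounted cumulative gain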
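
These metrics are simple enough to sketch in a few lines of Python. The helpers below assume binary relevance (an item is either relevant or not), and average precision is normalized by the number of relevant items, which is one common convention among several. The function names and sample lists are hypothetical.

```python
import math

def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one ranked list and a set of relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def average_precision(recommended, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(recommended, relevant):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended, start=1) if item in relevant)
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

recommended = ["A", "B", "C", "D"]
relevant = {"A", "C", "E"}
print(precision_recall_f1(recommended, relevant))
print(average_precision(recommended, relevant))
print(ndcg(recommended, relevant))
```

MAP is then the mean of `average_precision` over all evaluated users.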

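2. Evaluation Methods

  • Offline evaluation: measure quality against historical data
  • Online evaluation: A/B testing
  • User feedback: collect feedback directly from users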

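3. Evaluation Example

cypher
// Offline evaluation
// Split each user's ratings into a training set and a test set
MATCH (u:User)-[r:RATED]->(i:Item)
// Draw one random number per rating so every rating lands in exactly one set
WITH u, collect({item: i, rating: r.score, p: rand()}) AS ratings
WHERE size(ratings) > 5
// Roughly 80% go to the training set and 20% to the test set
WITH u, ratings,
     [r IN ratings WHERE r.p < 0.8] AS train_ratings,
     [r IN ratings WHERE r.p >= 0.8] AS test_ratings

// Train the recommendation model
// ...

// Generate recommendations
// ...

// Evaluate the recommendations
// recommended_items is assumed to be a list of Item nodes produced by the elided steps above
WITH u, test_ratings, recommended_items
WITH u, recommended_items,
     [t IN test_ratings WHERE t.rating >= 4 | t.item] AS relevant_items
WITH u,
     size([r IN recommended_items WHERE r IN relevant_items]) AS true_positives,
     size(recommended_items) AS total_recommended,
     size(relevant_items) AS total_relevant
// Skip users with nothing recommended or nothing relevant to avoid dividing by zero
WHERE total_recommended > 0 AND total_relevant > 0

// Compute precision and recall per user
WITH u,
     true_positives * 1.0 / total_recommended AS precision,
     true_positives * 1.0 / total_relevant AS recall

// Average precision and recall over all users
RETURN avg(precision) AS avg_precision, avg(recall) AS avg_recall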
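
One detail of the split is worth isolating: each rating should receive a single random draw, so that it ends up in exactly one of the two sets. A minimal Python sketch, assuming a hypothetical list of (item, score) pairs for one user and a fixed seed for reproducibility:

```python
import random

def split_ratings(ratings, train_fraction=0.8, seed=42):
    """Assign each rating to train or test using one random draw per rating."""
    rng = random.Random(seed)
    train, test = [], []
    for rating in ratings:
        if rng.random() < train_fraction:
            train.append(rating)
        else:
            test.append(rating)
    return train, test

# Hypothetical ratings for one user: (item, score)
ratings = [("A", 5), ("B", 3), ("C", 4), ("D", 2), ("E", 5), ("F", 1)]
train, test = split_ratings(ratings)
print(len(train), "training ratings,", len(test), "test ratings")
```

Drawing a separate random number for each list instead would let the same rating fall into both sets or into neither, which leaks test data into training or silently discards ratings.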

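Case Studies

1. Movie Recommendation System

  • Features

    • User management: user profiles, viewing history
    • Movie management: movie details, ratings
    • Recommendations: personalized recommendations, similar-movie recommendations
    • Social features: friend recommendations, sharing what you watch
  • Tech stack

    • Frontend: React/Vue.js
    • Backend: Node.js/Express
    • Database: Neo4j
    • Cache: Redis
    • Search engine: Elasticsearch

2. E-Commerce Recommendation System

  • Features

    • User management: user profiles, purchase history
    • Product management: product details, categories
    • Recommendations: personalized recommendations, related-product recommendations
    • Shopping cart: cart management, checkout
  • Tech stack

    • Frontend: React
    • Backend: Spring Boot
    • Database: Neo4j
    • Cache: Redis
    • Message queue: Kafka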

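Summary

Recommendation is one of the major use cases for Neo4j. Because users, items, and their interactions map directly onto nodes and relationships, collaborative filtering, path-based recommendation, real-time recommendation, and hybrid strategies can all be expressed as graph queries. This article covered the data model design, several recommendation algorithms, evaluation methods, and two case studies. In practice, choose the algorithms and technical approach that fit your business requirements and the characteristics of your data.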