Content-Based Recommendation

Released

algorithm

Implement content-based recommendation by matching item features to user preference profiles. Use this skill when the user needs to recommend items based on attributes, solve the cold start problem for new items, or build recommendations without collaborative data — even if they say 'recommend similar products', 'items like this', or 'feature-based matching'.

Algorithm skill: Content-Based Recommendation analysis and application.

Overview

Content-based filtering recommends items whose features match a user's preference profile, which is built from that user's interaction history. Scoring costs O(I × F) per user, where I is the number of candidate items and F is the number of features. It solves new-item cold start because an item needs only features, not interaction history, to be scored.

When to Use

Trigger conditions:

Recommending based on item attributes (genre, category, keywords, price range)
New item cold start: items have features but no interaction data yet
When user privacy requires no cross-user data sharing

When NOT to use:

When serendipity matters (content-based creates filter bubbles)
When item features are unavailable or uninformative (use CF instead)

Algorithm

IRON LAW: Content-Based Can Only Recommend SIMILAR Items
It cannot discover unexpected interests (filter bubble problem).
Users who only interact with action movies will only get action
movie recommendations — even if they'd love a documentary.

Phase 1: Input Validation

Extract item feature vectors (TF-IDF for text, one-hot encoding for categories, numeric values for numerical attributes). Build the user profile from the weighted features of the items the user has interacted with. Gate: Item features extracted, user profile vector built.
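
The feature extraction step can be sketched as follows. This is a minimal illustration using only the standard library, assuming sparse vectors stored as {feature: weight} dicts; the item records, the "kw:" prefix, and the helper names tfidf_vectors and one_hot are hypothetical and not part of the skill itself.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            f"kw:{term}": (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def one_hot(field, value):
    """Encode a categorical attribute as a single sparse feature."""
    return {f"{field}:{value}": 1.0}

# Hypothetical catalogue used only for illustration.
items = [
    {"item_id": "1", "genre": "sci-fi",
     "description": "space crew explores distant planet"},
    {"item_id": "2", "genre": "documentary",
     "description": "ocean life documentary"},
]

text_vecs = tfidf_vectors([it["description"] for it in items])
features = {
    it["item_id"]: {**one_hot("genre", it["genre"]), **vec}
    for it, vec in zip(items, text_vecs)
}

# Gate from Phase 1: every item must end up with a non-empty feature vector.
assert all(features[it["item_id"]] for it in items)
```

A term that occurs in every document gets weight log(1) = 0 under this unsmoothed formula, so smoothing may be preferable for small catalogues.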

Phase 2: Core Algorithm

Represent each item as a feature vector
Build user profile: weighted centroid of interacted item vectors (weight by recency, rating, or engagement)
Compute similarity between user profile and all candidate items (cosine similarity)
Rank by similarity score, exclude already-interacted items
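
The four steps above can be sketched in a few lines. This is an illustrative version under stated assumptions: vectors are sparse {feature: value} dicts, interaction weights are ratings, and the function names and sample catalogue are hypothetical.

```python
import math

def weighted_centroid(vectors, weights):
    """Weighted average of sparse feature vectors ({feature: value} dicts)."""
    total = sum(weights)
    if total <= 0:
        # Covers the new-user case: no interactions means no profile.
        raise ValueError("weights must sum to a positive value")
    profile = {}
    for vec, w in zip(vectors, weights):
        for feat, val in vec.items():
            profile[feat] = profile.get(feat, 0.0) + w * val / total
    return profile

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(val * b.get(feat, 0.0) for feat, val in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(item_features, interactions, top_k=10):
    """interactions maps item_id -> weight (here: a rating)."""
    seen = [i for i in interactions if i in item_features]
    profile = weighted_centroid([item_features[i] for i in seen],
                                [interactions[i] for i in seen])
    # Score every candidate and skip items the user already interacted with.
    scored = [(item_id, cosine(profile, vec))
              for item_id, vec in item_features.items()
              if item_id not in interactions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical catalogue and history.
item_features = {
    "a": {"genre:sci-fi": 1.0, "director:Nolan": 1.0},
    "b": {"genre:sci-fi": 1.0},
    "c": {"genre:documentary": 1.0},
    "d": {"genre:documentary": 1.0, "topic:nature": 1.0},
    "e": {"genre:sci-fi": 1.0, "director:Nolan": 1.0},
}
interactions = {"a": 5.0, "c": 1.0}

ranked = recommend(item_features, interactions, top_k=3)
for item_id, score in ranked:
    print(item_id, round(score, 2))
```

Item "e" ranks first because it shares both of the heavily weighted features with the profile, while "d" trails because the documentary interaction carried a low weight.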

Phase 3: Verification

Evaluate: does the recommendation list reflect the user's demonstrated preferences? Check diversity metrics. Gate: Recommendations are topically aligned with user history.
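
One way to make this check concrete is to compute a topical alignment score and an intra-list diversity score. The sketch below is an assumption about how the gate could be measured, not a metric prescribed by the skill; the 0.5 threshold and the function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(val * b.get(feat, 0.0) for feat, val in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_alignment(profile, vectors):
    """Average similarity between the user profile and each recommended item."""
    if not vectors:
        return 0.0
    return sum(cosine(profile, v) for v in vectors) / len(vectors)

def intra_list_diversity(vectors):
    """1 minus the mean pairwise cosine similarity of the recommended items."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Hypothetical profile and recommendation list.
profile = {"genre:sci-fi": 1.0}
recommended = [
    {"genre:sci-fi": 1.0},
    {"genre:sci-fi": 1.0},
    {"genre:documentary": 1.0},
]

alignment = mean_alignment(profile, recommended)
diversity = intra_list_diversity(recommended)

ALIGNMENT_THRESHOLD = 0.5  # assumed value; tune per catalogue
gate_passed = alignment >= ALIGNMENT_THRESHOLD
```

Tracking both numbers together helps, since a list can pass the alignment gate while having near-zero diversity, which is the filter bubble described above.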

Phase 4: Output

Return ranked recommendations with feature-level explanations.

Output Format

{
  "recommendations": [{"item_id": "456", "score": 0.87, "matching_features": ["genre:thriller", "director:Nolan"]}],
  "metadata": {"method": "content-based", "features_used": 15, "profile_items": 30}
}
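
A possible way to assemble this structure, including the feature-level explanations, is sketched below. It assumes that matching_features lists the features shared by the profile and the item, ordered by their contribution to the dot product, and that features_used counts distinct features across the candidate items; both interpretations, along with the helper names, are assumptions.

```python
import json

def explain(profile, item_vec, top_n=3):
    """Shared features, ordered by their contribution to the dot product."""
    contrib = {
        feat: profile.get(feat, 0.0) * val
        for feat, val in item_vec.items()
        if profile.get(feat, 0.0) * val > 0
    }
    return sorted(contrib, key=contrib.get, reverse=True)[:top_n]

def build_output(ranked, profile, item_features, profile_items):
    """ranked is a list of (item_id, score) pairs, best first."""
    features_used = len({f for vec in item_features.values() for f in vec})
    return {
        "recommendations": [
            {
                "item_id": item_id,
                "score": round(score, 2),
                "matching_features": explain(profile, item_features[item_id]),
            }
            for item_id, score in ranked
        ],
        "metadata": {
            "method": "content-based",
            "features_used": features_used,
            "profile_items": profile_items,
        },
    }

# Hypothetical inputs chosen to resemble the sample above.
profile = {"genre:thriller": 0.8, "director:Nolan": 0.5, "genre:comedy": 0.1}
item_features = {
    "456": {"genre:thriller": 1.0, "director:Nolan": 1.0, "decade:2010s": 1.0},
}
ranked = [("456", 0.87)]

output = build_output(ranked, profile, item_features, profile_items=30)
print(json.dumps(output, indent=2))
```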

Examples

Sample I/O

Input: User watched 5 sci-fi movies and 2 documentaries.
Candidate: a new sci-fi movie.
Expected: High score (~0.8+) due to the genre match with the user's dominant preference.
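
The expected score can be checked by hand. The sketch below assumes genre is the only feature and that all seven watched movies carry equal weight, so the profile is the plain centroid of the one-hot genre vectors.

```python
import math

# Centroid of 5 sci-fi and 2 documentary one-hot vectors (equal weights assumed).
profile = {"genre:sci-fi": 5 / 7, "genre:documentary": 2 / 7}
candidate = {"genre:sci-fi": 1.0}

dot = sum(val * candidate.get(feat, 0.0) for feat, val in profile.items())
norm_profile = math.sqrt(sum(v * v for v in profile.values()))
norm_candidate = math.sqrt(sum(v * v for v in candidate.values()))

score = dot / (norm_profile * norm_candidate)  # equals 5 / sqrt(29)
print(round(score, 3))  # 0.928
```

With more features per item (director, keywords, and so on) the score would typically be lower, which is consistent with the looser "~0.8+" expectation.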

Edge Cases

Input	Expected	Why
New user, no history	Cannot build profile	New-user cold start — use popularity
All items same features	Equal scores	No differentiation possible
User with diverse history	Moderate scores for all	Profile averages dilute signal
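
The new-user case in the table can be handled with a simple fallback. The sketch below assumes a popularity count per item is available and that the content-based ranker is passed in as a callable; both names are hypothetical.

```python
def recommend_with_fallback(history, content_ranker, popularity, top_k=3):
    """Return (item_ids, method); fall back to popularity for empty histories."""
    if not history:
        # New-user cold start: no interactions, so no profile can be built.
        ranked = sorted(popularity, key=popularity.get, reverse=True)
        return ranked[:top_k], "popularity"
    return content_ranker(history)[:top_k], "content-based"

# Hypothetical popularity counts and a stub ranker used only for illustration.
popularity = {"a": 120, "b": 300, "c": 45}

def stub_ranker(history):
    return ["c", "a"]

cold_items, cold_method = recommend_with_fallback([], stub_ranker, popularity)
warm_items, warm_method = recommend_with_fallback(["x"], stub_ranker, popularity)
```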

Gotchas

Feature quality is everything: Garbage features → garbage recommendations. Invest in feature engineering.
Filter bubble: Users get increasingly narrow recommendations. Inject diversity by mixing in exploration items.
Profile drift: User preferences change over time. Apply temporal decay to older interactions.
Feature sparsity: Items with few features produce unreliable similarity. Set a minimum feature count threshold.
Over-specialization: A user who rated one jazz album highly shouldn't receive only jazz recommendations. Weight profile contributions by interaction count as well as rating.
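
Two of these mitigations, temporal decay and a minimum feature count, are easy to sketch. The exponential half-life form, the 30-day half-life, and the threshold of 3 features are assumptions chosen for illustration, not values specified by the skill.

```python
def decayed_weight(rating, age_days, half_life_days=30.0):
    """Halve an interaction's weight every half_life_days."""
    return rating * 0.5 ** (age_days / half_life_days)

def has_enough_features(item_vec, min_features=3):
    """Filter out items whose vectors are too sparse for reliable similarity."""
    return sum(1 for v in item_vec.values() if v != 0) >= min_features

# A rating of 4.0 counts fully today, half after 30 days, a quarter after 60.
weights = [decayed_weight(4.0, age) for age in (0, 30, 60)]
print(weights)  # [4.0, 2.0, 1.0]

sparse_item = {"genre:jazz": 1.0}
rich_item = {"genre:jazz": 1.0, "era:1960s": 1.0, "artist:Coltrane": 1.0}
eligible = [has_enough_features(sparse_item), has_enough_features(rich_item)]
```

The decayed values can be used directly as the weights when building the weighted centroid, so older interactions pull the profile less.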

References

For hybrid approaches combining content and CF, see references/hybrid-strategies.md
For text-based feature extraction techniques, see references/feature-extraction.md

Tags

recommendation, content-based, feature-matching