VADER Sentiment Analysis VADER 情感分析

Released已發布

algorithm algorithm

Implement VADER sentiment analysis for social media text scoring. Use this skill when the user needs to analyze sentiment in tweets, reviews, or social posts, compute compound sentiment scores, or classify text polarity — even if they say 'is this positive or negative', 'sentiment of these comments', or 'social media mood analysis'.

演算法技能：VADER Sentiment Analysis 分析與應用。

View on GitHub在 GitHub 查看

Overview概述

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment tool optimized for social media. Returns compound score [-1, +1] combining positive, negative, and neutral proportions. Runs in O(n) per text where n = word count. No training required.

When to Use使用時機

Trigger conditions:

Analyzing sentiment in social media posts, tweets, or reviews
Quick sentiment scoring without ML model training
Processing text with slang, emoticons, and informal language

When NOT to use:

For formal/academic text (VADER is tuned for social media)
When domain-specific sentiment matters (e.g., financial sentiment — use FinBERT)
When sarcasm detection is critical (VADER doesn't detect sarcasm)

Algorithm演算法

IRON LAW: VADER Is Designed for SOCIAL MEDIA Text
It handles slang, emoticons, capitalization, and punctuation as
sentiment modifiers. Applying it to formal documents (legal, academic,
medical) produces unreliable scores. For domain-specific text, use
domain-trained models instead.

Phase 1: Input Validation

Tokenize text. Preserve: capitalization (ALL CAPS = emphasis), punctuation (! amplifies), emoticons/emoji. Gate: Text is non-empty, encoding handled correctly.

Phase 2: Core Algorithm

Look up each token in VADER lexicon (7,500+ sentiment-rated terms)
Apply grammatical rules: negation ("not good" = negative), degree modifiers ("very good" > "good"), capitalization boost, punctuation amplification
Compute raw valence scores for positive, negative, neutral proportions
Compute compound score: normalized sum of all valence scores using formula: compound = sum / √(sum² + α) where α = 15

Phase 3: Verification

Classify: compound ≥ 0.05 → positive, ≤ -0.05 → negative, else neutral. Spot-check sample results. Gate: Classifications pass manual spot-check on 10-20 examples.

Phase 4: Output

Return compound score and polarity classification per text.

Output Format輸出格式

{
  "results": [{"text": "...", "compound": 0.76, "pos": 0.45, "neu": 0.55, "neg": 0.0, "label": "positive"}],
  "metadata": {"texts_analyzed": 500, "distribution": {"positive": 0.45, "neutral": 0.35, "negative": 0.20}}
}

Examples範例

Sample I/O

Input: "This product is AMAZING!!! 😍" Expected: compound ≈ 0.87 (positive). Boosted by: CAPS, !!!, 😍 emoji.

Edge Cases

Input	Expected	Why
"Not bad at all"	Slightly positive (~0.2)	Double negation partially handled
"😂😂😂"	Positive	Emoji mapped in lexicon
Empty string	Compound = 0, neutral	No tokens to score

Gotchas注意事項

Sarcasm is invisible: "Oh great, another meeting" reads as positive. VADER has no sarcasm detection.
Negation window: VADER applies negation within a 3-word window. "I do not think this is bad" may misparse the negation chain.
Emoji coverage: VADER's emoji lexicon may not cover newer emoji. Update or supplement as needed.
Language limitation: VADER is English-only. For Chinese/Japanese, use language-specific tools (e.g., SnowNLP for Chinese).
Compound threshold sensitivity: The 0.05 boundary is arbitrary. Adjust thresholds based on your specific use case and tolerance for false positives.

References參考資料

For VADER lexicon and rules documentation, see references/vader-rules.md
For comparison with transformer-based sentiment models, see references/model-comparison.md

Tags標籤

social-mediasentiment-analysisvadernlp

Part of所屬

Brand Media & PR 品牌媒體與公關

Brand monitoring and public relations bundle combining Taiwan social media tracking, news monitoring, and advertising analytics for comprehensive brand management.

品牌監測與公關組合包，結合台灣社群媒體追蹤、新聞監測與廣告分析，實現全面品牌管理。