Language modelling

1. Social Media Analytics: Introduction

Overview: RICHER + BIGGER DATA -> Opportunities
Applications
Challenges

2. Preprocessing social media texts

Properties of Social Media Textual Data
Social Media Texts: a Foe of NLP?
Standard NLP Tools Perform Poorly on Social Media Textual Data

2.1. POS Tagging

2.2. Lexical Normalization

PRINCIPLE + example
An Example of Token-Based Approaches
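A token-based normalizer looks at each out-of-vocabulary token on its own and replaces it with a close in-vocabulary word. The sketch below assumes a simple recipe: collapse runs of repeated characters, then pick the lexicon word with the smallest edit distance, breaking ties by frequency. The lexicon, its frequencies, and all function names are hypothetical toy stand-ins, not a specific published system; phonetic matching, which is often combined with edit distance, is left out.

```python
import re

# Toy lexicon: in-vocabulary word -> corpus frequency (hypothetical numbers).
LEXICON = {"see": 120, "you": 300, "tomorrow": 40, "so": 200,
           "good": 150, "people": 80}


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def squeeze_repeats(token: str) -> str:
    """Collapse runs of 3+ identical characters to 2, e.g. "gooood" -> "good"."""
    return re.sub(r"(.)\1{2,}", r"\1\1", token)


def normalize_token(token: str, max_dist: int = 2) -> str:
    t = token.lower()
    if t in LEXICON:
        return t                      # in-vocabulary: keep as is
    t = squeeze_repeats(t)
    if t in LEXICON:
        return t
    # Candidates: lexicon words within max_dist edits, ranked by
    # distance first, then by higher frequency.
    candidates = [(levenshtein(t, w), -freq, w) for w, freq in LEXICON.items()]
    candidates = [c for c in candidates if c[0] <= max_dist]
    return min(candidates)[2] if candidates else token  # no candidate: keep


def normalize(text: str) -> str:
    return " ".join(normalize_token(tok) for tok in text.split())


print(normalize("See you tomorow sooo gooood peple"))
```

Pure edit distance misses heavily abbreviated forms: in this sketch "2moro" has no lexicon word within two edits and is returned unchanged, which is one reason distributional-similarity approaches are also used.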

2.3. Bonus

Feature Representation Using Dependencies?
SVM Training Data Generation
Detecting Ill-Formed Words
Evaluation Results
Example project: adapting NLP tools to the data (CMU Twitter POS tagger)

2.4. Summary: NLP tools and Noisy Input

- Lexical Normalization
  - Token-based approaches
  - Distributional similarity approaches
- NLP tools adaptation (POS, NER, parsing)
  - CMU ARK TweetNLP http://www.ark.cs.cmu.edu/TweetNLP/ : Twokenizer, POS tagger, TweeboParser
  - GATE Twitter POS Tagger https://gate.ac.uk/wiki/twitter-postagger.html

3. Sentiment Analysis

Reminder on text classification and evaluation measures
Sentiment analysis: Introduction
- Negation
- Coreference
- Slang and writing errors
- Comparative
- Domain-dependent opinions
- Many more challenges
- Subjectivity
- What is an Opinion? (document / sentence level)
  - Mathematical definition
- Entity and Aspect Level
- Structuring the Unstructured
- Document-level sentiment analysis
- Aspect-based sentiment analysis
Affective lexicons = Dictionaries of well-known sentiment words
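Lexicon-based scoring can be sketched as summing the prior polarities of lexicon words in a text, with a crude treatment of the negation challenge: a sentiment word that appears within a few tokens after a negator has its polarity flipped. The lexicon values, the negator list, and the window of 3 tokens are hypothetical choices for illustration only.

```python
# Toy affective lexicon: word -> prior polarity (hypothetical values).
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "terrible": -2.0, "hate": -2.0}
# Hypothetical negation cues.
NEGATORS = {"not", "no", "never", "don't", "isn't"}


def lexicon_score(text: str, window: int = 3) -> float:
    """Sum lexicon polarities; a sentiment word within `window` tokens
    after a negator has its polarity flipped."""
    score = 0.0
    negate_left = 0  # tokens remaining in the current negation scope
    for raw in text.lower().split():
        tok = raw.strip(".,!?;:")
        if tok in NEGATORS:
            negate_left = window  # open a new negation scope
            continue
        if tok in LEXICON:
            polarity = LEXICON[tok]
            score += -polarity if negate_left > 0 else polarity
        if negate_left > 0:
            negate_left -= 1
    return score


def classify(text: str) -> str:
    s = lexicon_score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


print(classify("The food was not good."))
```

Contrast is not modelled, so "I love it but the service was terrible" sums to zero and comes out neutral: opposite opinions about different targets cancel out, which is part of the motivation for aspect-level analysis.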

3.1. Methods for sentiment analysis

Machine Learning approaches
- ML algorithms + linguistic features
Lexicon based approaches
- Sentiment lexicons (pre-compiled terms)
- Corpus-based
Hybrid approaches: combine both
Most frequently used algorithms
- Naïve Bayes
- Maximum Entropy
- Logistic Regression
- SVM
- Recently: deep learning (RNNs such as LSTMs, CNNs) with word embeddings
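To show how one of the listed algorithms and the usual evaluation measures fit together, the sketch below trains a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing on bag-of-words features and scores it with precision, recall and F1. The training and test sentences are invented toy data and the names are illustrative; in practice one would use a library implementation (e.g. scikit-learn's `MultinomialNB`) and a real labelled corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for y in self.class_counts:
            lp = math.log(self.class_counts[y] / n_docs)  # log prior
            for w in doc.lower().split():
                if w not in self.vocab:
                    continue  # words unseen in training are ignored
                lp += math.log((self.word_counts[y][w] + 1) / (self.total_words[y] + v))
            if lp > best_lp:
                best, best_lp = y, lp
        return best


def precision_recall_f1(gold, pred, positive):
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy data.
train_docs = ["great movie love it", "what a great phone", "love the screen",
              "terrible service hate it", "bad battery terrible", "hate this movie"]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_docs = ["love this phone", "terrible battery", "great screen",
             "hate the service", "not great at all"]
gold = ["pos", "neg", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
preds = [model.predict(d) for d in test_docs]
print(preds)
print(precision_recall_f1(gold, preds, positive="pos"))
```

On this toy data "not great at all" is predicted positive: "not", "at" and "all" never occur in training and are skipped, so only "great" counts. That single false positive gives precision 2/3 and recall 1 for the positive class.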

3.2. Methods for ABSA

ABSA: fine-grained opinion annotation
- Determine the sentiment expressed about different aspects of entities (e.g. movies, restaurants, cell phones, …)
- Aspects are features of an entity (e.g. service and food for a restaurant; screen and battery for a cell phone, …)
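A rough ABSA baseline can be sketched by spotting aspect terms from a predefined list, mapping them to aspect categories, and scoring each one only with the sentiment words in its own clause. The aspect list, the lexicon, and the clause-splitting rule are hypothetical simplifications for the restaurant domain; they are not taken from a particular ABSA system, and real systems usually learn aspect extraction and polarity from annotated data.

```python
import re

# Hypothetical aspect terms for the restaurant domain -> aspect category.
ASPECTS = {"food": "FOOD", "pizza": "FOOD", "service": "SERVICE",
           "waiter": "SERVICE", "price": "PRICE"}
# Hypothetical sentiment lexicon: word -> polarity.
SENTIMENT = {"delicious": 1, "great": 1, "friendly": 1,
             "slow": -1, "rude": -1, "expensive": -1}


def aspect_sentiments(sentence: str) -> dict:
    """Return {aspect category: polarity label} for the aspects mentioned,
    scoring each aspect only with sentiment words from its own clause."""
    scores = {}
    # Crude clause splitting on punctuation and on "but"/"and".
    clauses = re.split(r"[,;.!?]|\bbut\b|\band\b", sentence.lower())
    for clause in clauses:
        tokens = re.findall(r"[a-z']+", clause)
        clause_score = sum(SENTIMENT.get(t, 0) for t in tokens)
        for t in tokens:
            if t in ASPECTS:
                cat = ASPECTS[t]
                scores[cat] = scores.get(cat, 0) + clause_score
    return {cat: "positive" if s > 0 else "negative" if s < 0 else "neutral"
            for cat, s in scores.items()}


print(aspect_sentiments("The pizza was delicious but the waiter was rude."))
```

Unlike a single document-level score, this keeps the positive opinion on FOOD and the negative opinion on SERVICE apart instead of letting them cancel out.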

4. Fake news and stance detection

Learn the representation
Word2Vec [Mikolov et al. 2013]
Fun with Word Embeddings
Drawbacks of word embeddings
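A classic demonstration of what word embeddings capture is the analogy king - man + woman ≈ queen: compute the offset vector and return the nearest remaining word by cosine similarity. The 3-dimensional vectors below are hand-made for illustration (hypothetical, not trained Word2Vec output); with real pre-trained vectors the same query is usually run through a library, e.g. gensim's `KeyedVectors.most_similar(positive=[...], negative=[...])`.

```python
import math

# Hand-made toy vectors (hypothetical). Real Word2Vec vectors have hundreds
# of dimensions and are learned from a large corpus.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors=VECTORS):
    """Solve 'a is to b as c is to ?' with the offset b - a + c and return
    the nearest word by cosine similarity, excluding the three inputs."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "king", "woman"))
```

Excluding the input words matters: with real embeddings the offset vector is often closest to one of the inputs themselves.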

1. Social Media Analytics: Introduction

2. Preprocessing social media texts

2.1. POS Tagging

2.2. Lexical Normalization

2.3. Bonus

2.4. Summary: NLP tools and Noisy Input

3. Sentiment Analysis

3.1. Methods for sentiment analysis

3.2. Methods for ABSA

4. Fake news and stance detection

5. Practical Hints

6. Sources