Word Pairs

Last updated 4 years ago

Sometimes it is useful to look not only at individual words but also at pairs of adjacent words (bigrams). For example, finding out which words appear together with "not" allows a more specific sentiment analysis, because "not good" expresses the opposite of "good". The SQL below creates a view that, for each tweet, forms the pairs of words that directly follow each other:

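```sql
-- Pair each word with the word that directly follows it in the same tweet.
-- tweets_prep_step_4 holds one row per word: id (tweet), pos (position of the word), word, original_text
create or replace view tweet_word_pairs as
  select t1.id as id
        ,t1.word as left_word
        ,t1.pos as left_pos
        ,t2.word as right_word
        ,t2.pos as right_pos
        ,t1.original_text
  from tweets_prep_step_4 t1
  -- self-join: t2 is the word at the next position within the same tweet
  inner join tweets_prep_step_4 t2
    on t1.id = t2.id
    and t1.pos = t2.pos - 1
  order by id, left_pos
```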
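To see the self-join at work without a Databricks cluster, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `tweets_prep_step_4` and the columns `id`, `pos` and `word` are taken from the view above; `original_text` is left out for brevity, and the sample tweets are made up for illustration. SQLite has no `create or replace view`, so the sketch uses a plain `create view` and sorts in the outer query.

```python
import sqlite3


def build_word_pairs(tokens):
    """Return (id, left_word, right_word) rows for adjacent words.

    tokens: iterable of (id, pos, word) tuples, one per word of a tweet.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table tweets_prep_step_4 (id integer, pos integer, word text)")
    con.executemany("insert into tweets_prep_step_4 values (?, ?, ?)", tokens)
    # Same self-join as the view above: t2 is the word right after t1 in the same tweet.
    con.execute(
        """
        create view tweet_word_pairs as
        select t1.id   as id,
               t1.word as left_word,
               t1.pos  as left_pos,
               t2.word as right_word,
               t2.pos  as right_pos
        from tweets_prep_step_4 t1
        inner join tweets_prep_step_4 t2
          on t1.id = t2.id
         and t1.pos = t2.pos - 1
        """
    )
    rows = con.execute(
        "select id, left_word, right_word from tweet_word_pairs order by id, left_pos"
    ).fetchall()
    con.close()
    return rows


# Hypothetical sample: tweet 1 = "this is not good", tweet 2 = "not bad"
sample = [
    (1, 1, "this"), (1, 2, "is"), (1, 3, "not"), (1, 4, "good"),
    (2, 1, "not"), (2, 2, "bad"),
]
for row in build_word_pairs(sample):
    print(row)
# (1, 'this', 'is')
# (1, 'is', 'not')
# (1, 'not', 'good')
# (2, 'not', 'bad')
```

A tweet with n words yields n - 1 pairs, because the last word has no right neighbour and the inner join drops it. If an earlier preparation step removed words without renumbering `pos`, no pair is formed across the resulting gap, since the positions are no longer consecutive.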

The result of the view can be inspected with:

```sql
select * from tweet_word_pairs
```

To find out, for example, which words frequently follow the word "not", the following SQL can be used:

```sql
-- Count how often each word directly follows "not", most frequent first
select left_word
      ,right_word
      ,count(1) as num_occurrences
from tweet_word_pairs
where left_word = 'not'
group by left_word, right_word
order by num_occurrences desc
```
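The same counting query can be tried locally in a minimal sketch with Python's `sqlite3` module. Here the view is replaced by a small table of already-formed pairs whose column names follow `tweet_word_pairs`; the sample pairs are hypothetical. The word to look for is passed as a query parameter (`?`) instead of being pasted into the SQL string, and a secondary sort on `right_word` is added (not in the original query) so that ties come back in a fixed order.

```python
import sqlite3


def words_after(pairs, left_word):
    """Count how often each word follows `left_word`, most frequent first.

    pairs: iterable of (id, left_word, right_word) tuples, e.g. rows of tweet_word_pairs.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table tweet_word_pairs (id integer, left_word text, right_word text)")
    con.executemany("insert into tweet_word_pairs values (?, ?, ?)", pairs)
    rows = con.execute(
        """
        select left_word,
               right_word,
               count(1) as num_occurrences
        from tweet_word_pairs
        where left_word = ?
        group by left_word, right_word
        order by num_occurrences desc, right_word
        """,
        (left_word,),
    ).fetchall()
    con.close()
    return rows


# Hypothetical pairs from three tweets: "not good at", "not bad", "not good enough"
sample_pairs = [
    (1, "not", "good"), (1, "good", "at"),
    (2, "not", "bad"),
    (3, "not", "good"), (3, "good", "enough"),
]
print(words_after(sample_pairs, "not"))
# [('not', 'good', 2), ('not', 'bad', 1)]
```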

Related

https://s3.us-east-1.amazonaws.com/nicolas.meseth/databricks-notebooks/export_hashtag_pairs_for_visualization_with_gephi.html