Skip to main navigation Skip to search Skip to main content

Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Twitter is a popular social network platform where users can interact and post texts of up to 280 characters called tweets. Hashtags, hyperlinked words in tweets, have increasingly become crucial for tweet retrieval and search. Using hashtags for tweet topic classification is a challenging problem because of context dependent among words, slangs, abbreviation and emoticons in a short tweet along with evolving use of hashtags. Since Twitter generates millions of tweets daily, tweet analytics is a fundamental problem of Big data stream that often requires a real-time Distributed processing. This paper proposes a distributed online approach to tweet topic classification with hashtags. Being implemented on Apache Storm, a distributed real time framework, our approach incrementally identifies and updates a set of strong predictors in the Naïve Bayes model for classifying each incoming tweet instance. Preliminary experiments show promising results with up to 97% accuracy and 37% increase in throughput on eight processors.

Original languageEnglish (US)
Title of host publicationProceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
EditorsNaoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages4554-4558
Number of pages5
ISBN (Electronic)9781538650356
DOIs
StatePublished - Jul 2 2018
Externally publishedYes
Event2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018Dec 13 2018

Publication series

NameProceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference2018 IEEE International Conference on Big Data, Big Data 2018
Country/TerritoryUnited States
CitySeattle
Period12/10/1812/13/18

Keywords

  • Apache Storm
  • Big Data Stream
  • Hashtags
  • Ontology
  • Social Media
  • Twitter

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm'. Together they form a unique fingerprint.

Cite this