Software & Data Downloads
Graph visualisation
Graphyte is a flexible graph visualisation library for investigating the evolution of online dialogues. It is built with an emphasis on customisation and modularity, and includes an Application Programming Interface (API) for creating a pipeline of interconnecting modules.
- Language: JavaScript
- Dashboard Applications (T5.4): Keyword Graph, Social Network Analysis
- Download Graphyte from GitHub
Social media thread processing
The conversation collection script allows the user to collect the set of tweets replying to a specific tweet, forming a conversation or a thread.
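As a rough illustration of what forming a thread involves, the sketch below gathers every tweet that replies, directly or indirectly, to a given tweet. It is not the collection script itself: it assumes the tweets have already been downloaded as dictionaries carrying `id` and `in_reply_to_status_id` fields, and the function name `collect_thread` and the sample data are hypothetical.

```python
from collections import defaultdict, deque


def collect_thread(tweets, root_id):
    """Return the ids of all tweets replying, directly or indirectly, to root_id."""
    # Index replies by the id of the tweet they answer.
    replies = defaultdict(list)
    for tweet in tweets:
        parent = tweet.get("in_reply_to_status_id")
        if parent is not None:
            replies[parent].append(tweet["id"])

    # Breadth-first walk from the root tweet through the reply index.
    thread, queue = [], deque([root_id])
    while queue:
        current = queue.popleft()
        for child in replies.get(current, []):
            thread.append(child)
            queue.append(child)
    return thread


# Hypothetical sample: tweets 2 and 5 reply to tweet 1, tweet 3 replies to tweet 2.
sample = [
    {"id": 1, "in_reply_to_status_id": None},
    {"id": 2, "in_reply_to_status_id": 1},
    {"id": 3, "in_reply_to_status_id": 2},
    {"id": 4, "in_reply_to_status_id": None},
    {"id": 5, "in_reply_to_status_id": 1},
]
print(collect_thread(sample, 1))  # [2, 5, 3]
```

The breadth-first order lists direct replies before replies to replies; a depth-first walk would keep each sub-conversation together instead.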
Rumour categorisation
An important part of tracking the spread of rumours online is detecting whether a message supports, denies, or queries the claim. Our code for this is available on GitHub.
Multilingual preprocessing
English
Social media processing for English is published in TwitIE, an integral part of GATE
- TwitIE information
- GATE download (includes TwitIE)
Entity disambiguation is provided with YODIE
We’re also making a generic entity recognition package available
Bulgarian
The Bulgarian pipeline is available as a web service. It accepts either text sent directly in the request body or an uploaded plain-text file. To process text directly, call the service as follows:
curl -X POST --data-binary "Момичето яде сладолед." http://213.191.204.69:5080/webpipe/process
To process a file, use:
curl -X POST --data @test.txt http://213.191.204.69:5080/webpipe/process
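The same calls can be made from a program. The following is a minimal Python sketch, using only the standard library, that posts text to the endpoint shown in the curl commands above. The function names, the UTF-8 encoding of the request and response, and the timeout are assumptions rather than documented behaviour of the service, and the example only builds a request without sending it.

```python
import urllib.request

# Endpoint copied from the curl commands above.
PIPELINE_URL = "http://213.191.204.69:5080/webpipe/process"


def build_request(text, url=PIPELINE_URL):
    """Build a POST request carrying the raw text as its body (UTF-8 is an assumption)."""
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def process_text(text, timeout=30):
    """Send text to the service and return the response body decoded as UTF-8 (assumed)."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as response:
        return response.read().decode("utf-8")


def process_file(path, timeout=30):
    """Read a plain-text file and send its contents unchanged."""
    with open(path, encoding="utf-8") as handle:
        return process_text(handle.read(), timeout=timeout)


# Build (but do not send) a request, to show what would go over the wire.
request = build_request("Момичето яде сладолед.")
print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Calling `process_text("Момичето яде сладолед.")` or `process_file("test.txt")` would perform the actual request. Note that curl's `--data @test.txt` strips newlines and carriage returns from the file before sending, whereas this sketch sends the file contents unchanged, as `--data-binary @test.txt` would.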
Datasets
Rumour analyses: journalism use case
This dataset was collected and annotated within the journalism use case for the analysis of social media rumours. The rumours are associated with nine different breaking news stories. The dataset contains Twitter conversations that are initiated by a rumourous tweet, together with the tweets responding to it. The tweets have been annotated for support, certainty, and evidentiality. This dataset is associated with the D2.4 deliverable.
Rumour analyses: medical use case
An NLP algorithm was developed using a training set of 2,400 annotated tweets. The rules created to identify the linguistic patterns indicating a positive reference to mephedrone were then tested in GATE on another 2,400 annotated tweets (the gold standard set). The application was then deployed over the complete dataset of 145,578 tweets retrieved between 2009 and 2014, of which 7,044 were identified as true instances of mephedrone.
PHEME RTE dataset
For Natural Language Processing-based information verification, we have built a new Recognizing Textual Entailment (RTE) resource from Twitter data. The PHEME RTE dataset is compiled from naturally occurring contradictions in manually labeled claims in tweets related to crisis events, and to our knowledge is the first resource for three-way judgement RTE in the social media and verification domain. From about 500 English tweets related to 70 unique claims we created 5.4k RTE pairs. The RTE pairs are built by a semi-automatic method that is portable across languages and domains, but requires event and claim annotations. The resource, its creation method, and a pilot RTE evaluation are explained in the following paper:
Piroska Lendvai, Isabelle Augenstein, Kalina Bontcheva, Thierry Declerck (2016). Monolingual Social Media Datasets for Detecting Contradiction and Entailment. Proc. of LREC 2016.
Temporal models of events
Code for Hawkes process models of the intensity of event discussion over time.
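For readers unfamiliar with Hawkes processes, the sketch below shows the basic idea of a self-exciting intensity: each past post temporarily raises the expected rate of further posts. It uses an exponential decay kernel, which is one common choice and is an assumption here, not necessarily the form used in the project's code. The function name, the parameter values (`mu`, `alpha`, `beta`), and the sample timestamps are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.0):
    """Intensity at time t: a baseline rate mu plus, for every past event t_i < t,
    a contribution alpha * exp(-beta * (t - t_i)) that decays as the event ages."""
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )


# Hypothetical posting times (arbitrary units): a burst around t=1, one post at t=4.
posts = [1.0, 1.2, 1.3, 4.0]
for t in (0.5, 1.5, 3.0, 4.1):
    print(f"t={t:.1f}  intensity={hawkes_intensity(t, posts):.3f}")
```

Before the first post the intensity equals the baseline `mu`; it peaks just after the burst and then decays back towards the baseline until the next post arrives.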
Entity recognition
A generic entity recognition toolkit in Python 3, originally designed for named entity recognition but extended to other tasks such as event annotation and timex recognition. This tool relies on Brown clusters and structured prediction, and with default parameters achieved third place in the 2015 W-NUT untyped chunking evaluation.
Capturean
Capturean collects social media data from various sources and feeds it forward for processing. It can handle multiple requests from multiple sites: register a filter, or selector, for data, and Capturean handles the rest. It comprises:
- Capture (Capturean) software (with all necessary modules)
- message format translator (for emitting Pheme-compatible messages)
- Kafka monitor
- MODUL Dashboard adapter
On the Pheme GitHub repository: Capturean
Stance detection
WP4 included the production of stance classification software. Our state-of-the-art approach is provided at: https://gate.ac.uk/wiki/pheme-stance.html