Dataset of tweets related to Cilento

The dataset was collected as part of the cilentotook project (http://www.cilentodascoprire.it). Tweets posted by tourists who visited Cilento were saved over an extended collection period and across a large geographical area. Collection is not limited to Cilento or to the vacation itself, because tourists continue to talk about their holiday after they have left, adding information that is useful for our analysis.

Tweets are collected by a custom Python script that relies on the tweepy library. Tweepy simplifies the use of the Twitter streaming API by handling authentication, connection, session and message reading. The scripts run in a Docker container and apply a spatial filter based on a bounding box, exploiting the geographical features of the Twitter API to retrieve only tweets from a well-defined area. Extracted tweets are then stored in a MongoDB database that also runs in a Docker container. The data were collected from May 2017 to May 2019.
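The bounding-box filter described above can be illustrated with a minimal sketch. This is not the project's script: the coordinates in `CILENTO_BBOX` are rough illustrative values, and the helper names (`in_bounding_box`, `tweet_point`, `keep_tweet`) are hypothetical. The sketch assumes tweets in the standard Twitter API v1.1 JSON layout, where an exact location appears as a GeoJSON point in longitude, latitude order. The box uses the same ordering as the streaming API's `locations` parameter (south-west corner first, longitude before latitude), so the same four numbers could be passed to a tweepy stream filter.

```python
from typing import Optional, Tuple

# ASSUMPTION: illustrative box roughly covering the Cilento area, not the
# project's actual values. Order: (west_lon, south_lat, east_lon, north_lat).
CILENTO_BBOX = (14.9, 39.9, 15.8, 40.6)


def in_bounding_box(lon: float, lat: float, bbox=CILENTO_BBOX) -> bool:
    """Return True if the point (lon, lat) lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def tweet_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Extract (lon, lat) from a tweet's GeoJSON 'coordinates' field, if any."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]
    return lon, lat


def keep_tweet(tweet: dict, bbox=CILENTO_BBOX) -> bool:
    """Keep only tweets with an exact point location inside the box."""
    point = tweet_point(tweet)
    return point is not None and in_bounding_box(*point, bbox=bbox)
```

A check like this is also useful as a client-side safeguard before storing a tweet, since it only accepts tweets that carry an exact point location.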

To obtain the ground truth for the collected tweets (Italian and English), the true sentiment was labeled manually by human annotators, which provides a more precise and less noisy dataset than labels generated automatically from hashtags.

The dataset is composed of geotagged Twitter data related to the Cilento area, as follows:
- 3200 tweets with positive sentiment;
- 4720 tweets with neutral sentiment;
- 4842 tweets with negative sentiment.
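
As a quick check on class balance, the counts listed above can be totaled and converted to shares. The snippet below is only a sketch of that arithmetic; the dictionary name `counts` is an illustrative choice, and the numbers are copied from the list.

```python
# Class counts as listed in the dataset description.
counts = {"positive": 3200, "neutral": 4720, "negative": 4842}

total = sum(counts.values())  # 3200 + 4720 + 4842 = 12762
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} tweets ({share:.1%})")
print(f"total: {total} tweets")
```

The positive class is the smallest of the three, which may be worth accounting for (for example with stratified splits) when training a classifier on these labels.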
 

To obtain this dataset, as well as the implementation code, we ask you to complete, sign and return the form below. After that, we will send you the credentials to download it. Note that the dataset is available for research purposes only.

  • Fill out this form: request form
  • Send it to: vrai@dii.univpm.it (Note: you should send the email from an email address that is linked to your research institution/university)
  • Wait for the credentials
  • You will be sent a link for the download.