Datasets to Build Hate Speech Detection ML-Based Models
The increasing number of social media users has several upsides and downsides. Hate speech online is one of the prominent issues, especially due to the freedom and anonymity given to users and the lack of effective regulations provided by the social network platforms. This problem affects not only the abuse victims but also social media platforms, governments and societies, affecting several public debates about immigration, security, and multiculturalism: aggressive and stereotypical statements hinder a constructive dialogue between users, thus seriously obstructing the achievement of an equal, cohesive, and inclusive society.
Hate speech is becoming a significant problem in online communication on social media, with effects that may lead to dangerous criminal acts offline [3].
As an extreme example, in Myanmar in 2017, hate speech on social media was heavily implicated in inciting violence against the Rohingya Muslim minority, including the murder of thousands of civilians, and ICT companies controlling social media platforms had to admit that they failed to prevent such platforms from being used to “foment division and incite offline violence” [4].
What is Hate Speech?
Hate Speech (HS) can be defined as any type of communication that is abusive, insulting, intimidating, harassing, and/or inciting violence or discrimination, disparaging a person or a vulnerable group based on some characteristics such as ethnicity, gender, sexual orientation, religion, or other characteristics [1].
HS is assaultive: it verbally attacks, degrades, and persecutes its targets. Psychological research has shown that assaultive speech not only harms its targets but also triggers prejudice in bystanders. It elicits social exclusion [2] and higher suicide rates among targets, and it prompts prejudice, intentional bias, and dehumanization in the audience.
In the following sections, I will present datasets that enable researchers to build machine learning models to detect hate speech in text.
Davidson et al. Dataset
This dataset (Davidson et al., 2017) contains 24,783 English tweets annotated with three labels: hate speech, offensive, and neither.
The corpus was scraped from Twitter using the Twitter Search API, based on keywords obtained from Hatebase.
Each tweet was manually rated by at least three annotators on the CrowdFlower platform. The final label of each tweet was assigned by majority vote, with the inter-annotator agreement of the overall dataset reaching about 0.92. Most instances were labeled as offensive (77.4%), while only 5.8% were labeled as hate speech and the remaining 16.8% as neither.
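The label distribution above is heavily skewed, which matters when training a classifier on this dataset. As a minimal sketch, the snippet below turns the reported percentages into inverse-frequency class weights, using the common "balanced" heuristic (total / (number of classes × class share)). The function name and the choice of this heuristic are illustrative assumptions, not part of the original dataset release.

```python
def balanced_class_weights(label_shares):
    """Return inverse-frequency weights from a mapping of label -> share.

    Shares can be percentages or raw counts; only their ratios matter.
    """
    n_classes = len(label_shares)
    total = sum(label_shares.values())
    return {
        label: total / (n_classes * share)
        for label, share in label_shares.items()
    }


# Percentages as reported for the Davidson et al. dataset.
DAVIDSON_SHARES = {"offensive": 77.4, "hate speech": 5.8, "neither": 16.8}

weights = balanced_class_weights(DAVIDSON_SHARES)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.3f}")
```

With these shares, the rare hate speech class receives a much larger weight than the dominant offensive class, which is one simple way to keep a model from ignoring the minority label.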
Basile et al. Dataset
This corpus (Basile et al., 2019) contains 13,000 tweets in English and Spanish, distributed across two hate speech targets: immigrants and women. The dataset was manually annotated on the Figure Eight platform (now called Appen) with three layers of annotation: hatefulness (hate speech or not), target range (generic or individual), and aggressiveness (aggressive or not).
The annotation process achieved fairly high inter-annotator agreement (0.83, 0.73, and 0.70, respectively). This corpus was used for HatEval 2019, a shared task at SemEval 2019 that focuses on multilingual hate speech detection.
Founta et al. Dataset
This dataset (Founta et al., 2018) contains 80,000 English tweets, tagged with seven mutually exclusive labels: offensive, abusive, hateful, aggressive, cyberbullying, spam, and normal. The initial collection contained 30 million tweets gathered with the Twitter Stream API between 30 March 2017 and 9 April 2017.
A minimum of five crowd workers annotated each instance, and the final label was decided based on a majority vote.
Ousidhoum et al. Dataset
This dataset (Ousidhoum et al., 2019) contains 13,014 tweets in three languages: English (5,647), French (4,014), and Arabic (3,353). The dataset was annotated through crowdsourcing on the Amazon Mechanical Turk platform. The average Krippendorff's alpha scores for inter-annotator agreement are 0.153, 0.244, and 0.202 for English, French, and Arabic, respectively.
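To put these agreement scores in context: Krippendorff's alpha is 1 for perfect agreement and around 0 when annotators agree no more often than chance, so the values reported here are low. The sketch below computes alpha for nominal labels from a coincidence matrix. It is a generic illustration of the measure, not the authors' evaluation script, and the function name and input layout (one list of labels per tweet) are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that
    different annotators assigned to one item.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single rating carries no agreement information
        # Every ordered pair of ratings within an item contributes 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label, and the total number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - (n - 1) * observed / expected
```

For example, `krippendorff_alpha_nominal([["hate", "hate"], ["normal", "normal"]])` gives 1.0, while two annotators who disagree on every item produce a negative value.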
Mandl et al. Dataset
This dataset, sampled from Twitter and partially from Facebook, contains 17,657 instances in three languages: English (7,005), Hindi (5,983), and German (4,669). The original dataset was annotated with three annotation layers as part of the Hate Speech and Offensive Content Identification in Indo-European Languages shared task at FIRE 2019.
Ross et al. Dataset
The original collection of this dataset contains 469 tweets, where two raters annotated each tweet.
This corpus contains tweets mostly related to the refugee crisis in Germany, collected by using ten specific hashtags roughly dating from February to March 2016.
Ibrohim et al. Dataset
This dataset contains 13,169 tweets in Indonesian, crawled from Twitter with the Search API using several keywords related to hate speech against categories including religion, race, physical disability, and gender, over a span of 7 months (March–September 2018).
Several annotation layers were introduced, mainly focusing on hate speech and abusive language.
Alfina et al. Dataset
This dataset (Alfina et al., 2017) consists of 713 tweets in Indonesian, of which 260 are labeled as hate speech and 453 as not hate speech. The tweets were gathered with the Twitter Streaming API using hashtags related to political events in Indonesia from the beginning of February until April 2017.
The annotation process involved 30 college students, 43.3% men and 56.7% women.
Bosco et al. Dataset
This dataset contains 4,000 tweets in Italian sampled from 6,928 tweets crawled from Twitter with a keyword-based approach.
The keywords were chosen based on three social groups considered potential targets of hate speech in Italy: immigrants, Muslims, and Roma.
This collection was annotated with the Figure Eight platform.
The dataset was used in the hate speech detection (HaSpeeDe) shared task at EVALITA 2018.
Fortuna et al. Dataset
This dataset comprises 5,670 tweets in Portuguese and was collected based on keywords and profiles using the Twitter Search API. Most tweets were posted from January until March 2017. The dataset was rated using a finer-grained hierarchical annotation scheme with 81 hate speech categories.
Pereira et al. Dataset
This corpus contains 6,000 tweets in Spanish, filtered from 2 million tweets gathered from Twitter between February and December 2017. The filtering process used several keywords, which were categorized as absolute hate or relative hate. The dataset was annotated with a binary label (hate speech vs. not hate speech). The annotation process involved four annotators, and the final label was decided by majority vote. In the case of disagreement, a fifth annotator cast the deciding vote.
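The voting scheme described above can be sketched as follows. This is a minimal illustration that assumes "disagreement" means the regular annotators produced no strict majority (for example, a 2-2 split among four annotators); the function name and label strings are hypothetical.

```python
from collections import Counter


def resolve_label(votes, tie_breaker_vote):
    """Return the majority label, or the tie-breaker's vote on a tie.

    votes: labels from the regular annotators.
    tie_breaker_vote: label from the extra annotator, used only when
    the top two labels received the same number of votes.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return tie_breaker_vote  # no strict majority among regular annotators
    return top_label


# Three of four annotators agree, so the extra vote is ignored.
clear_case = resolve_label(["hate", "hate", "hate", "not_hate"], "not_hate")
# A 2-2 split is decided by the extra annotator.
split_case = resolve_label(["hate", "hate", "not_hate", "not_hate"], "not_hate")
```

The same function works for the other majority-vote schemes in this article (three or five annotators), where the tie-breaker argument only matters if the votes happen to split evenly.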
References
[1] Erjavec K. and Kovačič M.P. “You don’t understand, this is a new war!” Analysis of hate speech in news web sites’ comments. Mass Communication and Society, 15(6) (2012), pp. 899–920, doi:10.1080/15205436.2011.619679
[2] Leader T., Mullen B. and Rice D. Complexity and valence in ethnophaulisms and exclusion of ethnic out-groups: what puts the “hate” into hate speech? Journal of Personality and Social Psychology, 96(1) (2009), pp. 170–182
[3] Müller and Schwarz. Fanning the flames of hate: Social media and hate crime (2019). Available at SSRN 3082972
[4] Facebook Admits It Was Used to Incite Violence in Myanmar