Offensive or antagonistic language targeted at individuals and social groups on the basis of their personal characteristics (also known as cyber hate speech, or cyber hate) is frequently posted and widely circulated on the World Wide Web. It can be considered a key risk factor for individual and societal tension linked to regional instability. Automated Web-based cyber hate detection is important for observing and understanding community and regional societal tension, especially in online social networks, where posts can be rapidly and widely viewed and disseminated. Previous work has relied on lexicons, bags-of-words or probabilistic language parsing, approaches that share a common weakness: cyber hate can be subtle and indirect, so depending on the occurrence of individual words or phrases leads to a significant number of false negatives and an inaccurate representation of trends in cyber hate. This problem motivated us to challenge thinking around the representation of subtle language use, such as references to perceived threats from the 'other', including immigration or job prosperity, in a hateful context. We propose a novel framework that draws on the concept of othering and on intergroup threat theory to identify these subtleties, and we implement a novel classification method that uses embedding learning to compute semantic distances between parts of speech considered to be part of an othering narrative. To validate our approach we conduct experiments on four types of cyber hate, namely religion, disability, race and sexual orientation, obtaining F-measure scores for classifying hateful instances of 0.93, 0.86, 0.97 and 0.98 respectively, a significant improvement in classifier accuracy over the state of the art.
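The core mechanism described above, comparing parts of speech in embedding space, can be illustrated with a minimal sketch. The vectors and word choices below are toy assumptions for illustration only; the actual method would use embeddings trained on real corpora, and the specific cue (an out-group pronoun appearing semantically close to a perceived-threat term) is our simplified reading of the othering idea, not the paper's exact algorithm.

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus cosine similarity between two embedding vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d "embeddings"; real work would use trained word vectors (e.g. word2vec)
vectors = {
    "they": np.array([0.9, 0.1, 0.1]),
    "them": np.array([0.85, 0.15, 0.1]),
    "jobs": np.array([0.2, 0.9, 0.1]),
}

# Two in-context distances: a pair of out-group pronouns sits close together
# in embedding space, while a pronoun and a threat term sit further apart
d_close = cosine_distance(vectors["they"], vectors["them"])
d_far = cosine_distance(vectors["they"], vectors["jobs"])
```

A classifier built on this idea would use such pairwise distances as features, so that subtle othering narratives register even when no single hateful keyword occurs.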
Social networks, forums, and social media have emerged as global platforms for forming and shaping opinions on a broad spectrum of topics such as politics, sports and entertainment. Users (also called actors) often update their evolving opinions, influenced through discussions with other users. Theoretical models for understanding opinion dynamics in social networks abound in the literature. However, these models are often based on concepts from statistical physics, and their goal is to establish regulatory phenomena such as steady-state consensus or bifurcation; analysis of transient effects is largely avoided. Moreover, many of these studies assume that actors' opinions are observed globally and synchronously, which is rarely realistic. In this paper, we initiate an investigation into a family of novel data-driven influence models that accurately learn and fit realistic observations. We estimate, rather than presume, edge strengths from the opinions observed at nodes. Our influence models are linear, but not necessarily positive or row-stochastic. As a consequence, unlike previous studies, they do not depend on system stability or convergence during the observation period. Furthermore, our models take into account a wide variety of data-collection scenarios. In particular, they are robust to missing observations for several time steps after an actor has changed its opinion. In addition, we consider scenarios where opinion observations may be available only for aggregated clusters of nodes, a practical restriction often imposed to ensure privacy. Finally, to provide a conceptually interpretable design of edge influence, we offer a relatively frugal variant of our influence model, in which the strength of influence between two connected nodes depends on the node attributes (demography, personality, expertise, etc.).
Such an approach reduces the number of model parameters, mitigates overfitting, and offers a tractable and explicable sketch of edge influences in the context of opinion dynamics. On six real-life datasets crawled from Twitter and Reddit, as well as three further datasets collected from in-house experiments (with 102 volunteers), our proposed system gives a significant accuracy boost over four state-of-the-art baselines. We also observe that a careful design of edge strengths using node properties is crucial, as it offers substantially better performance than a model with independent edge weights.
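The estimation step described above, learning a linear influence matrix from observed opinions rather than presuming it, can be sketched as follows. This is a minimal illustration under simplifying assumptions: noiseless one-step opinion updates and fully observed nodes, with synthetic data in place of the Twitter/Reddit observations; the paper's actual models additionally handle missing and cluster-aggregated observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # number of actors

# Ground-truth influence matrix: linear, but neither positive nor row-stochastic
W_true = rng.normal(scale=0.4, size=(n, n))

# Observe many one-step opinion updates x_{t+1} = W x_t at all nodes
X_before = rng.normal(size=(60, n))   # opinions before an update (rows = snapshots)
X_after = X_before @ W_true.T         # opinions after the update (noiseless)

# Estimate edge strengths by least squares, fitting X_after ≈ X_before @ W^T;
# no stability or convergence assumption is needed for this fit
W_est = np.linalg.lstsq(X_before, X_after, rcond=None)[0].T
```

With enough snapshots the least-squares fit recovers the influence matrix exactly in the noiseless case; the frugal variant would instead parameterize each entry of `W` as a function of the two endpoint nodes' attributes, shrinking the parameter count from n² to the attribute dimension.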
Context: End-user service composition (EUSC) is a service-oriented paradigm that aims to empower end users, allowing them to compose their own web applications from reusable service components. User studies have been used to evaluate EUSC tools and processes. Such an approach should benefit software development, because incorporating end users' feedback into software development should make software more useful and usable. Problem: There is a gap in our understanding of what constitutes a user study, and of how a good user study should be designed, conducted and reported. Goal: This paper aims to address this gap. Method: The paper presents a systematic mapping study of 46 selected user studies for EUSC. Guided by a review framework, the paper systematically and consistently assesses the focus, methodology and cohesion of each of these studies. Results: The paper concludes that the focus of these studies is clear, but that their methodology is incomplete and inadequate, and their overall quality is poor. The findings lead to the development of a design framework and a set of checklist guidelines for the design, conduct and reporting of good user studies for EUSC. The detailed analysis, and the insights obtained from it, should be applicable to the design of user studies for service-oriented systems in general.
Cyberbullying and cyberaggression are increasingly worrisome phenomena that affect people across all demographics. Already in 2014, more than half of young social media users worldwide had experienced them in some form, being exposed to prolonged and/or coordinated digital harassment. Victims can experience a wide range of emotional consequences, such as embarrassment, depression and isolation from other community members, which can escalate to outcomes as serious as suicide attempts. Nevertheless, tools and technologies to understand and mitigate these phenomena are scarce and mostly ineffective. In this paper, we take the first concrete steps towards understanding the characteristics of abusive behavior on Twitter, one of today's largest social networks. We analyze 1.2 million users and 2 million tweets, comparing users participating in discussions around seemingly normal topics, such as the NBA, to those more likely to be hate-related, such as the Gamergate controversy or the gender pay inequality at the BBC. We also explore specific manifestations of abusive behavior, i.e., cyberbullying and cyberaggression, in one of the hate-related communities (Gamergate). We present a robust methodology to distinguish bullies and aggressors from regular users by considering text-, user-, and network-based attributes. Using various state-of-the-art machine learning algorithms, we classify these accounts with over 90% accuracy and AUC. Finally, we look at the current status of the Twitter accounts of users flagged as abusive by our methodology and discuss the performance of the mechanisms used by Twitter to suspend users.
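The classification pipeline described above, distinguishing abusive from regular accounts using text-, user-, and network-based attributes, can be sketched in miniature. The feature names, thresholds, and data below are synthetic stand-ins chosen for illustration, not the paper's actual Twitter features or labels; a random forest is one of the standard algorithms in the family the abstract refers to.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n_users = 400

# Synthetic per-user features; the names are illustrative stand-ins for
# text-, user-, and network-based attributes:
# [offensive-word ratio, normalized account age, follower/friend ratio]
X = rng.random((n_users, 3))
# Synthetic label: users with a high offensive-word ratio marked as abusive
y = (X[:, 0] > 0.7).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```

In a real deployment the features would be extracted from tweet text, account metadata, and the follower graph, and ground-truth labels would come from annotated data rather than a threshold rule.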