ACM Transactions on the Web (TWEB)

Latest Articles

“The Enemy Among Us”: Detecting Cyber Hate Speech with Threats-based Othering Language Embeddings

Offensive or antagonistic language targeted at individuals and social groups based on their personal characteristics (also known as cyber hate speech or cyberhate) has been frequently posted and widely circulated via the World Wide Web. This can be considered as a key risk factor for individual and societal tension surrounding regional instability...

User Studies on End-User Service Composition: A Literature Review and a Design Framework

Context: End-user service composition (EUSC) is a service-oriented paradigm that aims to empower end users and allow them to compose their own web applications from reusable service components. User studies have been used to evaluate EUSC tools and processes. Such an approach should benefit software development, because incorporating end...

Detecting Cyberbullying and Cyberaggression in Social Media

Cyberbullying and cyberaggression are increasingly worrisome phenomena affecting people across all...


About TWEB

The journal Transactions on the Web (TWEB) publishes refereed articles reporting the results of research on Web content, applications, use, and related enabling technologies.

The scope of TWEB is described on the Call for Papers page. Authors are invited to submit original research papers for consideration by following the directions on the Author Guidelines page.

Forthcoming Articles
An Outsourcing Model for Alert Analysis in a Cybersecurity Operations Center

A typical Cybersecurity Operations Center (CSOC) is a service organization. It hires and trains analysts, whose task is to analyze alerts generated while monitoring the client's networks. Because of the ever-increasing financial and infrastructure burden on a CSOC, driven by the rapidly growing demand for security services, it would become prohibitively expensive to keep expanding the size of a CSOC to meet future demand. An alternative is to outsource the alert analysis process to on-demand analysts, providing a scalable CSOC service with 1) higher throughput, 2) higher quality, and 3) lower cost than the current in-house service. This paper presents a novel two-step sequential mixed integer programming optimization method that is used to develop a new decision-support business model for outsourcing the alert analysis process. It is demonstrated that, through this model, a CSOC can effectively deliver its alert management services with the above-mentioned features. Results indicate that the model is scalable, computationally viable, and implementable in real time, and that it can deliver CSOC services that meet the service level agreement (SLA) between the CSOC and its client.

Improving the Accuracy of the Video Popularity Prediction Models through user grouping and video popularity classification

This paper proposes a novel approach for enhancing video popularity prediction models. Using the proposed approach, we enhance three popularity prediction techniques that outperform the accuracy of the prior state-of-the-art solutions. The major components of the proposed approach are two novel mechanisms for "user grouping" and "content classification". The user grouping method is an unsupervised clustering approach that divides the users into an adequate number of user groups with similar interests. The content classification approach identifies the classes of videos with similar popularity growth trends. To predict the popularity of newly released videos, our proposed popularity prediction model trains its parameters in each user group and its associated video popularity classes. Evaluations are performed through 5-fold cross validation on a dataset containing one month of video request records from 26,706 BBC iPlayer users. Using the proposed grouping technique, user groups of similar interest and up to 2 video popularity classes for each user group were detected. Our analysis shows that the accuracy of the proposed solution outperforms the state of the art, including the SH, ML, and MRBF models, on average by 45%, 33% and 24%, respectively. Finally, to illustrate the implications, we discuss how various systems in the network and service management domain, such as cache deployment, advertising and video broadcasting technologies, can benefit from our findings.

Efficient Pairwise Penetrating-Rank Similarity Retrieval

Many web applications, such as collaborative filtering, web document ranking, linkage prediction, and anomaly detection, demand a measure of similarity between two entities. P-Rank (Penetrating-Rank) has been accepted as a promising graph-based similarity measure, as it provides a comprehensive way of encoding both incoming and outgoing links into the assessment. However, the existing method to compute P-Rank is iterative in nature and rather cost-inhibitive. Moreover, the accuracy estimate and stability issues for P-Rank computation have not been addressed. In this paper, we consider optimization techniques for P-Rank search that encompass its accuracy, stability and computational efficiency. (1) An accuracy estimate is provided for P-Rank iterations, with the aim of finding the number of iterations, k, required to guarantee a desired accuracy. (2) A rigorous bound on the condition number of P-Rank is obtained for stability analysis. Based on this bound, it can be shown that P-Rank is stable and well-conditioned when the damping factors are chosen to be suitably small. (3) Two matrix-based algorithms, applicable to digraphs and undirected graphs respectively, are devised for efficient P-Rank computation, which improve the computational time from O(kn^3) to O(υn^2 + υ^6) for digraphs, and to O(υn^2) for undirected graphs, where n is the number of vertices in the graph, and υ (≤ n) is the target rank of the graph. Moreover, our proposed algorithms can significantly reduce the memory space of P-Rank computations from O(n^2) to O(υn + υ^4) for digraphs, and to O(υn) for undirected graphs. Finally, extensive experiments on real-world and synthetic datasets demonstrate the usefulness and efficiency of the proposed techniques for P-Rank similarity assessment on various networks.
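
To illustrate the iterative baseline that the abstract describes as costly, here is a minimal sketch of a naive P-Rank iteration. It is not the paper's matrix-based algorithm. It assumes the commonly cited P-Rank recurrence from the literature: s(a, a) = 1, and for a ≠ b the score is a weighted sum of the average similarity over in-neighbor pairs and over out-neighbor pairs. The names `p_rank`, `avg_pair_similarity`, `lam`, `c_in`, and `c_out`, and the default values, are hypothetical choices made for this example.

```python
def avg_pair_similarity(sim, xs, ys):
    """Average of sim[x][y] over all pairs; 0.0 if either neighbor list is empty."""
    if not xs or not ys:
        return 0.0
    total = sum(sim[x][y] for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def p_rank(nodes, edges, lam=0.5, c_in=0.8, c_out=0.8, iterations=10):
    """Naive iterative P-Rank on a directed graph (illustrative sketch only).

    lam weights in-link evidence against out-link evidence;
    c_in and c_out are the damping factors (assumed values).
    """
    in_nbrs = {v: [] for v in nodes}
    out_nbrs = {v: [] for v in nodes}
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)

    # Start from the identity: every node is fully similar to itself only.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}

    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0
                    continue
                in_part = avg_pair_similarity(sim, in_nbrs[a], in_nbrs[b])
                out_part = avg_pair_similarity(sim, out_nbrs[a], out_nbrs[b])
                new_sim[a][b] = lam * c_in * in_part + (1 - lam) * c_out * out_part
        sim = new_sim
    return sim


# Tiny example: "a" and "b" both link to "c".
scores = p_rank(["a", "b", "c"], [("a", "c"), ("b", "c")])
print(scores["a"]["b"])
```

Every iteration revisits all node pairs and all of their neighbor pairs, which is the kind of cost that motivates the low-rank, matrix-based algorithms summarized in the abstract.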

'The Best of Both Worlds!' Integration of Web Page and Eye Tracking Data Driven Approaches for Automatic AOI Detection

Web pages are composed of different kinds of elements (menus, adverts, etc.). Segmenting pages into their elements has long been important in understanding how people experience those pages, and in making those experiences 'better'. Many approaches have been proposed that relate the resultant elements to the underlying source code; however, they do not consider users' interactions. Another group of approaches analyses the eye movements of users to discover areas that interest or attract them (areas of interest, AOIs). Although these approaches consider how users interact with web pages, they do not relate AOIs to the underlying source code. We propose a novel approach that integrates web page and eye tracking data driven approaches for automatic AOI detection. This approach segments an entire web page into its AOIs by considering users' interactions, and relates the AOIs to the underlying source code. Based on the Adjusted Rand Index measure, our approach provides a segmentation that is more similar to the ground truth segmentation than those of its individual components.
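
The Adjusted Rand Index mentioned above compares two segmentations of the same items while correcting for chance agreement. The sketch below uses the standard pair-counting formula. It is not taken from the paper, and the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration. Each list assigns a segment label to the same sequence of page elements.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Pairs of items placed together in both labelings, and in each one alone.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single segment
    return (together_both - expected) / (maximum - expected)


# Hypothetical segment labels for four page elements.
ground_truth = ["menu", "menu", "advert", "advert"]
predicted = ["s1", "s1", "s2", "s2"]
print(adjusted_rand_index(ground_truth, predicted))
```

A score of 1 means the two segmentations group the elements identically regardless of label names, and scores near 0 indicate agreement no better than chance.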

Combining URL and HTML features for Entity Discovery in the Web

The Web is a large repository of entity-pages. An entity-page is a page that publishes data representing an entity of a particular type, for example, a page that describes a driver on a website about a car racing championship. The attribute values published in entity-pages can be used in many applications, for example, to provide direct answers to searches about entities. In this paper, we propose a novel method, called SSUP, which discovers the entity-pages in websites. The novelty of our method is that it combines URL and HTML features in a way that allows the URL terms to have different weights depending on their capacity to distinguish entity-pages from other pages, which increases the efficacy of the entity-page discovery task. SSUP learns the similarity thresholds for each website without human intervention. We carried out experiments on a dataset with different real-world websites and a wide range of entity types. SSUP achieved 95% precision and 85% recall. Our method was compared with two state-of-the-art methods and outperformed them with a precision gain between 51% and 66%.

Understanding Free Web Proxies: Performance, Behavior, and Usage

Free web proxies promise anonymity and censorship circumvention at no cost. Several websites publish lists of free proxies organized by country, anonymity level, and performance. These lists index hundreds of thousands of hosts discovered via automated tools and crowd-sourcing. A complex free proxy ecosystem has been forming over the years, of which very little is known. In this paper we shed light on this ecosystem via a distributed measurement platform that leverages both active and passive measurements. Active measurements are carried out by an infrastructure we name ProxyTorrent, which discovers free proxies, assesses their performance, and detects potential malicious activities. Passive measurements, which relate to proxy performance and usage in the wild, are accomplished by means of a Chrome plugin named Ciao. ProxyTorrent has been running since January 2017, monitoring up to 200,000 free proxies. Ciao was launched in March 2017 and has thus far served roughly 3,000 users and generated 3 TB of traffic. Our analysis shows that less than 2% of the proxies announced on the Web actually proxy traffic on behalf of users; further, only half of these proxies have decent performance and can be used reliably. Around 10% of the working proxies exhibit malicious behaviors, e.g., ad injection and TLS interception, and these proxies are also the ones providing the best performance. Through the analysis of more than 2 TB of proxied traffic, we show that web browsing is the primary user activity. Geo-blocking avoidance is not a prominent use case, with the exception of proxies located in countries hosting popular geo-blocked content.
