Exploring the Emerging Type of Comment for Online Videos: DanMu

DanMu, an emerging type of user-generated comment, has become increasingly popular in recent years. Many online video platforms such as have...

Adaptive Knowledge Propagation in Web Ontologies

We focus on the problem of predicting missing assertions in Web ontologies. We start from the assumption that individual resources that are similar in some aspects are more likely to be linked by specific relations: this phenomenon is also referred to as homophily and emerges in a variety of relational domains. In this article, we propose a method...

Recommendation in a Changing World: Exploiting Temporal Dynamics in Ratings and Reviews

Users’ preferences, and consequently their ratings and reviews to items, change over time. Likewise, characteristics of items are also time-varying. By dividing data into time periods, temporal Recommender Systems (RSs) improve recommendation accuracy by exploring the temporal dynamics in user rating data. However, temporal RSs have to cope...

Activity Recommendation with Partners

Recommending social activities, such as watching movies or having dinner, is a common function found in social networks or e-commerce sites. Besides certain websites which manage activity-related locations (e.g.,, many items on product sale platforms (e.g., can naturally be mapped to social activities. For example,...

Caching to Reduce Mobile App Energy Consumption

Mobile applications consume device energy for their operations, and the fast rate of battery depletion on mobile devices poses a major usability hurdle. After the display, data communication is the second-biggest consumer of mobile device energy. At the same time, software applications that run on mobile devices represent a fast-growing product...

Modeling and Simulating the Web of Things from an Information Retrieval Perspective

Internet and Web technologies have changed our lives in ways we are not yet fully aware of. In the near future, Internet will interconnect more than...


About TWEB

The journal Transactions on the Web (TWEB) publishes refereed articles reporting the results of research on Web content, applications, use, and related enabling technologies.

The scope of TWEB is described on the Call for Papers page. Authors are invited to submit original research papers for consideration by following the directions on the Author Guidelines page.

Forthcoming Articles

Exploiting usage to predict instantaneous app popularity: Trend filters and retention rates

The popularity of mobile apps is traditionally measured by metrics such as the number of downloads, installations, or user ratings. A problem with these measures is that they reflect usage only indirectly. We propose to exploit actual app usage statistics. Indeed, retention rates, i.e., the number of days users continue to interact with an installed app, have been suggested to predict successful app lifecycles. We conduct the first independent and large-scale study of retention rates and usage trends on a database of app-usage data from a community of 339,842 users and more than 213,667 apps. Our analysis shows that, on average, applications lose 65% of their users in the first week, while very popular applications (top 100) lose only 35%. It also reveals, however, that many applications have more complex usage behavior patterns due to seasonality, marketing, or other factors. To capture such effects, we develop a novel app-usage trend measure which provides instantaneous information about the popularity of an application. Our analysis shows that roughly 40% of all apps never gain more than a handful of users (Marginal apps). Less than 0.4% of the remaining 60% are constantly popular (Dominant apps), 1% have a quick drain of usage after an initial steep rise (Expired apps), and 7% continuously rise in popularity (Hot apps). From these, we can distinguish, for instance, trendsetters from copycat apps. We conclude by demonstrating that usage behavior trend information can be used to develop better mobile app recommendations.

BUbiNG: Massive Crawling for the Masses

Although web crawlers have been around for twenty years by now, there is virtually no freely available, open-source crawling software that guarantees high throughput, overcomes the limits of single-machine systems, and at the same time scales linearly with the amount of resources available. This paper aims at filling this gap through the description of BUbiNG, our next-generation web crawler built upon the authors' experience with UbiCrawler and on the last ten years of research on the topic. BUbiNG is an open-source, fully distributed Java crawler; a single BUbiNG agent, using sizeable hardware, can crawl several thousand pages per second while respecting strict politeness constraints, both host- and IP-based. Unlike existing open-source distributed crawlers that rely on batch techniques (like MapReduce), BUbiNG job distribution is based on modern high-speed protocols so as to achieve very high throughput.

Unsupervised Domain Ranking in Large-Scale Web Crawls

With the proliferation of web spam and infinite auto-generated web content, large-scale web crawlers require low-complexity ranking methods to effectively budget their limited resources and allocate bandwidth to reputable sites. To shed light on Internet-wide spam avoidance, we study topology-based ranking algorithms on domain-level graphs from the two largest academic crawls -- a 6.3B-page IRLbot dataset and a 1B-page ClueWeb09 exploration. We first propose a new methodology for comparing the various rankings and then show that in-degree BFS-based techniques decisively outperform classic PageRank-style methods, including TrustRank. However, since BFS requires several orders of magnitude higher overhead and is generally infeasible for real-time use, we propose a fast, accurate, and scalable estimation method called TSE that can achieve much better crawl prioritization in practice. It is especially beneficial in applications with limited hardware resources.

Evaluating the Quality in Use of Corporate Web Sites: An Empirical Investigation

In a previous paper we presented a novel approach to the evaluation of quality in use of corporate web sites based on an original quality model (QM-U) and a related methodology to put it into practice (EQ-EVAL). This paper focuses on two research questions. The first one aims to investigate whether expected quality obtained through the application of the EQ-EVAL methodology by employing a small panel of evaluators is a good approximation of actual quality obtained through experimentation with real users. In order to answer this research question, a comparative study has been carried out involving five evaluators and fifty real users. The second research question aims to demonstrate that the adoption of the EQ-EVAL methodology can provide useful information for web site improvement. Three original indicators, namely coherence, coverage, and ranking, have been defined in order to answer this second question, and an additional study comparing the assessments of two panels of five and ten evaluators respectively has been carried out. The results obtained in both comparative studies are largely positive and provide rational support for the adoption of the EQ-EVAL methodology.

Faster Base64 Encoding and Decoding using AVX2 Instructions

Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the efficiency of base64 encoding and decoding. Compared to state-of-the-art implementations, we multiply the speeds of both the encoding (≈10×) and the decoding (≈7×). We achieve these good results by using the single-instruction-multiple-data (SIMD) instructions available on recent Intel processors (AVX2). Our accelerated software abides by the specification and reports errors when encountering characters outside of the base64 set. It is available online as free software under a liberal license.

A model of information diffusion in interconnected online social networks

Online social networks (OSNs) have today reached a capillary diffusion and people often subscribe to several OSNs. This phenomenon leads to online social internetworking (OSI) scenarios where users who subscribe to multiple OSNs are termed bridges. Unfortunately, several important features make the study of information propagation in an OSI scenario a difficult task, e.g., correlations in both the structural characteristics of OSNs and the bridge interconnections, heterogeneity and size of OSNs, activity factors, cross-posting propensity, etc. In this paper we propose a directed random graph-based model that is amenable to efficient numerical solution to analyze the phenomenon of information propagation in an OSI scenario; in the model development we take into account heterogeneity and correlations introduced by both topological (correlations among node degrees and among bridge distributions) and user-related factors (activity index, cross-posting propensity). We first validate the model predictions against simulations on snapshots of interconnected OSNs in a reference scenario. Subsequently, we exploit the model to show the impact on the information propagation of several characteristics of the reference scenario, i.e., size and complexity of the OSI scenario, degree distribution and overall number of bridges, growth and decline of OSNs in time, and time-varying cross-posting propensity of users.

new phone, who dis? Modeling Millennials' Backup Behavior

Given the ever-rising frequency of malware attacks and other problems leading people to lose their files, backups are an important proactive protective behavior in which users can engage. Backing up files can prevent emotional and financial losses and improve overall user experience. Yet, we find that fewer than half of young adults perform mobile or computer backups at least every few months. To understand why, we model the factors that drive mobile and computer backup behavior, and changes in that behavior over time, using data from a panel survey of 384 diverse young adults. We develop a set of models that explain 37% and 38% of the variance in reported mobile and computer backup behaviors, respectively. These models show consistent relationships between Internet skills and backup frequency on both mobile and computer devices. We find that this relationship holds longitudinally: increases in Internet skills lead to increased frequency of computer backups. This paper provides a foundation for understanding what drives young adults' backup behavior. It concludes with recommendations for motivating people to back up and for future work modeling similar user behaviors.

Optimizing Whole-Page Presentation for Web Search

Modern search engines aggregate results from different verticals: webpages, news, images, video, shopping, knowledge cards, local maps, etc. Unlike "ten blue links", these search results are heterogeneous in nature and not even arranged in a list on the page. This revolution directly challenges the conventional "ranked list" formulation in ad hoc search. Therefore, finding a proper presentation for a gallery of heterogeneous results is critical for modern search engines. We propose a novel framework that learns the optimal page presentation to render heterogeneous results onto a search result page (SERP). Page presentation is broadly defined as the strategy to present a set of items on a SERP, and is much more expressive than a ranked list. It can specify item positions, image sizes, text fonts, and any other styles as long as variations are within business and design constraints. The learned presentation is content-aware, i.e., tailored to specific queries and returned results. Simulation experiments show that the framework automatically learns eye-catching presentations for relevant results. Experiments on real data show that simple instantiations of the framework already outperform the leading algorithm in federated search result presentation. This means the framework can learn its own result presentation strategy purely from data, without even knowing the "probability ranking principle".

Localness of location-based knowledge sharing: A Study of Naver KiN "Here"

In location-based social Q&A, the questions related to a local community (e.g., local services and places) are typically answered by local residents (i.e., people who have the local knowledge). This study aims to deepen our understanding of location-based knowledge sharing by investigating general users' behavioral characteristics, the topical and typological patterns related to the geographic characteristics, the geographic locality of user activities, and the motivations for local knowledge sharing. To this end, we analyzed a 12-month Q&A dataset from Naver KiN Here and a supplementary survey dataset from 285 mobile users. Our results revealed several unique characteristics of location-based social Q&A. When compared with conventional social Q&A sites, Naver KiN Here had distinctive user behavior patterns and different topical/typological patterns. In addition, Naver KiN Here exhibited a strong spatial locality where the answers mostly had 1-3 spatial clusters of contributions, and a typical cluster spanned a few neighboring districts. We also uncovered unique motivators, e.g., ownership of local knowledge and a sense of local community. The findings reported in the paper have significant implications for the design of Q&A systems, especially location-based social Q&A systems.

Extracting and Summarizing Situational Information from the Twitter Social Media during Disasters

Microblogging sites like Twitter have become important sources of real-time information during disaster events. A large amount of valuable situational information is posted on these sites during disasters; however, the information is dispersed among hundreds of thousands of tweets containing sentiments and opinions of the masses. To effectively utilize microblogging sites during disaster events, it is necessary to not only extract the situational information from the large amounts of sentiment and opinion, but also to summarize the large amounts of situational information posted in real-time. During disasters in countries like India, a sizeable number of tweets are posted in local resource-poor languages besides the normal English-language tweets. For instance, in the Indian subcontinent, a large number of tweets are posted in Hindi / Devanagari (the national language of India), and some of the information contained in such non-English tweets is not available (or available at a later point of time) through English tweets. In this work, we develop a novel classification-summarization framework which handles tweets in both English and Hindi -- we first extract tweets containing situational information, and then summarize this information. Our proposed methodology is developed based on the understanding of how several concepts evolve in Twitter during disasters. This understanding helps us achieve superior performance compared to the state-of-the-art tweet classifiers and summarization approaches on English tweets. Additionally, to our knowledge, this is the first attempt to extract situational information from non-English tweets.

Completeness Management for RDF Data Sources

The Semantic Web is commonly interpreted under the open-world assumption. Under this setting, available information only captures a subset of the reality, thus hindering certainty as to whether the reality is fully described (e.g., in the answer to a query). While there are several aspects of the reality where one can observe complete information, there is currently no way to assert meta-information about completeness in a machine-readable form. The aim of this paper is to fill this gap and to contribute a (formal) study of how to describe the completeness of parts of the Semantic Web, and how to leverage this novel information for query answering. One immediate benefit is that query answers can now be complemented with information about their completeness. More specifically, we introduce a theoretical framework that allows RDF data sources to be augmented with statements, also expressed in RDF, about their completeness. We then study the impact of completeness statements on the complexity of query answering by considering different fragments of the SPARQL language, including the RDFS entailment regime, and the federated scenario. We implement an efficient method for reasoning about query completeness and provide an experimental evaluation in the presence of large sets of completeness statements.

A Web Portal Study for High Performance Computing

This paper addresses web interfaces for High Performance Computing (HPC) simulation software. First, it presents a brief history, starting in the 1990s with Java applets, of web interfaces used for accessing and making the best possible use of remote HPC resources. Then this article reviews the present state of such HPC web-based portals. We identify and discuss the key features and constraints that characterize HPC portals. The design and development of Bull extreme factory Computing Studio v3 (XCS3) is chosen as a common thread for showing how these features can all be implemented in one software product: multi-tenancy, multi-scheduler compatibility, HPC application template framework, complete control through an HTTP RESTful API, customizable user interface with Responsive Web Design, remote visualization, Role-Based Access Control, and access through the Authentication, Authorization, and Accounting proven security framework. The paper concludes with the benefits of using such an HPC portal for both end-users and IT administrators.

Characterizing and Predicting Users' Behavior on Local Search Queries

The use of queries to find products and services that are located nearby is increasing rapidly, due mainly to the ubiquity of internet access and location services provided by smartphone devices. Local search engines help users by matching queries with a predefined geographical connotation (local queries) against a database of local business listings. Local search differs from traditional Web search because, to correctly capture users' click behavior, the estimation of relevance between query and candidate results must be integrated with geographical signals, such as distance. The intuition is that users prefer businesses that are physically closer to them or in a convenient area (e.g., close to their home). However, this notion of closeness depends upon other factors, like the business category, the quality of the service provided, the density of businesses in the area of interest, the hour of the day, or even the day of the week. In this work we perform an extensive analysis of online users' interactions with a local search engine, investigating their intent and temporal patterns, and highlighting relationships between distance-to-business and other factors, such as business reputation. Furthermore, we investigate the problem of estimating the click-through rate on local search (LCTR) by exploiting the combination of standard retrieval methods with a rich collection of geo-, user-, and business-dependent features. We validate our approach on a large log collected from a real-world local search service. Our evaluation shows that the non-linear combination of business and user information, geo-local and textual relevance features leads to significant improvements over existing alternative approaches based on a combination of relevance, distance, and business reputation.


Name: Award(s)
Ricardo A Baeza-Yates: ACM Fellows (2009)
Massimo Bernaschi: ACM Gordon Bell Prize, National Research Council of Italy (2011)
Elisa Bertino: ACM Fellows (2003)
Maria Bielikova: ACM Senior Member (2009)
Dan Boneh: ACM Fellows (2016); ACM Prize in Computing (2014)
Athman Bouguettaya: ACM Distinguished Member (2012); ACM Senior Member (2007)
Andrei Broder: ACM Paris Kanellakis Theory and Practice Award (2012); ACM Fellows (2007)
Carlos A. Castillo: ACM Senior Member (2014)
Stefano Ceri: ACM Fellows (2013)
Chen-Nee Chuah: ACM Distinguished Member (2012); ACM Senior Member (2006)
Lorrie Faith Cranor: ACM Fellows (2014); ACM Senior Member (2006)
Ernesto Damiani: ACM Distinguished Member (2008)
Schahram Dustdar: ACM Distinguished Member (2009)
Christos Faloutsos: ACM Fellows (2010)
Elena Ferrari: ACM Distinguished Member (2011)
Ophir Frieder: ACM Fellows (2005)
Hector Garcia-Molina: ACM Fellows (1997)
Lee Giles: ACM Fellows (2006)
Vicki Hanson: ACM Fellows (2004)
Simon Harper: ACM Distinguished Member (2014); ACM Senior Member (2009)
Monika Henzinger: ACM Fellows (2016)
Djoerd Hiemstra: ACM Senior Member (2009)
Eric Horvitz: ACM AAAI Allen Newell Award (2015); ACM Fellows (2014)
Bernard Jansen: ACM Senior Member (2017)
Craig Knoblock: ACM Fellows (2017); ACM Distinguished Member (2008)
Ming Li: ACM Fellows (2006)
Bing Liu: ACM Fellows (2015)
Yiqun Liu: ACM Senior Member (2016)
Dmitri Loguinov: ACM Distinguished Member (2014); ACM Senior Member (2007)
Filippo Menczer: ACM Distinguished Member (2013)
Renee J Miller: ACM Fellows (2009)
Mourad Ouzzani: ACM Senior Member (2009)
Jian Pei: ACM Fellows (2015); ACM Senior Member (2007)
Ali Pinar: ACM Distinguished Member (2015); ACM Senior Member (2011)
Prabhakar Raghavan: ACM Fellows (2001)
Naren Ramakrishnan: ACM Distinguished Member (2009)
John T Riedl: ACM Software System Award (2010); ACM Fellows (2009); ACM Distinguished Member (2007)
Michael Rung-Tsong Lyu: ACM Fellows (2015)
Prashant J Shenoy: ACM Distinguished Member (2009); ACM Senior Member (2006)
Ingmar Weber: ACM Senior Member (2017)
Xing Xie: ACM Senior Member (2010)
Qiang Yang: ACM Fellows (2017); ACM Distinguished Member (2011)
Philip S Yu: ACM Fellows (1997)
Lixia Zhang: ACM Fellows (2006)
Ben Y. Zhao: ACM Distinguished Member (2015)
Yu Zheng: ACM Distinguished Member (2016); ACM Senior Member (2011)
