SOS: A Distributed Mobile Q&A System Based on Social Networks

Abstract Recently, emerging research efforts have been focused on question and answer (Q&A) systems based on social networks. The social-based Q&A systems can answer nonfactual questions, which cannot be easily resolved by web search engines. These systems either rely on a centralized server for identifying friends based on social information or broadcast a user’s questions […]

CoRE: A Context-Aware Relation Extraction Method for Relation Completion

Abstract We identify relation completion (RC) as one recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation ℜ, RC attempts to link entity pairs between two entity lists under the relation ℜ. To accomplish the RC goals, we propose to […]

Using Incomplete Information for Complete Weight Annotation of Road Networks

Abstract We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using […]

Mining Probabilistically Frequent Sequential Patterns in Large Uncertain Databases

Abstract Data uncertainty is inherent in many real-world applications such as environmental surveillance and mobile tracking. Mining sequential patterns from inaccurate data, such as data arising from sensor readings and GPS trajectories, is important for discovering hidden knowledge in such applications. In this paper, we propose to measure pattern frequentness based on the possible […]

Effective and Efficient Clustering Methods for Correlated Probabilistic Graphs

Abstract Recently, probabilistic graphs have attracted significant interest from the data mining community. It is observed that correlations may exist among adjacent edges in various probabilistic graphs. As one of the basic mining techniques, graph clustering is widely used in exploratory data analysis, such as data compression, information retrieval, image segmentation, etc. Graph clustering aims […]

A General Technique for Top-k Geometric Intersection Query Problems

Abstract In a top-k Geometric Intersection Query (top-k GIQ) problem, a set of n weighted, geometric objects in R^d is to be pre-processed into a compact data structure so that for any query geometric object, q, and integer k > 0, the k largest-weight objects intersected by q can be reported efficiently. While the top-k […]

Secure Mining of Association Rules in Horizontally Distributed Databases

Abstract We propose a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton. Our protocol, like theirs, is based on the Fast Distributed Mining (FDM) algorithm, which is an unsecured distributed version of the Apriori algorithm. The main ingredients in our protocol are two novel secure multi-party algorithms — one that computes the […]

Random Projection Random Discretization Ensembles—Ensembles of Linear Multivariate Decision Trees

Abstract In this paper, we present a novel ensemble method, random projection random discretization ensembles (RPRDE), to create ensembles of linear multivariate decision trees by using a univariate decision tree algorithm. The present method combines the better computational complexity of a univariate decision tree algorithm with the better representational power of linear multivariate decision trees. We […]

Mining Statistically Significant Co-location and Segregation Patterns

Abstract In spatial domains, interaction between features gives rise to two types of interaction patterns: co-location and segregation patterns. Existing approaches to finding co-location patterns have several shortcomings: (1) they depend on user-specified thresholds for prevalence measures; (2) they do not take spatial auto-correlation into account; and (3) they may report co-locations even if […]
