Nectar Track Accepted Papers

  • Nectar Session: 1 | Room: 102 | Time: 10:10 – 10:40, Wednesday 17 Sept
    Sampling-based Data Mining Algorithms: Modern Techniques and Case Studies (Matteo Riondato)

    Author

    Matteo Riondato, Brown University

    Abstract

    Sampling a dataset for faster analysis and looking at it as a sample from an unknown distribution are two sides of the same coin. We discuss the use of modern techniques involving the Vapnik-Chervonenkis (VC) dimension to study the trade-off between sample size and accuracy of data mining results that can be obtained from a sample. We report on two case studies where we and our collaborators employed these techniques to develop efficient sampling-based algorithms for two problems: computing betweenness centrality in large graphs and extracting statistically significant Frequent Itemsets from transactional datasets.

  • Nectar Session: 1 | Room: 102 | Time: 10:40 – 11:10, Wednesday 17 Sept
    Machine Learning Approaches for Metagenomics (Huzefa Rangwala, Anveshi Charuvaka, Zeehasham Rasheed)

    Authors

    Huzefa Rangwala, George Mason University
    Anveshi Charuvaka
    Zeehasham Rasheed

    Abstract

    Microbes exist everywhere. The current generation of genomic technologies has allowed researchers to determine the collective DNA sequence of all co-existing microorganisms. In this paper, we present some of the challenges related to the analysis of data obtained from community genomics experiments (commonly referred to as metagenomics), advocate the need for machine learning techniques, and highlight our contributions to the development of supervised and unsupervised techniques for solving this complex, real-world problem.

  • Nectar Session: 2 | Room: 103-104 | Time: 14:20 – 14:50, Wednesday 17 Sept
    Active Learning is Planning: Nonmyopic ε-Bayes-Optimal Active Learning of Gaussian Processes (Trong Nghia Hoang, Bryan Kian Hsiang Low, Patrick Jaillet, Mohan Kankanhalli)

    Authors

    Trong Nghia Hoang, National University of Singapore
    Bryan Kian Hsiang Low, National University of Singapore
    Patrick Jaillet, Massachusetts Institute of Technology
    Mohan Kankanhalli, National University of Singapore

    Abstract

    A fundamental issue in active learning of Gaussian processes is that of the exploration-exploitation trade-off. This paper presents a novel nonmyopic ε-Bayes-optimal active learning (ε-BAL) approach that jointly optimizes the trade-off. In contrast, existing works have primarily developed greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ε-BAL with a performance guarantee and empirically demonstrate on a real-world dataset that, with a limited budget, it outperforms state-of-the-art algorithms.

  • Nectar Session: 2 | Room: 103-104 | Time: 14:50 – 15:20, Wednesday 17 Sept
    Heterogeneous Stream Processing and Crowdsourcing for Traffic Monitoring: Highlights (Francois Schnitzler, Alexander Artikis, Matthias Weidlich, Ioannis Boutsis, Thomas Liebig, Nico Piatkowski, Christian Bockermann, Katharina Morik, Vana Kalogeraki, Jakub Marecek, Avigdor Gal, Shie Mannor, Dermot Kinane, Dimitrios Gunopulos)

    Authors

    Francois Schnitzler, Technion
    Alexander Artikis
    Matthias Weidlich
    Ioannis Boutsis
    Thomas Liebig
    Nico Piatkowski
    Christian Bockermann
    Katharina Morik
    Vana Kalogeraki
    Jakub Marecek
    Avigdor Gal
    Shie Mannor, Technion
    Dermot Kinane
    Dimitrios Gunopulos, University of Athens

    Abstract

    We give an overview of an intelligent urban traffic management system. Complex events related to congestion are detected from heterogeneous sources involving fixed sensors mounted at intersections and mobile sensors mounted on public transport vehicles. To deal with data veracity, sensor disagreements are resolved by crowdsourcing. To deal with data sparsity, a traffic model offers information in areas with low sensor coverage. We apply the system to a real-world use case.

  • Nectar Session: 2 | Room: 103-104 | Time: 15:20 – 15:50, Wednesday 17 Sept
    Distributional Clauses Particle Filter (Davide Nitti, Tinne De Laet, Luc De Raedt)

    Authors

    Davide Nitti, KU Leuven
    Tinne De Laet
    Luc De Raedt, KU Leuven

    Abstract

    We review the Distributional Clauses Particle Filter (DCPF), a statistical relational framework for inference over time in hybrid domains such as vision and robotics. Applications in these domains are challenging for statistical relational learning because they require dealing with continuous distributions and dynamics in real time. The framework addresses these issues, supports the online learning of parameters, and was tested in several tracking scenarios with good results.

  • Nectar Session: 3 | Room: 103-104 | Time: 10:10 – 10:40, Thursday 18 Sept
    Be certain of how-to before mining uncertain data (Francesco Gullo, Giovanni Ponti, Andrea Tagarelli)

    Authors

    Francesco Gullo, Yahoo Labs, Spain
    Giovanni Ponti, ENEA
    Andrea Tagarelli, University of Calabria, Italy

    Abstract

    The purpose of this technical note is to introduce the problems of similarity detection and summarization in uncertain data. We provide the essential arguments that make these problems relevant to the data-mining and machine-learning community, stating major issues and summarizing our contributions in the field. Further challenges and directions of research are also outlined.

  • Nectar Session: 3 | Room: 103-104 | Time: 10:40 – 11:00, Thursday 18 Sept
    Generalized Online Sparse Gaussian Processes with Application to Persistent Mobile Robot Localization (Bryan Kian Hsiang Low, Nuo Xu, Jie Chen, Keng Kiat Lim, Etkin Ozgul)

    Authors

    Bryan Kian Hsiang Low, National University of Singapore
    Nuo Xu, National University of Singapore
    Jie Chen, Singapore-MIT Alliance for Research and Technology
    Keng Kiat Lim, National University of Singapore
    Etkin Ozgul, National University of Singapore

    Abstract

    This paper presents a novel online sparse Gaussian process (GP) approximation method that is capable of achieving constant time and memory (i.e., independent of the size of the data) per time step. We theoretically guarantee its predictive performance to be equivalent to that of a sophisticated offline sparse GP approximation method. We empirically demonstrate the practical feasibility of using our online sparse GP approximation method through a real-world persistent mobile robot localization experiment.

  • Nectar Session: 4 | Room: 103-104 | Time: 14:20 – 14:50, Thursday 18 Sept
    Network reconstruction for the identification of miRNA:mRNA interaction networks (Gianvito Pio, Michelangelo Ceci, Domenica D’Elia, Donato Malerba)

    Authors

    Gianvito Pio, University of Bari
    Michelangelo Ceci, University of Bari
    Domenica D’Elia, ITB-CNR, Bari
    Donato Malerba, University of Bari

    Abstract

    Network reconstruction from data is a data mining task that is receiving significant attention due to its applicability in several domains. For example, it can be applied in social network analysis, where the goal is to identify connections among users and, thus, sub-communities. Another example can be found in computational biology, where the goal is to identify previously unknown relationships among biological entities and, thus, relevant interaction networks. Such a task is usually solved by adopting methods for link prediction and for the identification of relevant sub-networks. Focusing on the biological domain, we proposed two methods: one for learning to combine the output of several link prediction algorithms, and one for identifying biologically significant interaction networks involving two important types of RNA molecules, i.e., microRNAs (miRNAs) and messenger RNAs (mRNAs). The relevance of this application comes from the importance of identifying (previously unknown) regulatory and cooperation activities for understanding the biological roles of miRNAs and mRNAs. In this paper, we review the contribution given by the combination of the proposed methods for network reconstruction and the solutions we adopt to meet the challenges posed by the domain we consider.

  • Nectar Session: 4 | Room: 103-104 | Time: 14:50 – 15:20, Thursday 18 Sept
    Analyzing and Grounding Social Interaction in Online and Offline Networks (Martin Atzmueller)

    Author

    Martin Atzmueller, University of Kassel

    Abstract

    In social network analysis, there is a variety of options for investigating social interactions. This paper reviews our recent work on analyzing and grounding social interactions in online and offline networks, considering distributional semantics, structural network correlation, and network inter-dependencies. Specifically, we focus on the analysis of user relatedness, community structure, and relations in online and offline networks. We discuss findings and results that justify the use of even implicitly accruing social interaction networks for the analysis of user relatedness, community structure, etc. Furthermore, we provide insights into recent work on analyzing and grounding offline social networks.

  • Nectar Session: 4 | Room: 103-104 | Time: 15:20 – 15:50, Thursday 18 Sept
    Agents Teaching Agents in Reinforcement Learning (Nectar Abstract) (Matthew Taylor, Lisa Torrey)

    Authors

    Matthew Taylor, Washington State University
    Lisa Torrey

    Abstract

    Using reinforcement learning (RL), agents can autonomously learn a control policy to master sequential-decision tasks. Rather than always learning tabula rasa, our recent work considers how an experienced RL agent, the teacher, can help another RL agent, the student, to learn. As a motivating example, consider a household robot that has learned to perform tasks in a household. When the consumer purchases a new robot, she would like the new (student) robot to quickly learn to perform the same tasks as the existing (teacher) robot, even if the new robot has a different state representation, learning method, or manufacturer. Our goals are to: 1) allow the student to learn faster with the teacher than without it, 2) allow the student and teacher to have different learning methods and knowledge representations, 3) not limit the student’s performance when the teacher is sub-optimal, 4) not require a complex, shared language, and 5) limit the amount of communication required between the agents.