Medical data mining is a mature area of research, characterized by both simple and very elaborate methods, mostly dedicated to solving concrete problems of disease diagnosis, disease description, or treatment-success prediction. Clinical knowledge discovery encompasses the analysis of epidemiological data and of clinical and administrative data on patients; clinical decision support builds upon findings from these data. We elaborate on how data mining can contribute to such findings, enumerate challenges of model learning, data availability, and data provenance, and identify challenges posed by Big Medical Data.
Pedro Pereira Rodrigues
In this tutorial, we will consider generalizations of closed itemset mining toward n-ary relations and toward noise tolerance. Declarative aspects (in particular, how to define “noise”) as well as procedural aspects (how to efficiently traverse the pattern space) will be discussed.
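To make the notion of a closed itemset concrete, here is a minimal Python sketch that enumerates the closed itemsets of a toy transaction database by brute force. The toy data, the minimum support threshold, and the naive candidate enumeration are illustrative assumptions; efficiently traversing the pattern space (the tutorial's procedural topic) is precisely what this sketch does not attempt.

```python
# Minimal sketch: enumerate closed itemsets of a toy binary dataset.
# An itemset is closed iff it equals its closure, i.e. the intersection
# of all transactions that contain it.
from itertools import combinations

transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
items = sorted(set().union(*transactions))
min_support = 2  # illustrative threshold

def closure(itemset):
    """Return (closure, support) of `itemset`, or None if it covers nothing."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*map(frozenset, covering)), len(covering)

closed = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        result = closure(set(cand))
        if result:
            clo, support = result
            if support >= min_support:
                closed[clo] = support  # distinct closures = closed itemsets

for itemset, support in sorted(closed.items(), key=lambda x: -x[1]):
    print(sorted(itemset), "support:", support)
```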
Biology has become an enormously data-rich subject. Data are generated in many flavors and follow the particularities of the omics perspective adopted in experimental studies. For instance, genomics is the field of study dealing with genomes and is mostly associated with the static view (the genes and where they are placed along the genome). The dynamic view comes from the transcriptomics perspective, i.e., gene expression and its regulation. Finally, interactomics is usually associated with gene products (proteins) and their interactions; however, it can also be seen as a huge graph with layers of interaction integrating distinct omics perspectives. Applications of unsupervised and/or supervised machine learning (ML) techniques to omics science abound in the literature. In this tutorial, we discuss machine learning on omics data, putting the emphasis on (i) mapping and (ii) learning omics patterns. We consider three main kinds of omics data: genomics, transcriptomics, and interactomics. For each perspective, we first present the biological problem, then the data mapping (from a biological problem to a machine learning problem), the core ML methods employed, and their implementation in the R language.
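As a toy illustration of the mapping step, the sketch below casts a transcriptomics study as a supervised ML problem: samples become instances, genes become features, and the phenotype becomes the label. The random expression matrix and the nearest-centroid classifier are our own stand-ins (the tutorial itself works in R), shown here in Python for compactness.

```python
# Illustrative mapping: transcriptomics -> supervised learning.
# Data are random stand-ins, not real expression measurements.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 40, 100
X = rng.normal(size=(n_samples, n_genes))   # expression matrix (samples x genes)
y = rng.integers(0, 2, size=n_samples)      # phenotype label per sample
X[y == 1, :5] += 1.5                        # make 5 genes artificially informative

# Nearest-centroid classifier: one centroid per phenotype class.
centroids = {c: X[y == c].mean(axis=0) for c in (0, 1)}

def predict(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

accuracy = np.mean([predict(x) == label for x, label in zip(X, y)])
print(f"training accuracy: {accuracy:.2f}")
```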
Reliable estimation of confidence remains a significant challenge as learning algorithms proliferate into challenging real-world pattern recognition applications. The Conformal Predictions framework is a recent development in machine learning for associating reliable measures of confidence with results in classification and regression. This framework is founded on the principles of algorithmic randomness, transductive inference, and hypothesis testing, and has several desirable properties for use in real-world applications, such as the calibration of the obtained confidence values in an online setting. Further, the framework can be applied on top of any existing classification or regression method (neural networks, Support Vector Machines, ridge regression, etc.), making it highly general. Over the last few years, there has been growing interest in applying it to real-world problems such as clinical decision support, medical diagnosis, sea surveillance, network traffic classification, and face recognition.
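A minimal sketch of the conformal recipe, in a simplified split-style form: for each tentative label of a test point, compute a nonconformity score, rank it among the scores of the training examples, and turn the rank into a p-value; labels whose p-value exceeds the significance level form the prediction set. The Gaussian toy data and the distance-to-class-mean score are illustrative assumptions.

```python
# Simplified conformal prediction sketch (toy data, toy score).
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, size=(20, 2)),   # class 0
               rng.normal(+2.0, size=(20, 2))])  # class 1
y = np.array([0] * 20 + [1] * 20)
x_test = np.array([1.5, 1.8])
epsilon = 0.05  # significance level -> 95% confidence

def nonconformity(x, label):
    """Distance to the mean of the claimed class (higher = stranger)."""
    return np.linalg.norm(x - X[y == label].mean(axis=0))

prediction_set = []
for label in (0, 1):
    scores = [nonconformity(xi, yi) for xi, yi in zip(X, y)]
    test_score = nonconformity(x_test, label)
    # p-value: fraction of examples at least as nonconforming as the test point
    p_value = (1 + sum(s >= test_score for s in scores)) / (len(scores) + 1)
    print(f"label {label}: p-value {p_value:.3f}")
    if p_value > epsilon:
        prediction_set.append(label)

print("prediction set at 95% confidence:", prediction_set)
```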
Model selection techniques are becoming increasingly popular in many diverse data mining subfields such as sequence mining, graph mining, and pattern mining. One particularly popular approach, due to its interpretability and practicality, is the Minimum Description Length (MDL) principle, which is grounded in information theory. In this tutorial we present the basic concepts of MDL, information theory, and Bayesian statistics, with an emphasis on how they are connected and on the consequences of these connections. These connections provide additional insight into the MDL principle and information theory, give them a stronger theoretical footing, and allow us to borrow tools from statistics, but they also point out limitations that are not immediately apparent.
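As a worked toy example of two-part MDL, the snippet below compares the total description length L(model) + L(data | model), in bits, of a fair-coin model against a tuned Bernoulli model that pays roughly (1/2) log2 n bits to encode its parameter. The data and the crude parameter cost are our own illustrative choices.

```python
# Toy two-part MDL comparison: total bits = L(model) + L(data | model).
import math

data = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
n, ones = len(data), sum(data)

def bits_under(p):
    """Shannon code length of the data under a Bernoulli(p) model."""
    return sum(-math.log2(p if bit else 1 - p) for bit in data)

uniform_cost = 0 + bits_under(0.5)                   # no parameter to encode
p_hat = ones / n
tuned_cost = 0.5 * math.log2(n) + bits_under(p_hat)  # parameter + data

print(f"uniform model : {uniform_cost:.2f} bits")
print(f"tuned model   : {tuned_cost:.2f} bits")
print("MDL prefers:", "tuned" if tuned_cost < uniform_cost else "uniform")
```

On this 16-bit string the tuned model compresses the data enough to pay for its parameter, so MDL prefers it; on a genuinely fair coin sequence it would not.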
Selecting a model for a given set of data is at the heart of what data analysts do, whether they are statisticians, machine learners, or data miners. However, the philosopher Hume already pointed out that the ‘Problem of Induction’ is unsolvable: there are infinitely many functions that pass through any finite set of points. So it is not surprising that there are many different principled approaches to guide the search for a good model. Well-known examples are Bayesian statistics and Statistical Learning Theory.
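Hume's point can be made concrete in a few lines: the two polynomials below pass exactly through the same five points yet disagree everywhere else, so the data alone cannot choose between them. The points and the perturbation are arbitrary illustrative choices.

```python
# Infinitely many functions touch any finite set of points.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 0.5, 2.5, 1.5])

p4 = np.polyfit(x, y, deg=4)      # the unique degree-4 interpolant
# Adding any multiple of (x)(x-1)(x-2)(x-3)(x-4) changes nothing at the data:
wiggle = np.poly(x)               # monic polynomial with roots exactly at x
p9 = np.polyadd(p4, 3.0 * wiggle)

print("both fit the data exactly:",
      np.allclose(np.polyval(p4, x), y) and np.allclose(np.polyval(p9, x), y))
print("but at x = 2.5:", np.polyval(p4, 2.5), "vs", np.polyval(p9, 2.5))
```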
Reasoning by analogy has been recognized as a major cognitive capability of the human mind, and has been studied in AI, among other fields. In the last decade, there has been renewed interest in the notion of analogical proportion, i.e., statements of the form “a is to b as c is to d”. Formal models of analogical proportions have been proposed in various settings, including sets, lattices, and trees. In logical terms, an analogical proportion states that “a differs from b as c differs from d” and vice versa, which shows that analogy making is a matter of both similarity and dissimilarity. Analogical proportions provide a symbolic counterpart to numerical proportions: instead of dealing exclusively with numbers, they transpose the “rule of three” to symbolic items, allowing one to induce a fourth item when only the other three are known. This is the core of analogy-based learning methods, whose interest lies in the “creative” nature of the process: it looks at similar items (as in neighborhood-based methods), but also takes advantage of dissimilar, yet “parallel”, cases. The aim of this tutorial is to provide the audience with an overview of these formal models and of analogy-based learning.
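As a minimal sketch of the symbolic “rule of three”, the snippet below solves the equation a : b :: c : x componentwise over Boolean vectors, using the fact that a component is solvable only when its (a, b, c) pattern is consistent with one of the six valid Boolean proportions. The toy feature vectors are our own.

```python
# Boolean "rule of three": solve a : b :: c : x componentwise.
def solve_analogy(a, b, c):
    """Return d such that a : b :: c : d holds, or None if unsolvable."""
    d = []
    for ai, bi, ci in zip(a, b, c):
        if ai == bi:        # a = b  forces  d = c
            d.append(ci)
        elif ai == ci:      # a = c  forces  d = b
            d.append(bi)
        else:               # e.g. 0 : 1 :: 1 : ?  has no valid solution
            return None
    return d

# "man is to king as woman is to ?" over toy binary features
# features:      [royal, adult, female]
man   = [0, 1, 0]
king  = [1, 1, 0]
woman = [0, 1, 1]
print(solve_analogy(man, king, woman))   # -> [1, 1, 1]  ("queen")
```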
We will start with an overview of the various preference learning problems that have emerged over the past years, including instance ranking and label ranking. We will see how these problems can be formulated as (possibly convex) optimization problems, or reduced to other well-known machine learning problems. Then we will discuss the main preference models and how to learn them. In particular, we will first introduce ordinal preference models, including CP-nets and lexicographic preference networks, and then discuss utility-based models such as generalized additive independence (GAI) networks. Finally, to broaden the talk, we will mention how preference learning may be used in other settings such as Markov Decision Processes or Computational Social Choice.
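As one simple utility-based formulation among those surveyed (not the tutorial's specific algorithm), the sketch below learns a linear utility u(x) = w · x from pairwise comparisons “x is preferred to y”, applying perceptron-style updates on violated pairs; the toy items and preferences are randomly generated.

```python
# Learning a linear utility from pairwise preferences (toy data).
import numpy as np

rng = np.random.default_rng(2)
w_true = np.array([2.0, -1.0, 0.5])      # hidden utility generating preferences
items = rng.normal(size=(30, 3))

# Training pairs (i, j) meaning "item i is preferred to item j".
pairs = [(i, j) for i, j in rng.integers(0, len(items), size=(200, 2))
         if items[i] @ w_true > items[j] @ w_true]

w = np.zeros(3)
for _ in range(20):                      # a few passes over the pairs
    for i, j in pairs:
        if (items[i] - items[j]) @ w <= 0:   # preference violated
            w += items[i] - items[j]         # perceptron update

violations = sum((items[i] - items[j]) @ w <= 0 for i, j in pairs)
print(f"violated preferences after training: {violations}/{len(pairs)}")
```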
Deep learning is one of the most rapidly growing areas of machine learning. It concerns the learning of multiple layers of representation that gradually transform the input into a form in which a given task can be performed more effectively. Deep learning has recently been responsible for an impressive number of state-of-the-art results in a wide array of domains, including object detection and recognition, speech recognition, natural language processing, bioinformatics, and reinforcement learning. In this tutorial we will cover the foundations of deep learning: neural networks, convolutional neural networks, recurrent neural networks, autoencoders, and Boltzmann machines. We will discuss why models with many layers of representation can be hard to learn and present strategies that have been developed to overcome these challenges. We will also discuss more recent innovations, including dropout training, which has proved to be an extremely effective regularization technique for neural networks. Finally, we will cover some concrete and successful applications of deep learning.
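As a compact illustration of two of the ideas named above, here is a numpy sketch of a small feed-forward network trained by backpropagation, with inverted dropout on its hidden layer as a regularizer. The XOR-style toy task and all hyperparameters are arbitrary choices, not from the tutorial.

```python
# Tiny MLP with inverted dropout, trained by backpropagation on XOR-style data.
import numpy as np

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(200, 2)).astype(float)
y = (X[:, 0] != X[:, 1]).astype(float).reshape(-1, 1)   # XOR labels

W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr, p_keep = 0.5, 0.8

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)                          # hidden representation
    mask = (rng.random(h.shape) < p_keep) / p_keep    # inverted dropout mask
    h_drop = h * mask
    out = sigmoid(h_drop @ W2 + b2)

    grad_out = out - y                                # d(cross-entropy)/d(logit)
    grad_W2 = h_drop.T @ grad_out / len(X)
    grad_b2 = grad_out.mean(axis=0)
    grad_h = (grad_out @ W2.T) * mask * (1 - h ** 2)  # backprop through dropout+tanh
    grad_W1 = X.T @ grad_h / len(X)
    grad_b1 = grad_h.mean(axis=0)

    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

# At test time dropout is switched off; inverted scaling keeps magnitudes consistent.
pred = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5
print("training accuracy:", (pred == (y > 0.5)).mean())
```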