## Doktoranden-/Diplomandenseminar WS07/08</h2>

This is an archive of the talks in the winter term 2007/2008. The current schedule can be found here.

### Tuesday, 11.03.08 15.00 FR 6046

Daniel Bartz, University of Bremen

Generative models for optical illusions

Optical illusions are not only fascinating, they may give an insight in the way our visual cortex works. Perception is often thought to employ Bayesian perception to cope with the complexity and ambiguity of natural images. The Bayesian approach to optical illusion therefore postulates that they are caused by the visual systems attempt to dissolve ambiguities. Here a Bayesian model is proposed to explain the up to now miraculous "Hermann" and "Scintillating Grid" illusions, attributing the scintillating grids dynamics to a integration effect and inference based on partial information. The model uses cortical receptive fields learned from natural images using sparse Non-negative matrix factorization (NMF).

### Monday, 10.03.08 14.00 FR 6046

Yasemin Altun, Max Planck Institute for Biological Cybernetics, Tübingen

Supervised and semi-supervised learning for structured-output prediction

### Wednesday, 05.03.08 14.00 FR 6046

Mark R. Titchener, Institute of Bioengineering, University of Auckland, New Zealand

Deterministic Entropy: identifying state in complex systems, for biomedical and industrial applications

In this talk I shall present aspects of my research relating to measurement of information (Entropy) in complex systems.

The term Entropy, introduced by Clausius in thermodynamics in the mid 1800's, and adopted by Shannon in the mid 1900's in information theory, is now broadly understood to have meaning in characterising the behavior of complex dynamical systems. I shall briefly touch on the relationship between these areas, giving context to my own area of interest, which I have called Deterministic IT.

In respect of the latter, I shall introduce a novel and readily computable measure of string complexity, defined for individual finite strings. From this a linearised information measure is demonstrated in relation to time-series analysis. This will be illustrated in an application to determining sleep state from recorded EEG/EOG. I shall in the course of my talk also elaboration on my current research interests into visualisation and the development of real-time tools for data analysis, processing and control, including of course Det-entropy tools.

### Thursday, 28.02.08 14.00 FR 6046

David Grangier IDIAP Research Institute, Switzerland

A Discriminative Kernel-based Model to Rank Images from Text Queries

This presentation introduces a discriminative model for the retrieval of pictures from text queries. The core idea of this approach is to minimize a loss directly related to the retrieval performance of the model. For that purpose, we formalize the retrieval task as a ranking problem, and introduce a learning procedure optimizing a loss function related to the ranking performance. This strategy hence addresses the retrieval problem directly and does not rely on an intermediate image annotation task, which contrasts with previous research. Moreover, our learning procedure builds upon recent work on the online learning of kernel- based classifiers. This yields an efficient, scalable algorithm, which can benefit from recent kernels developed for image comparison.

The experiments performed over stock photography data show the advantage of our discriminative ranking approach over state-of-the- art alternatives (e.g. our model yields 26.3% average precision over the Corel dataset, which should be compared to 22.0%, for the best alternative model evaluated). Further analysis of the results shows that our model is especially advantageous over difficult queries such as queries with few relevant pictures or multiple-word queries.

Reference:

A Discriminative Kernel-based Model to Rank Images from Text Queries D. Grangier & S. Bengio - IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). In press. 2008. [ link ]

### Wednesday, 27.02.08, 12.00 FR 6046

Gisbert Schneider University Frankfurt

Applications of Machine Learning in Molecular Design

Machine learning techniques have found a wide field of application in drug discovery. The compilation of focused compound libraries as well as hit and lead identification have been in the focus of drug designers. We will provide a general outline of this domain, report on several of our molecular design projects and demonstrate possibilities and limitations of machine learning methods like feed-forward neural networks, support vector machines, self-organizing maps, and adaptive optimization.

Matthias Rupp University of Frankfurt

Kernel Methods for Ligand-based Virtual Screening

In chemoinformatics, support vector machines have been applied to the problem of ligand-based virtual screening. There, molecules with known binding affinities (dissociation constants) to a given receptor are provided as input, and the task is to predict the binding affinities of molecules in large compound

collections, thereby enabling the selection of a small set of promising candidates for screening in biochemical assays. The used algorithms and kernel are, besides other issues like dataset quality and applicability of domain, important factors for the outcome of a virtual screening study. We introduce a graph kernel designed for the specific requirements of molecular inputs and demonstrate how kernel PCA can be used for visualization and compound filtering in the context of virtual screening.

### Monday, 18.02.08 14.00 FR 6046

Thorsten Dickhaus

False Discovery Rate and Asymptotics

Recently, rapid developments in many scientific fields have lead to new challenges for statistical methodology and practice. Some of nowadays' applications (especially in genetics and related research areas) request appropriate statistical tools for testing many hypotheses simultaneously and, therefore, multiple testing has become one of the most active branches in statistics over the last decade. The classical paradigm to control errors in a multiple test consists in strong control of the family-wise error rate, i.e., the probability of at least one false rejection of a true null hypothesis. However, in screening experiments with explorative character and some ten or even hundred thousands of hypotheses like in genome-wide association studies, this is a much too conservative goal. During the 1990ies, the false discovery rate (FDR) has become a popular alternative concept for such situations. The FDR measures the expected proportion of false rejections among all rejected hypotheses. Due to the massive multiplicity of many applications, asymptotic FDR considerations (the number of hypotheses tends to infinity) become more and more relevant. We study the asymptotic behavior of some FDR-controlling multiple test procedures from the theoretical perspective. First, focus is laid on the popular linear step-up procedure originally introduced by Benjamini and Hochberg in 1995. Since it is well known that this procedure strongly controls the FDR under positive dependency, we investigate its asymptotic conservativeness under such settings. The results show that, depending on the strength of positive dependence and the proportion of true nulls, the FDR can be close to the pre-specified error level or can be very small. Typically, the latter case leads to low power of the linear step-up procedure which raises the possibility for improvements of the algorithm. Some improvements of Benjamini and Hochberg's procedure are presented and discussed. Instead of using critical values increasing linearly (or, in other words, a linear rejection curve), we derive a non-linear and in some sense asymptotically optimal rejection curve leading to full exhaustion of the FDR level under some extreme parameter configurations. This curve is then implemented into some stepwise multiple test procedures which control the FDR asymptotically or (with slight modifications) for a finite number of hypotheses. It turns out that the new curve leads (in general) to more powerful test procedures. Besides all these theoretical and methodological topics, we are also concerned with some practical aspects of FDR. We apply FDR-controlling procedures to real life data and illustrate the functionality, assets and drawbacks of different test methods using these data sets.

References:

• Dickhaus, T. (2008). False Discovery Rate and Asymptotics, Ph. D. dissertation
• Finner, H., Dickhaus, T., Roters, M. (2007). Dependency and False Discovery Rates. Ann. Statist. 35, No. 4, 1432-1455.
• Finner, H., Dickhaus, T., Roters, M. (2008). On the False Discovery Rate and an Asymptotically Optimal Rejection Curve. Ann. Statist., to appear.

### Friday, 08.02.08 11.00 FR 6046

Gernot Supp UKE Hamburg-Eppendorf, Department of Neurophysiology and Pathophysiology

Assessing directionality of neural communication from macroscopic human brain signals

nduced gamma-band-responses (iGBRs; oscillations >30Hz) were repeatedly found to be modulated by familiar and unfamiliar objects in macroscopic human brain signals (MEG/EEG). This frequency-specific power change seems to indicate the dynamic formation of local neuronal assemblies during the activation of cortical object representations. As analytically power increase merely reflects the neural response of a single location, phase-synchrony was introduced to investigate the formation of large-scale networks between spatially distant brain sites. However, if we are interested in how distinct brain sites interact (to uncover the temporal hierarchy between them), classical phase-synchrony does not suffice. The current report will be focused on how the directionality of oscillatory interactions between brain sources can be assessed on the basis of a multivariate Granger-Causality coupling-measure. Specifically, this report should illustrate that uncovering the directionality of cortical information flow (feed-forward/-backward) may shed new light on cortical mechanisms of human object recognition

### Wednesday, 06.02.08 14.00 FR 3002 (!!)

Timon Schroeter

Loss functions for rankings & individual predictions applied to probabilistic model outputs

How do loss functions for rankings relate to loss functions for individual predictions? How much sense does it make to use either type of loss function with outputs of probabilistic models? These questions are at least partly philosophical. I will share how we handeled things in the past & what I think makes sense to do in the future. I am curious which further ideas the audience will have. Furthermore, I will present results of methodological developmnents & give an outlook on what's going to happen in the Chem ML DFG Project.

### Wednesday, 30.01.08 14.00 FR 6046

Guido Nolte

Studying brain connectivity with EEG/MEG using time delays

Studying brain connectivity form EEG/MEG data is extremely difficult because it is impossible to uniquely solve the 'inverse problem', i.e. to calculate the brain activity at specified brain locations from the measurement of external electric potentials and magnetic fields. As a consequence, an estimate of brain activity corresponds to an incomplete demixing of channel signals. It is then unavoidable that different brain sites appear to be connected even if all brain sources were truly independent.

To overcome this problem it has been suggested to exploit the fact that the mapping of brain sources to external channels is essentially instantaneous while brain interactions involve measurable time delays. In this talk I will give an overview of our recent work based on this concept.

Specifically, this talk has the topis:

• Role of imaginary part of cross-spectra/coherency for studying true interaction as opposed to 'artefacts of volume conduction'.
• Separation of interacting subsystems, which are mutually independent, using "Pairwise Interacting Source Analysis" (PISA).
• Localization of interactions.
• Estimating the direction of information flux robust to correlated background noise.

### Wednesday, 23.01.08 14.00 FR 6046

NIPS Travelers

Impressions from NIPS Part II

### Wednesday, 16.01.08 14.00 FR 6046

Pascal Lehwark

Trend Detection in Social Networks

In recent years, there has been an explosive growth of so-called community sites (social networks). Often summarized under the Web 2.0-label, these sites, which usually focus on a specific kind of multimedia data like pictures, or videos, only provide the infrastructure and rely on the users to provide the actual content.

Social networks are complex dynamic systems - their functionality includes communication, content-creation, -modification and - consumption, finding friends with similar interests and tagging media by providing a number of keywords. Tags have been proven to provide an efficient way to semantically organize and explore the huge amount of media in social networks.

The data which comes into existence in this context is often publicly available and states a highly interesting source for data mining. In particular the detection, characterization and prediction of trends within social networks - i.e. the significant strong focus in a certain time-period on media, users or topics is one of the most fascinating challenges.

In this work, we will undertake a first step in this direction for the social music community Last.fm.

### Wednesday, 09.01.08 14.00 place tbc

NIPS Travelers

Impressions from NIPS

### Monday, 07.01.08 11.00 FR 6046

Martijn Schreuder, Vrije Universiteit Amsterdam

title tba

Several rat models for human temporal lobe epilepsy exist. These models are either chemical or electrical (overstimulation) of nature. They result in a severe, initial status epilepticus (SE) followed by a latent period of typically 2-4 weeks. During the latent period, rats show no overt spontaneous seizures. After this latent period spontaneous seizures start occurring in most rats. Histochemical evidence shows that directly after SE there is extensive loss of primary in the hippocampal region, including in the superficial layers of the Medial Entorhinal Area (MEA). During this initial phase, the Lateral Entorhinal Area is relatively spared. However, as spontaneous seizures appear, cell loss has progressed into the LEA. Several pathways can be hypothesized to be responsible for the progression of cell loss from the MEA to the LEA. Two of these are the intrinsic entorhinal connections that exist between MEA and LEA, and the entorhinal-hippocampal-entorhinal connections. We investigate the possibility of explaining this progression of cell death from the medial entorhinal cortex to the lateral entorhinal cortex by simulating a simplified connectivity scheme in a biologically informed large-scale network of simple Mac Gregor Integrate-and-Fire (IAF) neurons.

### Wednesday, 19.12.07 14.00 FR 6046

Roman Rosipal, Institute of Medical Cybernetics and Artifical Intelligence, Center for Brain Research, Medical University of Vienna, Austria

A new way of the sleep-wakefulness process analysis: modeling the process as a continuum

A continuous, entirely probabilistic modeling of the all night sleep and daytime sleepiness processes will be discussed. The focus of the talk is to describe sleep and wakefulness as a continuum process in contrast to the standard discrete Rechtschaffen & Kales (R&K) scoring system generally applied in the sleep research community. The outcome of this modeling effort is additional information about sleep quality, pathologies and other clinically relevant aspects not obtained by the R&K scoring. First, the new sleep modeling approach will be described, examples of the model presented and results compared with the R&K method. Instead of the strict sleep stages comparison both approaches will be compared based on their ability to explain a set of subjective and objective psychometric variables obtained in the form of questioners and tests carried out after sleep. Next, an EEG-based probabilistic model, which effectively predicts involuntary transitions from wakefulness to sleep of subjects participating in a moving base driving simulator and vigilance test experiments, will be described. Promising pilot results will be presented. Finally, future steps involving the principled probabilistic inclusion of other physiological signals and contextual information will be discussed.

### Tuesday, 18.12.07, 14.00, FR 6046

Stefan Haufe

Source Localization

The problem of localizing the generators of brain activity from EEG/MEG measurements is ill-posed, as infinitely many source configurations are capable of explaining a measured potential. By modeling the sources as a regular grid of dipolar currents, one obtains a linear forward-mapping from the sources to the sensors. As there are by far more sources than sensors, the backward system is heavily underdetermined and cannot be solved for the sources. Prior preference has to be incorporated in order to abtain a unique and neurophysiologically interpretable solution. Leading methods are based on minimizing the (weighted) $\ell_2$- or $\ell_1$-norm of the current density. Their estimated activity regions are commonly considered to be either too broad ($\ell_2$) or too narrow ($\ell_1$) to be plausible. Minimum $\ell_1$-norm solutions, for instance, are sparse, i.e. they consist of only a few disconnected spikes. In this talk we present an $\ell_1$-norm based approach that overcomes the incontinuity problem and yields \emph{focal} solutions beeing composed of as few as possible nonzero patches. We show a scenario with two dipolar sources, which are both perfectly localized by our method. In this simple and noiseless setting the competing methods already fail in indicating the exact number of sources.

### Wednesday, 12.12.07, 14.00 place tbc

Soeren Sonnenburg

Multiple Kernel Learning

While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lankriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm works for hundred thousands of examples or hundreds of kernels to be combined, and helps for automatic model selection, improving the interpretability of the learning result and is implemented in the freely available shogun toolbox.

### Tuesday, 20.11.07, 15.00, FR 6046

Julia Lasserre, University of Cambridge, United Kingdom

Hybrids of generative and discriminative methods

Until recently, most machine learning techniques have been using generative or discriminative methods. Because both approaches are valid but good for different goals (modeling versus classifying), a few authours have started to investigate the effect of blending the two. It has been shown that for some problems, it is more promising than purely generative or discriminative methods. In this talk, we will suty hybrid models that can help deal with the issues of few labeled data, of few computing resources.

### Tuesday, 13.11.07, 11.00, FR 6046

Nina Tahmasebi

Isolating Subgraph Algorithm: An approach to Personalized Search Engines

As the size of the web grows the need for good search engines increase. Good information differs between users and the task of a search engine becomes knowing what each specific user is interested in. This is done by modeling the user and offering each user a personalized ranking vector.

Even assuming that we already know a good way of modeling the user preferences, the task of finding a ranking vector that respects the set of preferred nodes is not trivial. With over 20 billion nodes and up to 40 times as many links in the web graph any change in a ranking vector will most probably affect the ranking of each node. This makes computations extremely heavy and therefore we still lack personalized search engines that produce good results.

I will propose an algorithm for isolating a part of the web graph in order to make any updates on that particular subgraph without affecting the rest of the graph. This will reduce the number of nodes that are affected and ease computations significantly.

### Wednesday, 07.11.07, 14.00, FR 6046

Greedy Kernels for Labeled Trees and Forests

Learning with structured data requires expressive but also efficient means for comparison of objects. We propose a family of efficient kernel functions for labeled trees and forests. In contrast to classic tree kernels, discovery of matching subtrees is carried out using a greedy traversal scheme, which enables linear run-time complexity of kernel computation. Experiments with learning applications of natural language processing and network security demonstrate the efficiency of the proposed greedy tree kernels.

### Wednesday, 31.10.07, 14.00, FR 6046

Marius Kloft

A "Poisoning" Attack Against Online Anomaly Detection

Online anomaly detection techniques are steadily gaining attention in the security community, as the need grows to identify novel exploits in highly non-stationary data streams. The primary goal of online anomaly detection is to dynamically adjust the concept of normality while still detecting anomalous behavior. The question arises whether it is robust against targeted "poisoning" attacks.

### Wednesday, 24.10.07, 14.00, TU Berlin FR6046

Jakob Uszkoreit

Distributed Word Clustering for Class-Based Language Modeling in Machine Translation

Statistical language models are used in a variety of applications such as speech recognition, information retrieval and statistical machine translation, with n-gram language models being the most widely used type of model. Despite the ability to use ever growing training corpora, n-gram language models still suffer from data sparsity. Class-based language models are a widely used technique to address this problem by grouping words into equivalence classes. However, the clustering algorithms usually used to obtain such equivalence classes have prohibitive runtimes when considering the size of current training corpora and their vocabularies. We present a distributed algorithm for obtaining such equivalence classes for millions of words occurring in billions of tokens of training data. We then evaluate partially class-based n-gram models trained using the obtained classifications in a statistical machine translation system.

### '''Tuesday''', 16.10.07, '''11.00''', TU Berlin, FR 6046

Koji Tsuda, Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Mining complex genotypic features for predicting HIV-1 drug resistance

Human immunodeficiency virus type 1 (HIV-1) evolves in human body, and its exposure to a drug often causes mutations that enhance the resistance against the drug. To design an effective pharmacotherapy for an individual patient, it is important to accurately predict the drug resistance based on genotype data. Notably, the resistance is not just the simple sum of the effects of all mutations. Structural biological studies suggest that the association of mutations is crucial: Even if mutations A or B alone do not affect the resistance, a significant change might happen when the two mutations occur together. Linear regression methods cannot take the associations into account, while decision tree methods can reveal only limited associations. Kernel methods and neural networks implicitly use all possible associations for prediction, but cannot select salient associations Our method, itemset boosting, performs linear regression in the complete space of power sets of mutations. It implements a forward feature selection procedure where, in each iteration, one mutation combination is found by an efficient branch-and-bound search. This method uses all possible combinations, and salient associations are explicitly shown. In experiments, our method worked particularly well for predicting the resistance of nucleotide reverse transcriptase inhibitors (NRTIs). Furthermore, it successfully recovered many mutation associations known in biological literature.

-- Nicole Kraemer - 26 Mar 2008