<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Book on ViCoS Lab</title>
    <link>/publications/by-type/book/</link>
    <description>Recent content in Book on ViCoS Lab</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="/publications/by-type/book/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Avtomatsko modeliranje 3-dimenzionalnih večbarvnih predmetov z uporabo globinskega senzorja</title>
      <link>/publications/skocaj1999avtomatsko/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj1999avtomatsko/</guid>
      <description>&lt;p&gt;This master&amp;rsquo;s thesis describes a procedure for the automatic construction of 3-D models of real objects from range images. The range images are acquired with a range sensor based on active triangulation using coded light. We developed a new approach for constructing images with a high dynamic range of intensity values from a number of ordinary intensity images taken under different illuminations. Using such high-dynamic-range images, we can successfully compute range images even for objects with inhomogeneous reflectance properties. From a range image we then compute the 3-D coordinates of the points that lie on the object&amp;rsquo;s surface and are visible from the camera&amp;rsquo;s viewpoint. By triangulating the surface between the obtained points we build a 2.5-D model, which we then simplify and, for a more realistic appearance, texture-map. Finally, several 2.5-D models of the same object are merged into a single 3-D model.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Categorial Perception</title>
      <link>/publications/fritz2010categorial/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/fritz2010categorial/</guid>
      <description></description>
    </item>
    <item>
      <title>Context Driven Focus of Attention for Object Detection</title>
      <link>/publications/perko2007context-driven/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perko2007context-driven/</guid>
      <description>&lt;p&gt;Context plays an important role in general scene perception. In particular, it can provide cues about an object’s location within an image. In computer vision, object detectors typically ignore this information. We tackle this problem by presenting a concept for extracting and learning contextual information from examples. This context is then used to calculate a focus of attention that represents a prior for object detection. State-of-the-art local appearance-based object detection methods are then applied on selected parts of the image only. We demonstrate the performance of this approach on the task of pedestrian detection in urban scenes using a demanding image database. Results show that context awareness provides complementary information over pure local appearance-based processing. In addition, it cuts down the search complexity and increases the robustness of object detection.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Cross-modal learning</title>
      <link>/publications/skocaj2012cross-modal/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2012cross-modal/</guid>
      <description>&lt;p&gt;Cross-modal learning refers to any kind of learning that involves information obtained from more than one modality. In the literature the term modality typically refers to a sensory modality, also known as stimulus modality. A stimulus modality provides information obtained from a particular sensorial input, for example visual, auditory, olfactory, or kinesthetic information. Examples from artificial cognitive systems (&amp;ldquo;robots&amp;rdquo;) also include information about detected range (by sonar or laser range-finders), movement (by odometry sensors), or motor state (by proprioceptive sensors). Here we adopt a notion of modality that includes both the sensorial data and further interpretations of that data within the modality. For example, from a pair of (depth-calibrated) images, a cloud of points in 3-dimensional space can be computed. We obtain both types of data (the image data and the 3D points) from the same visual sensor; at the same time, they differ in what information they provide. We consider information sources derived from sensorial data as derived modalities that can themselves be involved again in cross-modal learning.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Integrating Visual Context and Object Detection within a Probabilistic Framework</title>
      <link>/publications/perko2009integrating/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/perko2009integrating/</guid>
      <description>&lt;p&gt;Visual context provides cues about an object&amp;rsquo;s presence, position and size within an observed scene, which are used to increase the performance of object detection techniques. However, state-of-the-art methods for context-aware object detection can decrease the initial performance. We discuss the reasons for failure and propose a concept that overcomes these limitations by introducing a novel technique for integrating visual context and object detection. To this end, we apply the prior probability function of an object detector, which maps the detector&amp;rsquo;s output to probabilities. Together with an appropriate contextual weighting, this establishes a probabilistic framework. In addition, we present an extension to state-of-the-art methods to learn scale-dependent visual context information and show how this increases the initial performance. The standard methods and our proposed extensions are compared on a novel, demanding image data set. Results show that visual context facilitates object detection methods.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning Hierarchical Compositional Representations of Object Structure</title>
      <link>/publications/fidler2009learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/fidler2009learning/</guid>
      <description></description>
    </item>
    <item>
      <title>Learning hierarchical representations of object categories for robot vision</title>
      <link>/publications/leonardis2011learning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/leonardis2011learning/</guid>
      <description>&lt;p&gt;This paper presents our recently developed approach to constructing a hierarchical representation of visual input that aims to enable recognition and detection of a large number of object categories. Inspired by the principles of efficient indexing, robust matching, and ideas of compositionality, our approach learns a hierarchy of spatially flexible compositions, i.e. parts, in an unsupervised, statistics-driven manner. Starting with simple, frequent features, we learn the statistically most significant compositions (parts composed of parts), which consequently define the next layer. Parts are learned sequentially, layer after layer, optimally adjusting to the visual data. Lower layers are learned in a category-independent way to obtain complex, yet sharable visual building blocks, which is a crucial step towards a scalable representation. Higher layers of the hierarchy, on the other hand, are constructed by using specific categories, achieving a category representation with a small number of highly generalizable parts that gained their structural flexibility through composition within the hierarchy. Built in this way, new categories can be efficiently and continuously added to the system by adding a small number of parts only in the higher layers. The approach is demonstrated on a large collection of images and a variety of object categories.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Robust subspace approaches to visual learning and recognition</title>
      <link>/publications/skocaj2003robust/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/skocaj2003robust/</guid>
      <description>&lt;p&gt;In the real world, visual learning is supposed to be a robust and continuous process. Not all available visual data is equally important; in the case of occlusions or other undesirable intrusions in the field of view, some visual data can even be misleading. The human visual system treats visual data selectively and builds efficient representations of observed objects and scenes even in non-ideal conditions. Furthermore, these representations can afterwards be updated with newly acquired information, thus adapting to the changing world. In this dissertation we study these premises and propose several methods which introduce similar principles into machine visual learning and recognition as well. We approach visual learning through appearance-based modeling of objects and scenes. Models are built using principal component analysis (PCA), which has several shortcomings with respect to the premises mentioned above. In order to overcome these shortcomings, we propose several extensions of standard PCA. PCA-based learning is traditionally performed in batch mode, thus requiring all training images to be given in advance. Since this is not admissible in the framework of continuous learning, we propose an incremental method which processes images sequentially, one by one, and updates the representation accordingly at each step. Each image can be discarded immediately after the model is updated, which makes the method well suited for real on-line scenarios. In addition, in the standard PCA approach all pixels of an image receive equal treatment, and all training images have equal influence on the estimation of the principal subspace. In this dissertation, we present a generalized PCA approach which estimates principal axes and principal components considering weighted pixels and images.
We further extend this weighted approach into a method for learning from incomplete data, which builds the model of an object even when part of the input data is missing. Images of objects and scenes are not always ideal, and as such they may contain various deceptive additions such as reflections or occlusions. PCA in its standard form is intrinsically non-robust to such non-Gaussian noise. Several methods for robust recognition have already been proposed; however, robust learning has been tackled very rarely. In the dissertation we introduce a novel approach to robust subspace learning. The proposed batch and incremental methods detect inconsistencies in the training images and build the representations from consistent data only. As a result, the obtained models are more robust and efficient, enabling more reliable visual learning and recognition even when the learning conditions are not ideal. In the dissertation we derive all the methods mentioned above and present suitable algorithms. We also experimentally evaluate all the proposed algorithms on different image domains and determine the applicability of the methods in different scenarios.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Sekvenčne Monte Carlo metode za sledenje oseb v računalniškem vidu</title>
      <link>/publications/kristan2005sekvencne/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2005sekvencne/</guid>
      <description>&lt;p&gt;People tracking is a part of the broad domain of computer vision that has received great attention from researchers over the last twenty years. An interesting aspect of the tracking problem originates from the field of control theory and considers the tracked object as a dynamical system with a hidden state, of which only the current measurements are available and observed. The classical methods used in the past to tackle this problem employed Kalman filters and their derivatives. These generally assume a Gaussian linear dynamical and measurement model, assumptions which are usually too restrictive for the majority of natural processes. In the late 1990s, advances in sequential Monte Carlo methods in various fields of science gave rise to a family of methods that effectively deal with problems of this kind. Their main advantage over the Kalman filter is that they do not impose such restrictive assumptions and can be implemented relatively easily. In computer vision, sequential Monte Carlo methods, also known as particle filters, became extremely popular with the introduction of the Condensation algorithm. Since then, a large body of literature has been published on these methods. This thesis is dedicated to the problem of tracking people by means of sequential Monte Carlo methods, the application of which is demonstrated on a system for tracking players in team sports. We first consider the problem of tracking in the context of statistical estimation and present the main parts of the Monte Carlo solutions. The well-known Condensation algorithm, which comprises the central part of all the trackers presented here, is introduced as a sequential Monte Carlo method, and a simple algorithm to track one player is presented. By considering a team sport in the context of a closed world, a set of assumptions that depicts a typical match is derived.
Following these assumptions, a more robust single-player tracker is developed and then extended to the case of multiple players. Finally, two variants of trackers for tracking multiple players in closed worlds are presented. A number of experiments are reported to evaluate the performance of the trackers and, based on the results, the most suitable multi-player tracker is chosen. We also point out some guidelines for future development of the application for tracking multiple players.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Sledenje objektov v robotskem nogometu</title>
      <link>/publications/kristan2003sledenje/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2003sledenje/</guid>
      <description>&lt;p&gt;Robot soccer is a high-tech sport developed in 1995 at the Korean institute of technology by professor Jong-Hwan Kim as a multi-purpose environment for teaching and testing applications in image analysis, artificial intelligence, sensors, communications, etc. Over the last eight years robot soccer has flourished both as consumer entertainment and as a platform for testing and developing new technologies. Today there are two international robot soccer federations, Robocup and FIRA (Federation of International Robot Association). Each federation organizes separate competitions in various categories, where the category determines how a match is played, ranging from pure computer simulations through micro-robots to humanoid robots. At the Faculty of Electrical Engineering in Ljubljana, work on robot soccer in the MiroSot category began in 2000 under the name Robobrc. Robobrc competes in two variants of the MiroSot category, which differ only in the number of players and the dimensions of the playing field. In the first variant each team consists of three players (the three-player game or small league), in the second of five (the five-player game or middle league). The transition from the three-player to the five-player game created the need for an efficient tracker that could distinguish a larger number of colors and track ten robots and the ball in real time. This thesis deals with the application of tracking the robots during Robobrc matches, so the introduction first presents only those rules of the two game variants that are relevant to the computer-vision supervisory system. We then present the field of computer vision and object tracking, giving a short overview of the literature on tracking in sports and in robot soccer.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Tracking people in video data using probabilistic models</title>
      <link>/publications/kristan2008tracking/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/publications/kristan2008tracking/</guid>
      <description></description>
    </item>
  </channel>
</rss>
