I'm currently an Associate Professor in the Computer Science Department at Universidad Católica de Chile and a Research Assistant at the Institute of Applied Computational Science (IACS) at Harvard University. I'm also a researcher at the Millennium Institute of Astrophysics (MAS). I received a Ph.D. degree in Computer Science from Universidad Católica de Chile in 2010. During 2011-2012, I did a postdoc in Machine Learning for Astronomy at Harvard University. My main research areas are Data Science and Machine Learning for Astronomy, focusing on the development of new tools for automatic classification of variable stars, detection of quasars, discovery of known objects, dealing with missing data, and meta classification, among others.
Research Interests: Data Mining, Machine Learning, Astro-Statistics and Astro-Informatics
firstname.lastname@example.org - email@example.com
Approximate Bayesian Inference
Probabilistic Graphical Models
Advanced Python Programming
Plenary talk at ACAT 2016
Harvard-Chile students exchange program started
Presentation at the Semantic Web Seminar
Our research is focused on Machine Learning and Data Science, mainly applied to the analysis of Astronomical Time Series. Our main interest is to develop automatic tools to classify objects in the Universe, based on information provided by telescopes. There are several important challenges to overcome, such as algorithms for processing huge volumes of data, parallel programming, representation of unstructured and multivariate time series, intelligent integration of expert models, dealing with missing data, and unsupervised representation, among others. Chile is one of the most attractive places to perform scientific research in Astronomy, given that most of the state-of-the-art telescopes are installed in the north of the country.
A Global Data Warehouse for Astronomy
This project is led by Javier Machin (Master Student). Javier is developing a Data Warehouse for Astronomy (today just for time domain astronomy) that brings together several astronomical catalogs (soon, most of them), such as OGLE, VVV, MACHO, EROS, Catalina Survey, and Stripe 82. The catalogs are preprocessed and integrated, and the warehouse offers visualization tools, visual querying tools, selection and integration tools, slicing and dicing capabilities, data sharing, machine learning models, and lightcurve parameter extraction (FATS), among others. This project has been developed in collaboration with Professor Andrés Neyém.
Automatic Survey-Invariant Classification of Variable Stars
This project has been developed by Patricio Benavente (Master Student), in collaboration with Professor Pavlos Protopapas. Lightcurve datasets have been available to astronomers long enough to allow deep analysis of many variable sources and the creation of useful catalogs of identified variable stars. The products of these studies are labeled data that enable supervised learning models to be trained successfully. However, when these models are blindly applied to data from new sky surveys, their performance drops significantly. Furthermore, unlabeled data becomes available at a much higher rate than its labeled counterpart, since labeling is a manual and time-consuming effort. Domain adaptation techniques aim to learn from a domain where labeled data is available (the source domain) and, through some adaptation, perform well on a different domain (the target domain). We propose a full probabilistic model that represents the joint distribution of features from two surveys, as well as a probabilistic transformation of the features from one survey to the other.
Time Series Variability Tree for fast light curve retrieval
This project has been developed by Lucas Valenzuela (Master Student). We propose a new algorithm and data structure to index astronomical lightcurves in order to provide astronomers with fast lightcurve search in catalogs of millions of objects. We believe that this tool is very useful for manual exploration of catalogs, where astronomers need to find certain types of lightcurves similar to a lightcurve they already have. Classification algorithms based on similarity can also rely on this model: instead of using traditional search methods, they can use this indexing procedure and run much faster.
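To illustrate the general idea of index-based similarity search, here is a minimal sketch that uses a generic KD-tree over per-lightcurve feature vectors. This is not the Time Series Variability Tree proposed in the project; the KD-tree is a stand-in technique, and the feature choice (period, amplitude), the object names, and the function names `build_kdtree` and `nearest` are all assumptions made for the example.

```python
import math

def build_kdtree(points, depth=0):
    """Build a KD-tree from a list of (feature_vector, object_id) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])          # cycle through feature axes
    points = sorted(points, key=lambda item: item[0][axis])
    mid = len(points) // 2                    # median item splits the space
    return {
        "item": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, object_id) of the stored vector closest to query."""
    if node is None:
        return best
    features, object_id = node["item"]
    dist = math.dist(features, query)
    if best is None or dist < best[0]:
        best = (dist, object_id)
    diff = query[node["axis"]] - features[node["axis"]]
    near, far = (
        (node["left"], node["right"]) if diff < 0
        else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Hypothetical catalog: (period in days, amplitude in magnitudes) per object.
catalog = [
    ((0.5, 1.2), "star_a"),
    ((2.3, 0.4), "star_b"),
    ((0.6, 1.0), "star_c"),
    ((5.1, 0.9), "star_d"),
]
tree = build_kdtree(catalog)
match = nearest(tree, (0.55, 1.05))
```

Here `match` holds the distance and identifier of the catalog entry most similar to the query features. The tree skips branches that cannot contain a closer object, which is what makes index-based search faster than scanning every lightcurve.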
A full probabilistic model for yes/no query type Crowdsourcing
This project has been developed by Belén Saldías (Master Student), in collaboration with Professor Pavlos Protopapas. Crowdsourcing has become widely used in supervised scenarios where unlabeled data is abundant and labeled data is scarce and hard to obtain. Although there are several crowdsourcing models in the literature, most of them assume that annotators can provide answers to standard complete questions. In classification contexts, a complete question is one where the annotator is asked to discern among all the possible classes. Unfortunately, that assumption does not always hold in realistic scenarios. In this work we provide a full probabilistic model for a new type of query: instead of asking the labelers complete questions, we only ask questions that require a "yes" or "no" response.
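As a rough illustration of how yes/no answers can still identify a class, the sketch below applies Bayes' rule to a belief over classes after each answer. It is a simplified toy model, not the full probabilistic model of the project: it assumes a single known annotator accuracy, independent answers, and hypothetical class names, and the function name `update_posterior` is invented for the example.

```python
def update_posterior(prior, asked_class, answer_yes, accuracy):
    """Update class probabilities after one yes/no answer.

    prior       -- dict mapping class label to probability
    asked_class -- the class in the question "Is this object of class X?"
    answer_yes  -- True if the annotator answered "yes"
    accuracy    -- assumed probability that the annotator answers correctly
    """
    posterior = {}
    for label, p in prior.items():
        truth_is_yes = (label == asked_class)
        # The answer is correct when it matches the truth for this label.
        likelihood = accuracy if truth_is_yes == answer_yes else 1.0 - accuracy
        posterior[label] = p * likelihood
    total = sum(posterior.values())
    return {label: p / total for label, p in posterior.items()}

# Hypothetical example: three variable-star classes with a uniform prior.
belief = {"RR Lyrae": 1 / 3, "Cepheid": 1 / 3, "Eclipsing Binary": 1 / 3}
belief = update_posterior(belief, "RR Lyrae", answer_yes=True, accuracy=0.8)
after_first = dict(belief)
belief = update_posterior(belief, "Cepheid", answer_yes=False, accuracy=0.8)
```

Each answer shifts probability toward the classes consistent with it, so a sequence of simple questions can narrow down the label even though no annotator ever chooses among all classes at once.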
Visualization tool for relevant patterns in Light Curves
This project has been developed by Christian Pieringer, in collaboration with Professors Márcio Catelán and Pavlos Protopapas. Current Machine Learning methods for automatic lightcurve classification have shown successful results, reaching high classification performance in known catalogs. Recently, visualization of time series has been attracting more attention in machine learning as a tool to help experts recognize significant patterns in the complex dynamics of astronomical time series. Inspired by dictionary-based classifiers, we present a method that naturally provides a visualization of the salient parts of light curves. These classifiers encode the relevant parts intrinsically, assigning a weight to each word in the dictionary according to its contribution to the signal approximation. Our approach delivers an intuitive visualization according to the relevance of each part of the time series. Results suggest that this method is effective at highlighting salient patterns. We also propose a pipeline that uses our method for observational time scheduling.