Econometrics and Statistics Seminar
Time and place: Thursdays, 14:00-15:00 h in the Faculty Lounge (Room 0.036), Juridicum, Adenauerallee 24-42, 53113 Bonn
October 25, 2018 Andreas Dzemski, University of Gothenburg
Title: Confidence set for group membership (together with Ryo Okui)
Abstract: We develop new procedures to quantify the statistical uncertainty from sorting units in panel data into groups using data-driven clustering algorithms. In our setting, each unit belongs to one of a finite number of latent groups and its regression curve is determined by which group it belongs to. Our main contribution is a new joint confidence set for group membership. Each element of the joint confidence set is a vector of possible group assignments for all units. The vector of true group memberships is contained in the confidence set with a pre-specified probability. The confidence set inverts a test for group membership. This test exploits a characterization of the true group memberships by a system of moment inequalities. Our procedure solves a high-dimensional one-sided testing problem and tests group membership simultaneously for all units. We also propose a procedure for identifying units for which group membership is obviously determined. These units can be ignored when computing critical values. We justify the joint confidence set under N,T→∞ asymptotics where we allow T to be much smaller than N. Our arguments rely on the theory of self-normalized sums and high-dimensional central limit theorems. We contribute new theoretical results for testing problems with a large number of moment inequalities, including an anti-concentration inequality for the quasi-likelihood ratio (QLR) statistic. Monte Carlo results indicate that our confidence set has adequate coverage and is informative. We illustrate the practical relevance of our confidence set in two applications.
November 22, 2018 Anna Simoni, CREST Paris
Title: Bayesian Estimation and Comparison of Conditional Moment Models
Abstract: In this paper we consider models characterized by conditional moment conditions and develop semiparametric Bayesian inference for them. Our procedure builds on and extends the Bayesian exponentially tilted empirical likelihood (BETEL) framework developed in Chib, Shin and Simoni (2018) for unconditional moment condition models. The starting point is to convert the conditional moments into a sequence of unconditional moments by using a vector of approximating functions (such as tensor splines based on the splines of each conditioning variable) whose dimension increases with the sample size. We establish that the BETEL posterior distribution satisfies the Bernstein-von Mises theorem, subject to a rate condition on the number of approximating functions. We also develop an approach based on marginal likelihoods and posterior odds ratios for comparing different conditional moment restricted models and establish the model selection consistency of this procedure. The model selection theory differs from that in Chib, Shin and Simoni (2018) because the extra parameter needed to validly compare such models has a dimension that grows with the sample size; therefore, the rate of contraction of the posterior distribution is nonparametric. We treat both the case where the models to be compared are nested and the case where they are non-nested. We establish that when we compare correctly specified models, the marginal likelihood criterion selects the model that is estimable at the faster rate. In the nested case, the model selected by the posterior odds criterion is the model that is estimable at the parametric rate. When we compare misspecified models, we select the model that is less misspecified, that is, the model that contains the smaller number of misspecified moment restrictions. This theory breaks substantial new ground in the area of Bayesian model comparison. Several examples are used to illustrate the framework and results.
January 10, 2019 Christophe Ley, Ghent University
Title: A hybrid random forest to predict soccer matches in international tournaments
Abstract: We propose a new hybrid modeling approach for the scores of international soccer matches that combines random forests with Poisson ranking methods. While the random forest is based on the competing teams’ covariate information, the ranking methods estimate ability parameters from historical match data that adequately reflect the current strength of the teams. We compare the new hybrid random forest model to its separate building blocks, as well as to conventional Poisson regression models, with regard to their predictive performance on all matches from the four FIFA World Cups from 2002 to 2014. Combining the random forest with the team ability parameters from the ranking methods as an additional covariate substantially improves predictive power. Finally, the hybrid random forest is used (in advance of the tournament) to predict the FIFA World Cup 2018. To complement our analysis of the earlier World Cup data, the corresponding 64 matches serve as an independent validation data set, and we are able to confirm the compelling predictive potential of the hybrid random forest, which clearly outperforms all other methods, including the betting odds.
January 17, 2019 Koen Jochmans, University of Cambridge
Title: Testing for within-group correlation in linear fixed-effect models
January 24, 2019 Johannes Lederer, Ruhr University Bochum
Title: Tuning Parameter Calibration for Large and High-dimensional Data
Abstract: Large and high-dimensional data have become a major source of knowledge in economics, biology, astronomy, and many other fields. However, the lasso and other standard methods for such data depend on tuning parameters that are difficult to calibrate. In this talk, we introduce novel approaches to this calibration and demonstrate their features in theory, computations, and applications.
January 31, 2019 David Kraus, Masaryk University
Title: Regularized classification of functional data under incomplete observation
Abstract: We consider classifying functional data into two groups with linear classifiers based on one-dimensional projections of the functions. Finding the best classifier is formulated as an optimization problem that can be solved approximately by regularization methods, e.g., the conjugate gradient method with early stopping, the principal component method, and the ridge method. We study the empirical version with finite training samples consisting of incomplete functions observed on different subsets of the domain and show that the optimal, possibly zero, misclassification probability can be achieved in the limit along a possibly non-convergent empirical regularization path. We propose a domain extension and selection procedure that finds the best domain beyond the common observation domain of all curves. In a simulation study, we compare the different regularization methods and investigate the performance of domain selection. Our methodology is illustrated on a medical data set, where we observe a substantial improvement in classification accuracy due to domain extension. The talk is based on joint work with Marco Stefanucci.