
PhD: Developing Scalable Multiple Imputation Routines for Longitudinal Data

Research / Academic
Utrecht

Are you interested in helping to solve one of the most ubiquitous and insidious problems in social, behavioural, and biomedical research? Then this is the PhD position for you.

Your job
The department of Methodology and Statistics has a job opening for a PhD candidate. In this position, you will develop novel statistical algorithms to treat missing data in large, longitudinal datasets. Doing so will require you to develop a broad base of statistical expertise and a diverse data analytic toolkit. You will work with Bayesian modelling, (longitudinal) structural equation modelling, high-dimensional prediction algorithms, and computational statistics/numerical methods. You will also have the opportunity to learn high-performance statistical computing techniques and develop highly marketable software development skills. Just for good measure, we’ll throw in four years of intensive on-the-job training in open-science workflows and open-source software development. Are you up for the challenge?

Today’s researchers are blessed with a wealth of high-quality, publicly available, longitudinal datasets. Yet, these data are nearly always incomplete, so fully realising the potential of these rich datasets requires principled missing data treatments. Multiple imputation (MI) is one of the most flexible and broadly applicable principled missing data treatments available. Treating missing data with MI can produce optimal results if the assumptions of the underlying predictive models are satisfied. However, the task of satisfying these assumptions in datasets with hundreds of variables linked by complex, multivariate relations—like the datasets implied above—remains a daunting challenge.

A recent PhD project supervised by Dr Lang (who is also the daily supervisor for this PhD project) has made promising headway on this problem by combining supervised principal components regression (SPCR) with MI. The MI-SPCR methods developed in that work have shown excellent performance in (high-dimensional) cross-sectional data, but the authors did not consider longitudinal or nested data structures. Many interesting datasets contain longitudinal measurements or otherwise nested structures, and these nested structures bring additional challenges for any missing data treatment (e.g., the need to preserve random effects, complex growth trajectories, and cross-level interactions).

In this project, you will work under the supervision of a team of three experts in missing data analysis, statistical computing, and longitudinal modelling to extend the abovementioned work to accommodate longitudinal data (e.g., by incorporating multilevel PCA methods). Although these new methods will suit diverse contexts, we will specifically target applications in developmental science. The methods you develop, therefore, will be designed to support valid estimation and inference in popular models of growth and temporal association (e.g., latent growth curves, [random intercept] cross-lagged panel models). You will use Monte Carlo simulation studies to compare the performance of the new methods to the current state-of-the-art in missing data treatment for longitudinal/nested data. As these simulation studies will necessarily entail a high computational demand, they will offer an excellent opportunity to learn and practice high-performance computing techniques.

In addition to the statistical computing that will be woven throughout your methodological research, you will also have the opportunity for hands-on software development experience. A key aspect of the proposed research plan is to distribute the methods we develop via free, open-source software (e.g., a standalone R package or contributions to existing software). Your supervisors are well-equipped to guide you through this process. Professor van Buuren is the author and maintainer of the mice package, and Dr Lang is one of the developers for the JASP package.

Finally, this project will provide the opportunity to explore best practices in open science. To ensure transparency and maximise chances for peer review, we will use public GitHub repositories for all software development projects. To ensure that our work is available to the widest possible audience, we will distribute any resulting software under a suitable open-source license (e.g., MPL, Apache, MIT). We will publish all papers with open access.

Requirements:

We are looking for an enthusiastic colleague (m/f/x) who meets the following requirements:

  • holds (or nearly holds) a Master’s degree in Methodology and Statistics or a related field;
  • has strong programming skills in R, some experience with statistical computing, and an interest in high-performance computing;
  • has an interest in open-source software development and will commit to implementing and distributing any new methods as open-source software;
  • has an interest in open science and will commit to following open-science principles;
  • has excellent verbal and written communication skills in English;
  • is a motivated, communicative, and collaborative team member who can work harmoniously in small teams;
  • can manage time effectively on long-term projects, completing tasks independently while keeping team members apprised of progress.

Salary Benefits:

We offer:

  • a position for 1 year, with an extension to a total of four years upon successful assessment;
  • a working week of 38 hours and a gross monthly salary between €2,770 and €3,539 in the case of full-time employment (salary scale P under the Collective Labour Agreement for Dutch Universities (CAO NU));
  • 8% holiday pay and 8.3% year-end bonus;
  • a pension scheme, partially paid parental leave and flexible terms of employment based on the CAO NU.


In addition to the terms of employment laid down in the CAO NU, Utrecht University has a number of schemes and facilities of its own for employees. This includes schemes facilitating professional development, leave schemes and schemes for sports and cultural activities, as well as discounts on software and other IT products. We also offer access to additional employee benefits through our Terms of Employment Options Model. In this way, we encourage our employees to continue to invest in their growth. For more information, please visit Working at Utrecht University.

Work Hours:

36 - 40 hours per week

Address:

Padualaan 14