Data Science

My statistical toolkit has been built project by project. Below are the packages I actually use, in contexts where defaults break: mixed models with complex variance structures, multivariate abundance data, functional diversity indices, Bayesian hierarchical models.

Ecological data is messy: species are sampled at different rates, environments vary in ways that matter statistically, and thousands of observations have to be compared across different conditions. These tools are how I deal with that honestly. The resources section has guides and references for students, early-career researchers (ECRs), and anyone getting started with R.

My Work

Analytical toolkit

My quantitative work spans community ecology (PERMANOVA, plus constrained and unconstrained ordination, via vegan); mixed-effects and GLS models for heteroscedastic biological data (nlme, following the protocols of Zuur and colleagues); generalised linear latent variable models for multivariate abundance data (gllvm); functional trait analysis, including community-weighted means and trait-based dissimilarity (FD, betapart); and Bayesian modelling (brms). Analyses are organised into reproducible pipelines using targets.

Open work

I maintain an open archive of TidyTuesday data visualisation projects — weekly practice in exploratory analysis, ggplot2, and communicating data clearly. All code is public: github.com/tjw-benth/TidyTuesday.


TidyTuesday Projects

A showcase of my data analysis and visualisation projects from TidyTuesday, a community data science initiative. All of the projects are in my GitHub repository (github.com/tjw-benth/TidyTuesday).

R packages

A small sample of the packages I use regularly.


Data wrangling

  • cli - Command line interface tools
  • tidyverse - Collection of data science packages
  • worrms - World Register of Marine Species client

Data analysis

Mixed models & GLS

  • nlme — GLS and linear mixed models; heteroscedastic variance structures
  • glmmTMB — generalised mixed models with flexible variance and zero-inflation
  • lme4 — standard mixed-effects models
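
As an illustration of what "heteroscedastic variance structures" means in practice, here is a minimal sketch using nlme. The data are simulated and the variable names (habitat, depth, biomass) are hypothetical, not taken from any of my projects. It fits the same fixed effects twice, once with a single residual variance and once with a separate variance per habitat via varIdent, then compares the two fits.

```r
library(nlme)  # ships with standard R installations as a recommended package

set.seed(42)

# Simulated data (hypothetical): two habitats with very different residual spread
n <- 100
dat <- data.frame(
  habitat = factor(rep(c("mud", "sand"), each = n)),
  depth   = runif(2 * n, 10, 100)
)
sd_by_habitat <- c(mud = 1, sand = 4)
dat$biomass <- 5 + 0.05 * dat$depth +
  rnorm(2 * n, sd = sd_by_habitat[as.character(dat$habitat)])

# Constant-variance fit versus one residual variance per habitat
m_const <- gls(biomass ~ depth + habitat, data = dat, method = "REML")
m_het   <- gls(biomass ~ depth + habitat, data = dat, method = "REML",
               weights = varIdent(form = ~ 1 | habitat))

# Both models share the same fixed effects, so the REML fits are comparable
comparison <- anova(m_const, m_het)
print(comparison)

# Estimated residual SD of each habitat relative to the reference level
sd_ratio <- coef(m_het$modelStruct$varStruct,
                 unconstrained = FALSE, allCoef = TRUE)
print(sd_ratio)
```

With real data the choice of variance covariate would come from residual plots rather than from knowing the simulation settings in advance.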

Multivariate ecology

  • vegan — community ecology; PERMANOVA, ordination, diversity indices
  • gllvm — generalised linear latent variable models for multivariate abundance data
  • betapart — beta diversity partitioning
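
Many of the multivariate methods above start from a site-by-site dissimilarity matrix. The sketch below computes Bray-Curtis dissimilarity by hand in base R, which is the index vegan::vegdist() returns with method = "bray". The abundance matrix is made up for illustration; in real work I would call vegdist() directly rather than loop.

```r
# Hypothetical abundance matrix: rows are sites, columns are species
abund <- matrix(
  c(10, 0, 5,
     8, 2, 5,
     0, 9, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("site1", "site2", "site3"), c("sp_a", "sp_b", "sp_c"))
)

# Bray-Curtis: summed absolute differences divided by summed totals
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a symmetric site-by-site matrix of pairwise dissimilarities
n_sites <- nrow(abund)
bc <- matrix(0, n_sites, n_sites,
             dimnames = list(rownames(abund), rownames(abund)))
for (i in seq_len(n_sites)) {
  for (j in seq_len(n_sites)) {
    bc[i, j] <- bray_curtis(abund[i, ], abund[j, ])
  }
}
print(round(bc, 3))
```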

Functional diversity

  • FD — functional diversity indices and community-weighted means
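
A community-weighted mean (CWM) is the abundance-weighted average of a trait across the species present at a site. The base R sketch below shows the arithmetic that functions such as FD::functcomp() perform for continuous traits. The species, traits, and values are hypothetical.

```r
# Hypothetical data: site-by-species abundances and species-by-trait values
abund <- matrix(
  c(10, 0, 5,
     2, 6, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("site1", "site2"), c("sp_a", "sp_b", "sp_c"))
)
traits <- matrix(
  c(2.0, 10,
    4.0, 30,
    8.0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("sp_a", "sp_b", "sp_c"), c("body_size", "max_depth"))
)

# Convert abundances to relative abundances within each site
rel_abund <- abund / rowSums(abund)

# CWM: abundance-weighted mean of each trait, per site
# (traits are reordered to match the species columns of the abundance matrix)
cwm <- rel_abund %*% traits[colnames(abund), , drop = FALSE]
print(cwm)
```

Matching the trait rows to the abundance columns by name guards against the two tables listing species in a different order.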

Post-hoc inference & model selection

  • emmeans — estimated marginal means and contrasts
  • MuMIn — multi-model inference and AICc selection
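
For readers new to AICc selection, the sketch below computes the small-sample correction by hand in base R and ranks three candidate models. It uses the built-in mtcars data purely for illustration, and the helper name aicc is my own; MuMIn::AICc() provides the same quantity for fitted models.

```r
# Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc <- function(model) {
  k <- attr(logLik(model), "df")  # number of estimated parameters
  n <- nobs(model)                # number of observations
  AIC(model) + (2 * k * (k + 1)) / (n - k - 1)
}

# Candidate models on a built-in dataset (illustrative only)
candidates <- list(
  wt       = lm(mpg ~ wt, data = mtcars),
  wt_hp    = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp_qs = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Rank by AICc; delta is the difference from the best-supported model
scores <- sapply(candidates, aicc)
ranking <- data.frame(
  model = names(scores),
  AICc  = scores,
  delta = scores - min(scores)
)
ranking <- ranking[order(ranking$AICc), ]
print(ranking, row.names = FALSE)
```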

Model validation

  • DHARMa — residual diagnostics for hierarchical models using simulation

Bayesian

  • brms — Bayesian mixed models via Stan

Reproducible pipelines

  • targets — pipeline toolkit for reproducible R workflows

Plotting

  • gganimate - Animated ggplot2 graphics
  • ggblanket - Simplified ggplot2 wrapper
  • ggOceanMaps - Ocean and land maps
  • ggstream - Stream plots in ggplot2
  • ggtext - Rich text rendering for ggplot2
  • MetBrewer - Colour palettes inspired by works at the Metropolitan Museum of Art
  • patchwork - Compose ggplot2 plots
  • sf - Simple features for spatial data
  • sp - Spatial data classes and methods
  • plotly - Interactive web graphics

Books and Literature

A selection that I always seem to refer back to.







Other Resources

A selection of resources that I have found useful.





Infographics and Cheat Sheets

Links to original source included.