Predicting Plankton Diversity with Machine Learning
Overview
Can operational ocean models from the Copernicus Marine Environment Monitoring Service (CMEMS) predict plankton diversity in the Iroise Marine Natural Park? This analysis combines 14 years of in situ monitoring with machine learning to identify the environmental drivers of community structure.
The code and its description are available on GitHub.
These explorations were carried out as part of the NECCTON project.
Questions
Which oceanographic variables best explain plankton diversity?
How do environmental drivers differ between phytoplankton and zooplankton?
Can operational models predict local diversity?
What is the relative importance of physical versus biogeochemical drivers?
Methods
Diversity indices were computed from zooplankton and phytoplankton abundance tables, grouped by station and date. The in situ diversity metrics were then merged with CMEMS data by matching on station and sampling date. Analyses were conducted at several taxonomic levels (class, genus, and species) and at two depths (surface and bottom) to capture community patterns across scales.
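The grouping and merging steps described above can be sketched as follows. This is a minimal illustration, not the project's code: the text does not say which indices were used, so taxon richness and the Shannon index are assumptions here, and the function names, record layout, station labels, and the temperature value are all hypothetical.

```python
import math
from collections import defaultdict


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero abundance."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def diversity_by_sample(records):
    """Group (station, date, taxon, abundance) rows by sample and compute indices."""
    samples = defaultdict(lambda: defaultdict(float))
    for station, date, taxon, abundance in records:
        samples[(station, date)][taxon] += abundance

    result = {}
    for key, taxa in samples.items():
        counts = list(taxa.values())
        result[key] = {
            "richness": sum(1 for c in counts if c > 0),
            "shannon": shannon_index(counts),
        }
    return result


def merge_with_predictors(diversity, predictors):
    """Inner join on (station, date): keep only samples that have model data."""
    return {
        key: {**diversity[key], **predictors[key]}
        for key in diversity
        if key in predictors
    }


# Hypothetical toy data: two samples, only one of which has matching model output.
records = [
    ("S1", "2015-06-01", "Copepoda", 50),
    ("S1", "2015-06-01", "Appendicularia", 50),
    ("S2", "2015-06-01", "Copepoda", 10),
]
predictors = {("S1", "2015-06-01"): {"temperature": 14.2}}

merged = merge_with_predictors(diversity_by_sample(records), predictors)
```

An inner join is used so that samples without matching model output are dropped rather than filled with missing values; whether the original analysis made the same choice is not stated.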
XGBoost models were trained to predict each diversity index from oceanographic predictors. Model performance was evaluated with standard regression metrics: root mean squared error (RMSE), the coefficient of determination (R²), and mean absolute error (MAE). Variable importance scores were extracted from the fitted models to rank environmental drivers.
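For reference, the three evaluation metrics named above can be written out in plain Python. This sketch covers only the scoring step, not the XGBoost fitting itself; the function name and the observed and predicted values are hypothetical and are not results from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R² is undefined when the observations have no variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Hypothetical diversity values for four held-out samples.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a model's error is dominated by a few poorly predicted samples.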