A benchmark dataset for machine learning in ecotoxicology
October 2023
Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
Machine learning is a promising but underutilized approach to predicting ecotoxicological outcomes. Curating data with informative features requires expertise in machine learning as well as a strong biological and ecotoxicological background, which can be a barrier to entry for this kind of research. In addition, model performance can only be compared across studies when the same dataset, cleaning, and splitting are used. In this study, the researchers therefore provide ADORE, an extensive and well-described dataset on acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data, as well as chemical properties and molecular representations. The researchers propose concrete challenges to the community, including extrapolation across taxonomic groups, to learn more about the potential and limitations of machine learning in ecotoxicology.
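To illustrate why a shared splitting procedure matters for comparing models across studies, here is a minimal sketch of a deterministic train/test split. It is not the splitting scheme used in ADORE. The field names (`chemical_id`, `taxon`, `log_lc50`), the records, and the helper functions are hypothetical and exist only for illustration. The sketch assumes that all experiments on the same chemical should land in the same partition, and it assigns partitions by hashing the chemical identifier, so the result does not depend on row order or a random seed.

```python
import hashlib


def assign_split(chemical_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a chemical identifier to 'train' or 'test'."""
    digest = hashlib.sha256(chemical_id.encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "test" if bucket < test_fraction else "train"


def split_records(records, key="chemical_id", test_fraction=0.2):
    """Split records so that every row sharing a key value stays together."""
    train, test = [], []
    for rec in records:
        if assign_split(rec[key], test_fraction) == "test":
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Made-up records (hypothetical fields and values, not taken from ADORE).
records = [
    {"chemical_id": "chem-A", "taxon": "fish", "log_lc50": -3.1},
    {"chemical_id": "chem-A", "taxon": "algae", "log_lc50": -2.4},
    {"chemical_id": "chem-B", "taxon": "crustaceans", "log_lc50": -4.0},
    {"chemical_id": "chem-C", "taxon": "fish", "log_lc50": -1.7},
    {"chemical_id": "chem-C", "taxon": "crustaceans", "log_lc50": -2.2},
]

train, test = split_records(records)
print(len(train), "training rows,", len(test), "test rows")
```

Because the assignment depends only on the identifier, two groups applying the same function to the same cleaned dataset obtain identical partitions, which is the precondition for comparing their reported performance.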
Christoph Schür
Added on: 01-26-2024
[1] https://www.nature.com/articles/s41597-023-02612-2