Mobilising Knowledge Sources

One of the aims of the Likely Suspects Framework is to create a central data resource that pulls together the myriad of environmental and biological data sources to create new opportunities to examine patterns and provide vital new information.

 

Data Resources

Data is at the core of the Likely Suspects Framework. Data from salmon research, data from freshwater and marine biological research, and data from environmental research and monitoring. Datasets must be sourced, described and made accessible to the salmon science community. The LSF will build a central hub for research, and promote tools to access existing open data via common scientific languages. By constructing a directory of variable classes (see: Knowledge Variable Classes) this extensive data resource will help build a clearer picture of the salmons’ ecosystems, supporting a robust assessment of what is driving the fluctuations in salmon survival. By establishing novel ways to visualise and present outputs the LSF tools will be made widely accessible, assisting managers and decision-makers when developing future strategies

 
LSFDataResourcesFlow_NEW.jpg

DataFlow.jpg

Mobilising Data

Bringing together data to be accessed from a single resource presents challenges that are both complex and time consuming.

Following data discovery, there is a journey that data must take before they can be used to their full potential in research.

  • Access terms must be agreed. Data is costly to create, in time and money. Data owners have the right to control access to their products and ensure proper attribution is adhered to.

  • Cleaning and formatting the data is necessary to transform entries into a single, consistent format. The Likely Suspects Framework will promote the use of existing formats and controlled vocabularies as key to ensuring data remain interoperable and reusable.

  • Metadata practices must be robust. As datasets age their value and importance decline unless managed carefully. With the focus on using information from the past to inform on the future of salmon management, understanding data attrition in historical datasets is extremely important. Large and complex data frames rely on component datasets that have exhaustive descriptions



Data Standards

With the rise in data creation, data communities have become established that aim to tackle these issues and provide means for curating datasets into the long-term. The “Big Data Revolution” requires the use of standards to ease distribution in the present, and to lock in legible metadata for future generations.

FAIR Principles

Guidelines promoting the Findability, Accessibility, Interoperability and Reusability of data [1]. FAIR encourages the data community to seek out standards and incorporate them into data creation or curation practices. This promotes the use of rich metadata to label and describe datasets using existing knowledge exchange languages and controlled vocabularies throughout a domain.

MSA Section3_FAIR_data_principles_LoRes copy.jpg

Metadata Standards

Predefined descriptions of data used to align information across multiple datasets (field names, measurement units, locations, time)

  • Dublin Core

  • Ecological Metadata Language

  • TDWG (Darwin Core)

Controlled Vocabularies

Predefined vocabularies used to specify the same element across multiple datasets (species names, object names, tools, sampling methods)

  • World Register of Marine Species (WoRMS)

  • BODC NERC Vocabulary Server

References 1. Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

2. Website: go-fair.org


Section3_WoRMS_logo_neg_blue.jpg
Section3_.jpg
Section3_.png
Section3_DublinCoreWide.jpg

Please email Graeme Diack with any further enquiries graeme@atlanticsalmontrust.org