Here we outline the requirements for the Hakai Metadata Entry Tool. This tool captures metadata for Hakai records that populate the Hakai Data Catalogue. To create or edit a metadata record you need to sign in, which is only possible with a Gmail account. Although the Metadata Entry Tool indicates which fields are required and provides guidance text to help populate them, this document elaborates on certain fields to ensure consistency across metadata records in the Hakai Data Catalogue. The form consists of 7 tabs, outlined below: Start, Data Identification, Spatial, Contact, Resources, Platform and Submit.
1. Start
This tab outlines some basic requirements for the Metadata Entry Tool. The form can be saved once a title is filled out (see Data Identification), and required fields are marked with an asterisk. A progress bar is included that details how much progress you’ve made (%) – a metadata record can only be submitted once it’s at 100%. The URL of the metadata record draft can be shared, allowing colleagues to collaborate on a metadata record, although not simultaneously.
The initial tab of a new metadata record created in the Metadata Entry Tool. The progress bar is depicted under the title. As no title has been provided yet (‘New Record’), this metadata record cannot be saved, hence why the save icon in the bottom right is greyed out.
2. Data Identification
In this section the data provider(s) can include metadata that is useful to identify and search for the metadata record. After populating the dataset title, you’ll notice that you can save the metadata record. When populating the dataset title (required), the asterisk will be replaced by a green check. The title and abstract are required in French – to help with this, a translate button was developed that relies on Google Translate. As best practice it’s useful to have a French speaker review the translation because it is not always perfect. As a general rule, if you intend to publish your data to CIOOS, it is strongly recommended to translate any fields in the Metadata Entry Tool if there is an option to.
2.1 Title / Language
The title of the metadata record should not include any special characters, and it appears as the record's title in the Hakai Data Catalogue. A descriptive title should:
Be frontloaded (with the most important information first)
Be short – aim for 60 characters (no special characters) including spaces
Not include acronyms
Not include the word “dataset”
End with “time series” if the dataset is an ongoing time series, and not include a time period
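The title recommendations above can be pre-checked before you type the title into the form. The following is a minimal sketch, not part of the Metadata Entry Tool: the function name `check_title`, the definition of "special characters" and the acronym heuristic are all assumptions made for illustration.

```python
import re

MAX_TITLE_LENGTH = 60  # target length from the recommendations, spaces included


def check_title(title: str, ongoing_time_series: bool = False) -> list[str]:
    """Return a list of warnings for a proposed dataset title (hypothetical helper)."""
    warnings = []
    if len(title) > MAX_TITLE_LENGTH:
        warnings.append(f"title is {len(title)} characters; aim for {MAX_TITLE_LENGTH}")
    # Assumption: "special characters" means anything other than letters,
    # digits, spaces, commas, apostrophes and hyphens.
    if re.search(r"[^A-Za-z0-9 ,'\-]", title):
        warnings.append("title contains special characters")
    if re.search(r"\bdatasets?\b", title, flags=re.IGNORECASE):
        warnings.append('title contains the word "dataset"')
    # Heuristic: a run of two or more capital letters is probably an acronym.
    if re.search(r"\b[A-Z]{2,}\b", title):
        warnings.append("title may contain an acronym")
    ends_with_time_series = title.rstrip().lower().endswith("time series")
    if ongoing_time_series and not ends_with_time_series:
        warnings.append('ongoing time series titles should end with "time series"')
    return warnings


print(check_title("Juvenile salmon fork length time series", ongoing_time_series=True))
print(check_title("CTD dataset (2019)"))
```

An empty list means none of the checks fired. The checks are deliberately conservative, so a warning is a prompt to reread the title rather than proof that it is wrong.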
Important: If you are creating a metadata record, or updating one to reflect a new version of your dataset (e.g. for a time series dataset), there is a specific field to indicate the version number. The version number does not need to be included in your title. For more information on when to update an existing metadata record with a new version number, or when to create a new metadata record, please see 07 – Dataset Versioning.
The primary language of the dataset is a required field and reflects the language used in the raw dataset. Only two options are available – French and English.
2.2 Project
The project field is not required, but will help search for program- or project-specific datasets in the Hakai Data Catalogue once published. Additional Hakai projects can be added by emailing data@hakai.org.
2.3 Abstract
The abstract is also a required field that has to be populated both in English and French (hence the translate button). It is recommended to follow the suggested abstract points. As a general rule, please try to use as little jargon as possible. Acronyms can be included in the abstract, but should be expanded on.
Please use a maximum of 500 words in your abstract. Any additional information on sampling protocols or methodologies should be part of the data package linked to under 'Resources'.
2.4 Essential Ocean Variables
In the next section, please check any Essential Ocean Variables (EOVs) that are contained within the dataset. This is a required field. If your dataset does not contain any EOVs, please select Other. It is possible to select multiple EOVs. Datasets that include EOVs are recommended to be published to the Canadian Integrated Ocean Observing System (CIOOS). The EOVs currently listed have a 1:1 relation with the EOVs listed under the Global Ocean Observing System (GOOS) framework, and to see definitions data providers are encouraged to click the icon next to the EOV. If you suspect your data might fall under an EOV, please investigate the EOV specification sheet (e.g. a fish fork length dataset can be classified under the Fish abundance and distribution EOV).
Important: EOVs are only applicable for Ocean variables - if e.g. Nutrient data was collected only in a lake or inland watershed (and not at a marine station), this would not qualify as an EOV.
Please select any Essential Ocean Variables (EOVs) that apply to your dataset. You can select multiple EOVs. If no EOV is applicable, please select Other. For more information on an EOV, the icon next to it will direct you to its specification sheet.
2.5 Keywords/Status
Keywords are a required field. Some keywords are included in a drop-down list, and selecting one auto-populates its French counterpart. It’s also possible to populate keywords with free text. Make sure to click ‘Add’ after each single keyword rather than entering a comma- or semicolon-separated list. Keywords can contain abbreviations, and there is no limit to the number of keywords you can add to a dataset, though somewhere between 3 and 7 keywords is reasonable. Keywords should include the place name of the closest community or major geographic location.
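If your keywords already exist as a single comma- or semicolon-separated string (for example in a spreadsheet), a short script can split them into individual entries that you then add one at a time. This is a sketch for illustration only; `split_keywords` is a hypothetical helper and is not part of the Metadata Entry Tool.

```python
import re


def split_keywords(text: str) -> list[str]:
    """Split a comma- or semicolon-separated string into single keywords."""
    keywords = []
    seen = set()
    for part in re.split(r"[;,]", text):
        keyword = part.strip()
        # Skip empty entries and case-insensitive duplicates.
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords


for keyword in split_keywords("salmon; Quadra Island, salmon ,zooplankton"):
    print(keyword)  # enter each keyword in the form, then click 'Add'
```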
The status of the dataset is a required field, and there are three options: Ongoing, Historical Archive and Completed. Their definitions are listed. For time series data, or data that will be incrementally updated, select Ongoing. For data that is completed and published, and to which no additional observations will be added, select Completed. Historical Archive indicates that the data has been stored in an offline storage facility. At the Hakai Institute we mandate an Open Science Policy, and strongly advocate for collected data to be publicly accessible. If part of the data collected is stored securely in an offline storage facility (e.g. for sensitive or restricted data), it is recommended practice to also include this information under Dataset Limitations (see below).
Keywords can greatly enhance the findability of your dataset for users.
2.6 Dates
These fields are not required, and the dates may also be included in the published dataset itself, but if you know the start and end dates of data collection, please populate them.
Rather than include the temporal extent in your dataset title, please include the start and end date here if applicable. If the status of the dataset is "Ongoing", no end date will need to be included.
You can also indicate the date when data was published. Data publication refers to making the data publicly accessible. This could for example be when the data was published to ERDDAP, OBIS or a generalist/institutional repository. In CKAN, this will show up under “Dataset Reference Date(s) – [date] (publication)”.
Dates as shown in the Hakai Data Catalogue. The Metadata Reference Dates depict when the metadata record was published to the Hakai Data Catalogue, and when it was last revised. The Dataset Reference Date(s) indicate when data collection started (Creation), and when the data was published.
2.7 Data Versioning
When updating an ongoing time series, it is important to include the version of your data (package). Please keep this field numeric (e.g. 1.1 instead of v1.1 or Version 1.1). This version number will also be included in the recommended citation (see section 4). When your dataset has been revised or a new version released, please include the date when the data was revised and published. This is particularly important when making changes or additions to time series data. As such, if the status of your time series is Ongoing, and you're updating the metadata record to reflect the newest addition of time series data, it is strongly recommended to include the date when the data was revised. Please note that this field does not have to be populated when only the metadata is revised.
For additional information on dataset versioning, please see 07 - Dataset Versioning.
Please make sure that your version number matches the version of your data package.
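The numeric-only rule for the version field can be illustrated with a small normalizer. This is a sketch under stated assumptions: "numeric" is interpreted here as dot-separated integers (1, 1.1, 2.0.1), and `normalize_version` is a hypothetical helper, not something the tool provides.

```python
import re

# Assumption: a numeric version is one or more integers separated by dots.
VERSION_PATTERN = re.compile(r"\d+(\.\d+)*")


def normalize_version(raw: str) -> str:
    """Strip a leading 'v' or 'Version' label and check the result is numeric."""
    cleaned = re.sub(r"^\s*(version|v)\s*", "", raw.strip(), flags=re.IGNORECASE)
    cleaned = cleaned.rstrip(".")  # drop a stray trailing dot, as in "v1.1."
    if not VERSION_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a numeric version: {raw!r}")
    return cleaned


print(normalize_version("v1.1."))
print(normalize_version("Version 1.1"))
```

Both calls yield the same numeric string, which is the form the version field expects.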
2.8 Digital Object Identifier
If your dataset has a unique Digital Object Identifier (DOI), please include it here, excluding “https://doi.org/”. If your dataset does not have a DOI and you would like one, please finish filling out the entire metadata form, submit it, and then contact the Hakai Catalogue Team (catalogue-team@hakai.org) and we can help mint a DOI for your dataset.
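A DOI is often copied as a full URL. The sketch below shows one way to reduce it to the bare identifier expected by this field. The function name `strip_doi_prefix` is hypothetical and the DOI in the usage line is a placeholder, not a real dataset.

```python
import re


def strip_doi_prefix(value: str) -> str:
    """Return the bare DOI without the resolver URL or a 'doi:' label."""
    doi = value.strip()
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi, flags=re.IGNORECASE)
    doi = re.sub(r"^doi:\s*", "", doi, flags=re.IGNORECASE)
    return doi


print(strip_doi_prefix("https://doi.org/10.1234/example"))  # placeholder DOI
```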
2.9 Licensing and limitations
Please select the license under which the data (record) is published (required field). CC-BY-4 is the recommended license for records published to the Hakai Data Catalogue and for CIOOS.
If there are any limitations affecting the interpretation of the dataset, please include them here. If the overall dataset contains sensitive or restricted data that is not included in the published data, this information can be provided in this section. Although this field is not required, if you populate it, it is recommended to also translate it to French.
3. Spatial
3.1 Spatial extent
The spatial coverage of the data is required to be included in this section. The spatial coverage can be included in three ways:
Draw a polygon or a rectangle in the interactive map,
Include the bounding box coordinates (North, South, East and West) in decimal degrees, or
Provide the polygon coordinates, making sure these start and end with the same point. Polygon coordinates should be separated by a white space, and follow the format: latitude,longitude (e.g. 48,-127)
Latitude coordinates should always be within the -90 to 90 range (inclusive), and longitude coordinates within the -180 to 180 range (inclusive).
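The polygon format and coordinate ranges described above can be checked before pasting coordinates into the form. This is a minimal sketch, assuming the space-separated `latitude,longitude` format given in this section; `parse_polygon` and `bounding_box` are hypothetical helper names, and the bounding box calculation assumes the polygon does not cross the 180° meridian.

```python
def parse_polygon(text: str) -> list[tuple[float, float]]:
    """Parse 'lat,lon lat,lon ...' into a list of (latitude, longitude) pairs."""
    points = []
    for pair in text.split():
        lat_text, lon_text = pair.split(",")
        lat, lon = float(lat_text), float(lon_text)
        if not -90 <= lat <= 90:
            raise ValueError(f"latitude out of range: {lat}")
        if not -180 <= lon <= 180:
            raise ValueError(f"longitude out of range: {lon}")
        points.append((lat, lon))
    # Assumption: a closed polygon needs at least three corners plus the
    # repeated starting point.
    if len(points) < 4:
        raise ValueError("a polygon needs at least four coordinate pairs")
    if points[0] != points[-1]:
        raise ValueError("polygon must start and end with the same point")
    return points


def bounding_box(points: list[tuple[float, float]]) -> dict[str, float]:
    """Derive North, South, East and West in decimal degrees from the points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {"north": max(lats), "south": min(lats), "east": max(lons), "west": min(lons)}


polygon = parse_polygon("48,-127 49,-127 49,-126 48,-127")
print(bounding_box(polygon))
```

The same parsed points can feed either the polygon field or the bounding box fields, which keeps the two representations consistent.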
3.2 Depth / Height
Aside from the spatial extent of the data, the vertical extent (range in meters) can also be provided, either as a Depth Positive or a Height Positive. If data was collected at a single depth or altitude, please ensure that the minimum and maximum values are the same. Important: If you have selected "Subsurface [...]" as an EOV in the previous section, please make sure you enter a maximum Depth Positive value that is greater than 0.
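The vertical extent rules can be expressed as a short check. This is only a sketch: `check_vertical_extent` is a hypothetical helper, values are assumed to be Depth Positive in meters, and the rule that the minimum should not exceed the maximum is an assumption rather than something stated in this guide.

```python
def check_vertical_extent(minimum_m: float, maximum_m: float,
                          subsurface_eov_selected: bool = False) -> list[str]:
    """Return a list of problems with a Depth Positive vertical extent."""
    problems = []
    # Assumption: the minimum should never be greater than the maximum.
    if minimum_m > maximum_m:
        problems.append("minimum value is greater than maximum value")
    if subsurface_eov_selected and maximum_m <= 0:
        problems.append("a Subsurface EOV is selected, so maximum depth must be greater than 0")
    return problems


print(check_vertical_extent(5, 5))  # single sampling depth: minimum equals maximum
print(check_vertical_extent(0, 0, subsurface_eov_selected=True))
```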
4. Contact
4.1 Contact Role(s)
This section is arguably the most important section when populating the metadata record, because it indicates who proper attribution and credit should be given to when citing the data record via automated generation of the recommended citation. It is important to appropriately credit all involved parties, be these organizations or individuals. Each data record requires at minimum a Data Owner and a Metadata Custodian. The Data Owner has the authority to license the data. The Data Owner and Metadata Custodian can be the same person or organization. Multiple roles can be selected for a contact, and a description of the roles is provided if you hover over the terms. Make sure to select the checkbox if you want the contact to be included in the citation. Drag and drop the order of contacts to have them appear in the correct order in the citation.
It is recommended to include the Hakai Institute as a Publisher for any data that is collected and published in the Hakai Data Catalogue.
Each contact can have multiple roles assigned to them; however, only contacts that have a role marked with an asterisk are included in the citation. A preview of the citation is also included on this page.
Note: if you'd like to add the Hakai Institute as both Publisher (in citation) and Distributor (not in citation), you'll have to create two contacts for the Hakai Institute: one listing the Hakai Institute as Publisher, to be included in the recommended citation, and a second one listing the Hakai Institute as Distributor, which is not included in the recommended citation. Similarly, it's not possible to add multiple affiliations to a contact in the Metadata Entry Tool. If a contact has multiple affiliations and you'd like to add both, you will have to duplicate the contact, adding the primary affiliation to the contact record that is included in the recommended citation. The duplicated contact details (which then reference secondary affiliations) should not be included in the citation, as this would duplicate the author (see below).
CORRECT
Note how I am only listed in the recommended citation once, even though I am listed twice in the contacts list (as a result of having multiple affiliations).
WRONG
Note how I am listed in the recommended citation twice. This is likely undesirable, so make sure you uncheck the Appear in Citation box.
Important: Should there be a discrepancy between the preview citation in the Metadata Entry Tool and the recommended citation in the Hakai Data Catalogue, please notify the catalogue team (catalogue-team@hakai.org).
4.2 Contact / Organization Details
Any additional information about the organization or individual can be included in this section. You can search for a Research Organization in the ROR field. The Hakai Institute and Tula Foundation can be found through this search tool. If the research project was funded by the Tula Foundation, make sure to include that information in the roles and organization information.
Similar to the ROR, if the ORCID of a researcher is known, please use that identifier to populate the personal data section. This will ensure that researchers are included in the recommended citation with the correct spelling. Scientists can register for an ORCID for free.
Please note that the email address provided should not include any trailing white space (i.e. "tim.vanderstap@hakai.org " will not be recognized). Individual names provided will appear in the citation following the format 'Johnson, B.'
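The two points above, trailing white space in email addresses and the 'Johnson, B.' name format, can be sketched as follows. Both helpers are hypothetical and only illustrate the idea: the email check is a basic sanity check rather than full address validation, and the handling of multiple given names is an assumption, since this guide only shows a single initial.

```python
def clean_email(raw: str) -> str:
    """Remove surrounding white space and run a basic sanity check."""
    email = raw.strip()
    if " " in email or email.count("@") != 1:
        raise ValueError(f"does not look like an email address: {raw!r}")
    return email


def citation_name(given_names: str, family_name: str) -> str:
    """Format a name as 'Family, G.' using the initial of each given name."""
    # Assumption: several given names become several initials, e.g. 'B. A.'.
    initials = " ".join(f"{part[0].upper()}." for part in given_names.split())
    return f"{family_name.strip()}, {initials}"


print(clean_email("tim.vanderstap@hakai.org "))
print(citation_name("Brett", "Johnson"))  # hypothetical given name
```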
5. Resources
In this tab, please provide a link to the data package in a zipped format. At minimum, one resource or item needs to be included as a valid URL and should be a link to data unless the metadata record is being published in advance of the data being available. When linking to your data package (e.g. in GitHub), please ensure that the URL provided prompts the computer system to download the compressed format of your data package (see 08 – Hosting on GitHub). Please do not provide a link to each individual data table in your data package.
The URL provided must be valid. The name entered for the Resource is how it shows up under the section “Data and Resources” in your metadata record, so please be descriptive. Both the URL and the name of the resource are required fields. A description of the resource can also be provided and translated, although these fields are not required.
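A resource entry can be sanity-checked before submission. The sketch below uses Python's standard `urllib.parse` module to confirm the URL is absolute; the `.zip` suffix test is an assumption about what a link to a compressed data package looks like, and both `check_resource` and the example URL are hypothetical.

```python
from urllib.parse import urlparse


def check_resource(name: str, url: str) -> list[str]:
    """Return a list of problems with a resource name and URL."""
    problems = []
    if not name.strip():
        problems.append("resource name is required")
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("URL must be an absolute http(s) address")
    # Assumption: a link to a compressed data package ends in '.zip'.
    elif not parsed.path.lower().endswith(".zip"):
        problems.append("URL does not appear to point to a zipped data package")
    return problems


print(check_resource(
    "Zooplankton data package (zipped)",
    "https://github.com/example-org/example-repo/archive/refs/heads/main.zip",
))
```

This only inspects the text of the URL. It does not confirm that the link resolves or that the download succeeds, so opening the link in a browser remains the final check.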
6. Platform
In this section, any information related to the sampling platform or sampling instrument can be included. A platform is defined here as anything used in data collection that has instrument(s) attached to it. It is possible to indicate that no platform was used in data collection. However, if a platform was used, you are required to include a description of the platform, its ID or code, and at least one instrument. Search the NERC Vocabulary Server for the appropriate platform. Use the platform ID or code field to include e.g. a vessel IMO number. When adding an instrument, the only required field is the Instrument ID. The Instrument ID is free text, and multiple instruments can be added as needed. A description of the instrument and instrument type can be translated.
Though not required, the description of the platform can be translated, and this is recommended if you intend to publish your data to both the Hakai Data Catalogue and the CIOOS Data Catalogue.
7. Submit
In this final section you can submit your record. You will only be able to submit your record if the progress bar is at 100%, meaning that all the required fields have been completed. If this is not the case, any missing fields are listed below, by section. Once submitted, a GitHub ticket will be created here, and an assignee from the Hakai Catalogue Team will be randomly assigned to review it. You can track the progress of the review through the generated checklist in the GitHub ticket, and any questions the assignee has for you will be communicated through the ticket. You will be contacted prior to publication of the record to the Hakai Data Catalogue.
Please note: even when the metadata record is published, adjustments can be made to the record by unpublishing the record temporarily and editing.
Once a record is created in the Metadata Entry Tool, you can view the record under ‘My Records’, and if necessary you can distinguish between published and submitted records. Under ‘Contacts’ you can add and save contacts or collaborators that you anticipate you might include in multiple records. Additionally, through this window you can select to edit, delete, return your data record to draft, clone, or download your metadata record.