Metadata: explaining the data
Guest post by Rob Redpath, Open Data Services Co-operative. ODSC works closely with the 360Giving team, providing tools and support for publishers and users of 360Giving data.
If you want to find data that is published in the 360Giving Standard, you can go to the 360Giving Registry. To work out which files are relevant to you, you can look at the information listed there about each file – how big it is, what time period it covers, how many grants it contains, and so on. These details are just the tip of a whole metadata iceberg that we’ve been building, and I’m really excited to tell you about it now!
What is metadata?
It’s data about the data, such as the size of a file, how many grants it contains and when it was published.
Metadata helps people understand the data better and judge whether it will be useful for a particular purpose.
There are two main types of metadata:
- Authoritative metadata, i.e. what a file declares about itself – such as its publisher or where the file can be found.
- Derived metadata, i.e. what we can find out by analysing a file – such as the total value of the grants, or which fields from the 360Giving Standard are used.
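To make the distinction between the two types concrete, here is a minimal sketch of a per-file metadata record. It is not the 360Giving metadata format: the class and function names (`FileMetadata`, `derive_metadata`) and the record's field names are hypothetical. The grant keys it reads (`amountAwarded`, `awardDate`) are assumed to follow 360Giving JSON field names, and award dates are assumed to start with an ISO 8601 `YYYY-MM-DD` date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class FileMetadata:
    # Authoritative metadata: what the file (or its publisher) declares.
    # Field names here are hypothetical, for illustration only.
    publisher: str
    download_url: str
    # Derived metadata: what we can work out by analysing the file's grants.
    grant_count: int = 0
    total_amount_awarded: float = 0.0
    earliest_award_date: Optional[date] = None
    latest_award_date: Optional[date] = None
    fields_used: Set[str] = field(default_factory=set)


def derive_metadata(publisher, download_url, grants):
    """Build a FileMetadata record from a list of grant dicts."""
    meta = FileMetadata(publisher=publisher, download_url=download_url)
    award_dates = []
    for grant in grants:
        meta.grant_count += 1
        meta.total_amount_awarded += grant.get("amountAwarded", 0)
        if "awardDate" in grant:
            # Keep only the YYYY-MM-DD part, in case a time is included.
            award_dates.append(date.fromisoformat(grant["awardDate"][:10]))
        # Record which top-level fields this file uses.
        meta.fields_used.update(grant.keys())
    if award_dates:
        meta.earliest_award_date = min(award_dates)
        meta.latest_award_date = max(award_dates)
    return meta
```

The authoritative fields are simply passed in, while everything else is computed from the grants themselves, which is why derived metadata can only be produced by someone who has downloaded the file.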
The right tool for the job
Our recent work on metadata started with a clearly articulated user need: people wanted to get hold of data that matched some criteria, such as ‘all grants from 2015’ or ‘grants that have beneficiary location data’. We found that people turned to the Registry for this, but couldn’t see the information they needed there. Developers couldn’t find enough detail in the Registry’s JSON feed (the machine-readable data underneath the page) either.
Some people were going to GrantNav and using the CSV/JSON download features, but they were becoming frustrated as that’s not what GrantNav is designed for.
Building metadata infrastructure
We therefore saw this as an opportunity to build some great metadata infrastructure, serving both those who want to download data for their own use and anyone wanting to build tools on top of 360Giving data.
Making better metadata available responds to two challenges.
Firstly, there is a chicken-and-egg problem. How do you know if a file is relevant to you? By knowing what’s in it. How do you know what’s in it? By downloading and analysing it. With good metadata, a user can quickly see which files might be relevant to them, without having to download each file first to judge its suitability.
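As a rough illustration of how metadata breaks that cycle, the sketch below picks files for the kinds of requests mentioned earlier, such as ‘all grants from 2015’ or ‘grants that have beneficiary location data’, using only per-file metadata. The record keys (`url`, `min_award_date`, `max_award_date`, `fields_used`) are hypothetical stand-ins, not the actual feed format.

```python
from datetime import date


def covers_year(record, year):
    """True if the file's award-date range overlaps the given calendar year.

    Overlap only means the file *may* contain grants from that year, so the
    grants themselves still need filtering after download.
    """
    start = date.fromisoformat(record["min_award_date"])
    end = date.fromisoformat(record["max_award_date"])
    return start <= date(year, 12, 31) and end >= date(year, 1, 1)


def select_files(records, year=None, required_fields=()):
    """Return the URLs of files whose metadata matches every criterion."""
    selected = []
    for record in records:
        if year is not None and not covers_year(record, year):
            continue
        # Every required field must appear in the file's list of used fields.
        if not set(required_fields) <= set(record["fields_used"]):
            continue
        selected.append(record["url"])
    return selected


records = [
    {"url": "https://example.org/a.json", "min_award_date": "2014-06-01",
     "max_award_date": "2015-05-31",
     "fields_used": ["amountAwarded", "beneficiaryLocation"]},
    {"url": "https://example.org/b.json", "min_award_date": "2016-01-01",
     "max_award_date": "2016-12-31", "fields_used": ["amountAwarded"]},
    {"url": "https://example.org/c.json", "min_award_date": "2015-01-01",
     "max_award_date": "2015-12-31", "fields_used": ["amountAwarded"]},
]
```

With these sample records, asking for 2015 selects `a.json` and `c.json`, and additionally requiring `beneficiaryLocation` narrows that to `a.json`, all without downloading a single grants file.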
Secondly, better metadata allows grantmakers to provide essential context about their grants data to anyone using it. If you hand someone a list of grants, there may be things you’d always tell them to make sure they really understand the data. That’s metadata, and by making it available in a machine-readable form, both humans and computers can better understand grantmakers’ data.
Building services to get this metadata out there!
To address these issues, we needed to build a service that would provide up-to-date information about what was in each file. Ideally, this would simply be a copy of what grantmakers say about their own data, collated in one place for ease of reference. We also needed to create a standardised way for grantmakers to publish this information about their data.
We took the information that 360Giving store in their internal Salesforce system as a starting point, and trialled the resulting meta object (in JSON files) / Meta tab (in spreadsheets) with a couple of friendly data publishers to try out the format. We’ve learned from that trial, and a proposal to add metadata to the 360Giving Standard will be going to the 360Giving Stewardship Committee.
We also built a service to collate this information, filling any gaps with the data that 360Giving store in Salesforce. Alongside this authoritative metadata, the service provides derived metadata: to do so, it downloads every file each night and carries out a number of checks on it. We listened to what people said they wanted to filter files on, and made sure the metadata we created would be useful for that. It’s grown into quite a comprehensive list! The result is available as a JSON feed, which developers can already preview. Let us know if you’d like to try it out!
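The post doesn’t describe exactly how the service combines its sources, so the following is only a sketch under an assumed precedence rule: values the publisher declares win, and a fallback record (a plain dict standing in for what 360Giving hold in Salesforce; nothing here talks to Salesforce) fills any keys the publisher left missing or empty. The function name `collate_metadata` and the output layout are hypothetical.

```python
def collate_metadata(declared, fallback, derived):
    """Combine publisher-declared, fallback and derived metadata for one file.

    Assumed rule: start from the fallback record, then overwrite it with every
    non-empty value the publisher declared. Derived metadata is kept under its
    own key so users can tell the two kinds of metadata apart.
    """
    authoritative = dict(fallback)
    for key, value in declared.items():
        if value not in (None, ""):
            authoritative[key] = value
    return {"authoritative": authoritative, "derived": dict(derived)}


declared = {"title": "Grants 2015", "license": ""}  # publisher left license empty
fallback = {"title": "Old title", "license": "CC-BY-4.0",
            "publisher": "Example Trust"}
derived = {"grant_count": 2}
collated = collate_metadata(declared, fallback, derived)
```

Here the publisher’s title replaces the fallback one, while the empty licence and the missing publisher name are filled from the fallback record. Keeping derived values separate reflects the post’s distinction between what a file declares and what can be found out from it.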
Finally, we updated the Registry to use the new metadata feeds, and brought together both authoritative and derived metadata to display a range of useful information about files, so that users can decide which files are relevant to them without needing to go to GrantNav.
Using the metadata
Right now, most of the developments are behind the scenes, and we’re still finishing the changes that add the new metadata to the 360Giving Standard. But you can already see the first uses of the metadata on the Registry: the information shown there about the contents of each file wouldn’t have been possible without it.
If you’re developing applications that use 360Giving data and would find it helpful to have a service where you can select which files to download, we’ll happily give you a demo of what’s available. The public JSON feeds are still being developed, and will be announced shortly.
Once the 360Giving Standard has been extended to include metadata, we’ll be encouraging grantmakers to think about what they’d like to say about their grants, and helping them incorporate that into their metadata.
We’re also planning to update the Registry with additional information about files, and make tools available to automatically download files that meet certain criteria.
Let us know what you’d find helpful!