Two projects about describing courses

I’m currently involved in a couple of projects relating to representing course information as linked data / schema.org.

1. Course information in schema.org

As you may know, the idea of using LRMI / schema.org for describing courses has been mooted several times over the last two or three years, here and on the main schema.org mailing lists. Most recently, Wes Turner opened an issue on GitHub which attracted some attention and some proposed solutions.

I lead a work package within the DCMI LRMI Task Group to try to take this forwards. To that end, I and some colleagues in the Task Group have given some thought to the scope and use cases to be addressed, mostly relating to course advertising and discovery. You can see our notes as a Google Doc; you should be able to add comments to that document, and we would welcome your thoughts. In particular we would like to know whether there are any missing use cases or requirements. Other offers of help and ideas would also be welcome!

I plan to compare the derived requirements with the proposed solutions and with the data typically provided in web pages.
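
To make the use cases concrete, the sort of markup a solution might enable is sketched below. This is purely illustrative: the Course type and the coursePrerequisites property are hypothetical stand-ins for whatever the Task Group and the schema.org community eventually agree on, and the values are invented.

{
  "@context": "http://schema.org/",
  "@type": "Course",
  "name": "Introduction to Linked Data",
  "description": "A ten-week introduction to RDF, SPARQL and schema.org.",
  "provider": {
    "@type": "CollegeOrUniversity",
    "name": "Example University"
  },
  "coursePrerequisites": "Some familiarity with HTML"
}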

2. Institutional course data as linked data

Stefan Dietze commented on the schema.org course information work that it would be worth looking at similar existing vocabularies. That linked nicely with some other work that a colleague, Anna Grant, is undertaking, looking at how we might represent and use course data from our department as linked data (this is similar to some of the work I saw presented in the Linked Learning workshop in Florence). She is reviewing the relevant vocabularies that we can find (AIISO, TEACH, XCRI-CAP, CourseWare, MLO, CEDS). There is a working draft on which we would welcome comments.
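
For a flavour of what such data might look like, here is a minimal sketch of a course described with an AIISO class; I have used Dublin Core terms for the literals and the part-whole link, which is one plausible choice rather than a faithful rendering of any one of the vocabularies under review.

{
  "@context": {
    "aiiso": "http://purl.org/vocab/aiiso/schema#",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@id": "http://data.example.org/course/c101",
  "@type": "aiiso:Course",
  "dcterms:title": "Introduction to Information Systems",
  "dcterms:description": "A first-year survey course.",
  "dcterms:isPartOf": { "@id": "http://data.example.org/programme/p1" }
}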

W3C HCLS Dataset Descriptions Profile Published

After three years' hard work, countless telephone conferences, issues and drafts, the W3C Health Care and Life Sciences Interest Group (HCLS) have finally published their community profile for describing datasets. The profile deals with different versions of a dataset, with each version being published in multiple formats. Below is the announcement from the W3C.

The Semantic Web Health Care and Life Sciences Interest Group has published a Group Note of Dataset Descriptions: HCLS Community Profile. Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval. Learn more about the Data Activity.
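
Very roughly, the flavour of the profile is something like the sketch below: a version-level description pointing back to the dataset it is a version of and out to a concrete distribution. The property choices here are my own illustration (DCAT, Dublin Core and PAV are among the vocabularies the profile reuses); consult the Note itself for the normative set.

{
  "@context": {
    "dcterms": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "pav": "http://purl.org/pav/"
  },
  "@id": "http://data.example.org/dataset/ex/17",
  "@type": "dcat:Dataset",
  "dcterms:title": "Example dataset, release 17",
  "pav:version": "17",
  "dcterms:isVersionOf": { "@id": "http://data.example.org/dataset/ex" },
  "dcat:distribution": {
    "@id": "http://data.example.org/dataset/ex/17/ttl",
    "@type": "dcat:Distribution",
    "dcat:mediaType": "text/turtle"
  }
}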

Quick notes: Ian Pirie on assessment

Ian Pirie, Assistant Principal for Learning Developments at the University of Edinburgh, came out to Heriot-Watt yesterday to talk about some assessment and feedback initiatives at UoE. The background ideas motivating what they have been doing are not new, and Ian didn’t claim they were: they centre on the pedagogy of assessment & feedback as learning, and the generally low student satisfaction with feedback shown through the USS. Ian did make a very compelling argument about the focus of assessment: he asked whether we thought the point of assessment was

  1. to ensure standards are maintained [e.g. only the best will pass]
  2. to show what students have learnt,
    or
  3. to help students learn.

The responses from the room were split 2:1 between answers 2 and 3, showing progress away from the exam-as-a-hurdle model of assessment. Ian’s excellent point was that if you design your assessment to help students learn (which means doing things like making sure your assessments address the right learning objectives, that students understand those objectives and the marking criteria, and that they get feedback which is useful to them), then you will also address points 2 and 1.

Ideas I found interesting from the initiatives at UoE included:

  • Having students describe learning objectives in their own words, to check they understand them (or at least have read them).
  • Giving students verbal feedback and having them write it up themselves (for the same reason). Don’t give students their mark until they have done this: that way they won’t skip it, and it matters because once students know whether or not they have done “well enough”, their interest in the assessment wanes.
  • Peer marking with adaptive comparative judgement. Getting students to rank other students’ work leads to reliable marking (and the course leader can assess which pieces of work sit on grade boundaries, if that is what you need).

In the context of that last one, Ian mentioned No More Marking, which has links with the Mathematics Learning Support Centre at Loughborough University. I would like to know more about how many comparisons need to be made before a reliable rank ordering is arrived at; that will influence how practical the approach is, given the number of students on a course and the length of the work being marked (you wouldn’t want every student to have to mark every submission if each submission was many pages long). But given the advantages of peer marking in getting students to reflect on the objectives of a specific assessment, I am seriously considering using the approach to mark a small piece of coursework from my design for online learning course. There’s an additional rationale there: it illustrates the use of technology to manage assessment and facilitate a pedagogic approach, showing that computer aided assessment goes beyond multiple choice objective tests, which is part of the syllabus for that course.
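
For a rough sense of scale (this is my back-of-envelope reasoning, not a figure from the comparative judgement literature): if each of n submissions needs to appear in about k comparisons to be placed reliably, then, because each comparison involves two submissions, the cohort as a whole makes n × k / 2 pairwise judgements. For a class of 30 with k = 10 that is 150 comparisons, i.e. five per student, which seems manageable for short pieces of work but quickly becomes onerous as the length of each submission grows.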

New projects for me at Heriot-Watt

I’ve been at Heriot-Watt University for many years now but haven’t really had much to do with the use of technology to enhance teaching and learning here. A couple of new projects might change that.

The Learning and Teaching Strategy for the School of Mathematical and Computer Sciences mentions using technology to create a more student-centred approach to learning, and also reshaping the soft learning environment to meet challenges raised by things like delivering courses across campuses in Edinburgh, Dubai and Malaysia, and with learning partners around the world. So it references ideas like the use of Khan Academy-style videos where appropriate, effective use of formative assessment and feedback, and use of the virtual learning environment to facilitate student interaction and collaboration across those different campuses.

To put this strategy into action the School has set up a working group, which I am convening. The approach will not be prescriptive and dictatorial (that wouldn’t work); we want to focus on identifying, nurturing and disseminating within the School the existing practice that aligns with those strategic aims. We also want to bring in ideas from outwith the School that can be realised in our contexts; they will have to be practical ideas with demonstrable benefits (I’ll still do explorative researchy things, but through other work). We started work a couple of weeks ago, with two initial tasks: 1, a survey to identify what people are already doing that might be worth sharing and what ideas they would like help progressing; and 2, an internal show-and-tell event to discuss such ideas. I rather hope that the event isn’t a one-off, that it leads to other similar events, and also that the practice we find through it and the survey can be made open, so that we can interact with all the other people doing similar work at their own institutions.

Coincidentally, I have also been asked to look at automated assessment, especially in exam scenarios in Computer Science. We have run electronic exams in the past, and many staff appreciated the automatic marking, but the system we used until now is no longer available, so I shall be working with colleagues to try to find a replacement. I haven’t worked much with online assessment before, but I think there are three related but separate strands that will need following: 1, the software system, its functionality and usability; 2, policy issues, such as security for high-stakes assessment; and 3, pedagogic issues. Clearly they are interdependent: for example, if your pedagogic considerations lead you to decide that students should have access to the web during exams, then the security issues you need to consider change. My feeling is that only an off-the-shelf system will be sustainable for us, so I’m looking at commercial and open source systems that have already been developed. However, Computer Science obviously has a very particular relationship with the use of computers in teaching and assessment, one that may not be exploited by general purpose computer aided assessment systems.

LRMI / schema.org validation

We are currently preparing some examples of LRMI metadata. While these are intended to be informative only, we know that they will affect implementations more than any normative text we could put into a spec (what developer reads the spec when you can just copy an example?). So it’s important that the examples are valid, and that set me to pulling together a list of tools & services useful for validating LRMI, and by extension schema.org.

Common things to test for:

  • simple syntax errors produced by typos, unclosed tags and so on;
  • whether the data extracted is valid schema.org / LRMI;
  • use of properties that don’t belong to the stated resource type, e.g. educationalRole should be a property of EducationalAudience, not of CreativeWork (see the sketch after this list);
  • loose or strict interpretation of expected value types, e.g. the author property should have a Person or Organization as its value, and dates and times should be in ISO 8601 format;
  • whether the data provided for a property comes from the value space it should, i.e. does the data provider use the controlled vocabulary you want;
  • whether values are provided for properties you especially require.
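
To illustrate the third point, here is the kind of mistake a purely syntactic check will wave through (names and values invented): educationalRole is attached directly to the CreativeWork,

{
  "@context": "http://schema.org/",
  "@type": "CreativeWork",
  "name": "Algebra worksheet",
  "educationalRole": "teacher"
}

whereas the intended structure puts it on an EducationalAudience:

{
  "@context": "http://schema.org/",
  "@type": "CreativeWork",
  "name": "Algebra worksheet",
  "audience": {
    "@type": "EducationalAudience",
    "educationalRole": "teacher"
  }
}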

[Hint: if it is the last two that you are interested in then you’re out of luck for now, but do skip to the “Want more?” section at the end.]

See also Structured Data Markup Visualization, Validation and Testing Tools by Jarno van Driel and Aaron Bradley.

Schema.org testing tools

Google structured data testing tool

https://developers.google.com/structured-data/testing-tool/ 

If Google is your target this is as close to definitive as you can get. You can validate code on a server via a URL or by copying and pasting it into a text window; in return you get a formatted view of the data Google would extract.

Validates: HTML + microdata, HTML + RDFa, JSON-LD

Downsides: it used to be possible to pass the URL of the code to be validated as a query parameter appended to the testing tool URL and thus create a “validate this page” link, but that no longer seems to be the case.

Also, the testing tool reproduces Google’s loose interpretation of the spec, and will try to make the best sense it can of data that isn’t strictly compliant. So where the author of a creative work is supposed to be a schema.org/Person, if you supply text the validator will silently interpret that text as the name of a Person entity. Dates not in ISO 8601 format also get corrected (October 4 2012 becomes 2012-10-4). That’s great if your target is as forgiving as Google, but otherwise it might cause problems.
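
In other words (my illustration, with an invented name), where the strict form of an author is

"author": {
  "@type": "Person",
  "name": "Jane Doe"
}

Google will also accept the bare string

"author": "Jane Doe"

and report it as though you had supplied the structured form.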

But the biggest problem seems to be that pretty much any syntactically valid JSON-LD will validate.

Yandex structured data validator

https://webmaster.yandex.com/microtest.xml

Similar to the Google testing tool, but with slightly larger scope (it validates OpenGraph and microformats as well as schema.org). It is not quite as forgiving as Google: a date in the format October 4 2012 is flagged as an error, and while text is accepted as a value for author, it is not explicitly mapped to the author’s name.

Validates: HTML + microdata, HTML + RDFa, JSON-LD

Downsides: because the tool is designed to validate raw RDF / JSON-LD etc., just because something validates does not mean that it is valid schema.org markup. For example, this JSON-LD validates:

{ "@context": [
    { 
         "@vocab": "http://Schema.org/"
    }
 ],
     "@type": "CreativeWork" ,
     "nonsense" : "Validates"
 }

Unlike with the Google testing tool, you do get an appropriate error message if you correct the @vocab URI to have a lower-case s, making this the best JSON-LD validator I found.

Bing markup validator

http://www.bing.com/toolbox/markup-validator

“Verify the markup that you have added to your pages with Markup Validator. Get an on-demand report that shows the markup we’ve discovered, including HTML Microdata, Microformats, RDFa, Schema.org, and OpenGraph. To get started simply sign in or sign up for Bing Webmaster Tools.”

Downsides: requires registration and signing-in so I didn’t try it.

schema.org highlighter

http://www.curriki.org/xwiki/bin/view/Coll_jmarks/LRMIViewerBookmarkTool

A useful feature of the validators listed above is that they produce something human readable. If you would like this in the context of the web page, Paul Libbrecht has made a highlighter: a little bookmarklet that transforms the schema.org markup into visible paragraphs that one can proofread visually.

Translators and other parsers

Not validators as such, but the following will attempt to read microdata, RDFa or JSON-LD and so will complain if there are errors. Additionally they may provide human readable translations that make it easier to spot errors.

RDF Translator

http://rdf-translator.appspot.com/

“RDF Translator is a multi-format conversion tool for structured markup. It provides translations between data formats ranging from RDF/XML to RDFa or Microdata. The service allows for conversions triggered either by URI or by direct text input. Furthermore it comes with a straightforward REST API for developers.” …and of course if your data isn’t valid it won’t translate.

Validates pretty much any RDF / microdata format you care to name, either by entering text in a field or by reference via a URI.

Downsides: again purely syntactic checking, doesn’t check whether the code is valid schema.org markup.

Structured data linter

http://linter.structured-data.org/

Produces a nicely formatted, human readable representation of structured data.

Validates: HTML + microdata, HTML + RDFa either by URL, file upload or direct input.

Downsides: another that is purely syntactic.

JSON-LD Playground

http://json-ld.org/playground/

A really useful tool for automatically simplifying or complexifying JSON-LD, but again only checks for syntactic validity.
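
For example (my own illustration of what that simplifying and complexifying means), the playground will expand this compacted document:

{
  "@context": "http://schema.org/",
  "@type": "CreativeWork",
  "name": "An example resource"
}

into its context-free expanded form, with every term spelled out as a full URI:

[
  {
    "@type": ["http://schema.org/CreativeWork"],
    "http://schema.org/name": [ { "@value": "An example resource" } ]
  }
]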

Nav-North LR data

https://github.com/navnorth/LR-Data

“A Tool to help import the content of the Learning Registry into a data store of your choice.” I haven’t tried this, but it does attempt to parse JSON-LD, so you would expect it to complain if the code doesn’t parse.

Want more?

The common shortcoming (for this use case anyway; all the tools are good at what they set out to do) seems to be checking whether the data extracted is actually valid schema.org or LRMI. If you want to validate against some application profile, say insisting that licence information must be provided, or that values for learningResourceType come from some specified controlled vocabulary, then you are in territory that none of the above tools even tries to cover. This is, however, in the scope of the W3C RDF Data Shapes Working Group, whose mission is to “produce a W3C Recommendation for describing structural constraints and validate RDF instance data against those.”

A colleague at Heriot-Watt has had students working (with input from Eric Prud’hommeaux) on Validata, “an intuitive, standalone web-based tool to help building valid RDF documents by validating against preset schemas written in the Shape Expressions (ShEx) language.” It is currently set up to validate linked data against some pre-set application profiles used in the pharmaceuticals industry. With all the necessary caveats about it being student work, no longer supported, and using an approach that is preliminary to the W3C working group, it illustrates how instance validation against a description of an application profile would work.
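
For a taste of what validation against an application profile could look like, here is a rough sketch of a shape requiring a licence and constraining learningResourceType to a small controlled list. It is written in ShEx-flavoured syntax from my memory of the current drafts, so treat it as indicative rather than exact:

PREFIX schema: <http://schema.org/>

<LearningResourceShape> {
  schema:license IRI ;
  schema:learningResourceType [ "lecture" "exercise" "assessment" ]
}

Data missing a licence, or using a learningResourceType outside that list, would fail validation against the shape even though it is perfectly good RDF.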

Linked learning highlights from Florence #LILE2015

I was lucky enough to go to Florence for some of the workshops that preceded the WWW2015 conference, because Stefan Dietze invited me to talk at the Linked Learning workshop “Learning & Education with the Web of Data”. Rather than summarize everything I saw, I would like to give brief pointers to three personal highlights from that workshop and from the “Web-based Education Technologies” (WebET 2015) workshop that followed it. Many thanks to Stefan for organizing the workshop (and also to the Spanish company Gnoss for sponsoring it).

Semantic TinCan

I’ve followed the work on Tin Can / xAPI / Experience API since its inception. Lorna and I put a section about it into our Cetis Briefing on Activity Data and Paradata, so I was especially interested in Tom De Nies‘s presentation on TinCan2PROV: Exposing Interoperable Provenance of Learning Processes Through Experience API Logs. Tin Can statements are basically elaborations of “I did this”, providing more information about the who, how and what referred to by those three words. Tom has a background in provenance metadata and saw the parallel between those statements and the recording of actions by agents more generally, and specifically with the model behind the W3C PROV ontology for recording information about entities, activities, and people involved in producing a piece of data or thing. Showing that Tin Can can be mapped to W3C PROV is of more than academic interest: the Tin Can spec provides only one binding, JSON, but the first step of Tom’s work was to upgrade that to JSON-LD, and then, through the mapping to PROV, to allow any of the serializations for PROV to be used (RDF/XML, N3, JSON-LD), bringing Tin Can statements into a format that can be used by semantic technologies. Tom is hopeful that the mapping is lossless; you can try it yourself at tincan2prov.org.
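
For anyone who hasn’t seen one, a minimal Tin Can statement looks something like this (identifiers invented); the actor is the “I”, the verb is the “did” and the object is the “this”, which is what makes the PROV reading as agent, activity and entity so natural:

{
  "actor": {
    "name": "Jane Doe",
    "mbox": "mailto:jane.doe@example.org"
  },
  "verb": {
    "id": "http://adlnet.gov/expapi/verbs/completed",
    "display": { "en-US": "completed" }
  },
  "object": {
    "id": "http://example.org/activities/intro-to-rdf"
  }
}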

Linking Courses

I also have an increasing interest in the semantic representation of courses: there’s some interest in adding courses to schema.org, and within my own department some of us would like to explore the advantages of moving away from course descriptors as PDF documents to something that could be a little more connected with each other and with the outside world. Fouad Zablith’s presentation on Interconnecting and Enriching Higher Education Programs using Linked Data was like seeing the end point of that second line of interest. The data model Fouad uses combines a model of a course with information about the concepts taught and the learning materials used to teach them. Course information is recorded using Semantic MediaWiki to produce both human readable and linked data representations of the courses across a program of study. A bookmarklet allows information about resources that are useful for these courses to be added to the graph, importantly attached via the concept that is studied, and so available to students of any course that teaches that concept. Finally, on the topic of several courses teaching the same concepts: sometimes such repetition is deliberate, but sometimes it is unwanted.
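
The shape of that graph is easy to sketch (the vocabulary below is entirely made up for illustration, not Fouad’s actual model): the course points at a concept it teaches, and the resource is attached to the concept rather than to the course, so any other course teaching the same concept picks the resource up for free.

{
  "@context": { "ex": "http://example.org/vocab#" },
  "@graph": [
    {
      "@id": "http://example.org/course/info-systems",
      "ex:teachesConcept": { "@id": "http://example.org/concept/er-modelling" }
    },
    {
      "@id": "http://example.org/resource/er-tutorial",
      "ex:aboutConcept": { "@id": "http://example.org/concept/er-modelling" }
    }
  ]
}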

Showing which concepts (middle) from one course (left) occur in other courses (right). Image from Fouad Zablith’s presentation (see link in text)

Fouad showed how analysis of the concept-to-course part of the graph could be useful in surfacing cases where perhaps too many of the concepts in a course had already been covered elsewhere (see image above).

Linking Resources

One view of a course (one that makes especial sense to anyone who thinks about etymology) is that it is a learning pathway, and one view of a pathway is as a directed acyclic graph, i.e. an ordered route through a series of resources. In the WebET workshop, Andrea Zielinski presented A Case Study on the Use of Semantic Web Technologies for Learner Guidance, which modelled such a learning pathway as a directed graph and represented this graph using OWL 2 DL. The conclusion to her paper says “The approach is applicable in the Semantic Web context, where distributed resources often are already annotated according to metadata standards and various open-source domain and user ontologies exist. Using the reasoning framework, they can be sequenced in a meaningful and user-adaptive way.” But at this stage the focus is on showing that the expressivity of OWL 2 DL is enough to represent a learning pathway, and on testing the efficiency of querying such graphs.
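
The underlying data is easy to picture (this encoding is my own illustration, not Zielinski’s OWL axioms): each resource points at its prerequisites, and as long as those links contain no cycles a reasoner, or even a simple topological sort, can produce a valid sequence through them.

{
  "@context": { "ex": "http://example.org/pathway#" },
  "@graph": [
    { "@id": "ex:rdf-basics" },
    { "@id": "ex:sparql",
      "ex:requires": { "@id": "ex:rdf-basics" } },
    { "@id": "ex:publishing-linked-data",
      "ex:requires": [ { "@id": "ex:rdf-basics" }, { "@id": "ex:sparql" } ] }
  ]
}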

SICSA Databases for the Environmental and Social Sciences

Today I attended the SICSA Databases for the Environmental and Social Sciences event hosted by Andy Cobley from the University of Dundee. I gave the below talk on the challenges of linking data.

Many areas of scientific discovery rely on combining data from multiple data sources. However, there are many challenges in linking data. This presentation highlights those challenges in the context of using Linked Data for environmental and social science databases.

New Open PHACTS Explorer Released

The Open PHACTS Explorer has a shiny new interface that provides a web based application for searching and exploring the data in the Open PHACTS Discovery Platform. The video below (10 minutes) will demonstrate how to use the new Explorer and give you an overview of its features.