Checking schema.org data with the Yandex structured data validator API

I have been writing examples of LRMI metadata for schema.org. Of course I want these to be valid, so I have been hitting various online validators quite frequently. This was getting tedious. Fortunately, the Yandex structured data validator has an API, so I could write a python script to automate the testing.

Here it is

#!/usr/bin/python
import httplib, urllib, json, sys 
from html2text import html2text
from sys import argv

noerror = False

def errhunt(key, responses):                 # a key and a dictionary,  
    print "Checking %s object" % key         # we're going on an err hunt
    if (responses[key]):
        for s in responses[key]:             
            for object_key in s.keys(): 
                if (object_key == "@error"):              
                    print "Errors in ", key
                    for error in s['@error']:
                        print "\tError code:    ", error['error_code'][0]
                        print "\tError message: ", html2text(error['message'][0]).replace('\n',' ')
                        noerror = False
                elif (s[object_key] != ''):
                    errhunt(object_key, s)
                else:
                    print "No errors in %s object" % key
    else:
        print "No %s objects" % key

try:
    script, file_name = argv 
except:
    print "\tError: Missing argument, name of file to check.\n\tUsage: yandexvalidator.py filename"
    sys.exit(0)

try:
    file = open( file_name, 'r' )
except:
    print "\tError: Could not open file ", file_name, " to read"
    sys.exit(0)

content = file.read()

try:
    validator_url = "validator-api.semweb.yandex.ru"
    key = "12345-1234-1234-1234-123456789abc"
    params = urllib.urlencode({'apikey': key, 
                               'lang': 'en', 
                               'pretty': 'true', 
                               'only_errors': 'true' 
                             })
    validator_path = "/v1.0/document_parser?"+params
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "*/*"}
    validator_connection = httplib.HTTPSConnection( validator_url )
except:
    print "\tError: something went wrong connecting to the Yandex validator."

try:
    validator_connection.request("POST", validator_path, content, headers)
    response = validator_connection.getresponse()
    if (response.status == 204):
        noerror = True
        response_data = response.read()   # to clear for next connection
    else:
        response_data = json.load(response)
    validator_connection.close()
except:
    print "\tError: something went wrong getting data from the Yandex validator by API."
    print "\tcontent:\n", content
    print "\tresponse: ", response.read()
    print "\tstatus: ", response.status
    print "\tmessage: ", response.msg
    print "\treason: ", response.reason
    print "\n"

    raise
    sys.exit(0)

if noerror:
    print "No errors found."
else:
    for k in response_data.keys():
        errhunt(k, response_data)

Usage:

$ ./yandexvalidator.py test.html
No errors found.
$ ./yandexvalidator.py test2.html
Checking json-ld object
No json-ld objects
Checking rdfa object
No rdfa objects
Checking id object
No id objects
Checking microformat object
No microformat objects
Checking microdata object
Checking http://schema.org/audience object
Checking http://schema.org/educationalAlignment object
Checking http://schema.org/video object
Errors in  http://schema.org/video
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **isFamilyFriendly** field missing or empty  
	Error code:     missing_empty
	Error message:  WARNING: Не выполнено обязательное условие для передачи данных в Яндекс.Видео: **thumbnail** field missing or empty  
$ 

Points to note:

  • I’m no software engineer. I’ve tested this against valid and invalid files. You’re welcome to use it, but it might not work for you. (You’ll need your own API key.) If you see something that needs fixing, drop me a line.
  • Line 51: it has to be an HTTPS connection.
  • Line 58: we asked for errors only (at line 46), so no news (a 204 response) is good news.
  • The function errhunt does most of the work, recursively.

The response from the API is a JSON object (converted into a Python dictionary by line 62), the keys of which are the “id” you sent and each of the structured data formats that are checked. For each of these there is an array/list of objects, and these objects are either simple key-value pairs or the value may itself be an array/list of objects. If there is an error in any of the objects, the value for the key “@error” gives the details, as a list of error_code and message key-value pairs. errhunt iterates and recurses through these lists of objects with embedded lists of objects.
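To make that shape concrete, here is a hypothetical miniature of a response (invented values, not real validator output) and the kind of recursive walk errhunt performs, written in Python 3 for brevity:

```python
# A hypothetical miniature of the validator's JSON response (invented values):
# one key per structured data format, each holding a list of objects which may
# nest further lists of objects, with "@error" marking the error details.
response_data = {
    "json-ld": [],
    "microdata": [
        {"http://schema.org/video": [
            {"@error": [
                {"error_code": ["missing_empty"],
                 "message": ["<b>thumbnail</b> field missing or empty"]}
            ]}
        ]}
    ],
}

def find_errors(key, responses, found):
    """Recursively collect (key, error_code) pairs, as errhunt does."""
    for item in responses[key]:
        for object_key in item:
            if object_key == "@error":
                for error in item["@error"]:
                    found.append((key, error["error_code"][0]))
            elif item[object_key]:
                find_errors(object_key, item, found)

found = []
for k in response_data:
    find_errors(k, response_data, found)
print(found)  # [('http://schema.org/video', 'missing_empty')]
```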

Crusade for Big Data Keynote

Today I gave the keynote presentation (slides below) at the Crusade for Big Data in the AAL domain workshop as part of the EU Ambient Assisted Living Forum. I gave an overview of the way that the Open PHACTS project has overcome various Big Data challenges to provide a production quality data integration platform that is being used to answer real pharmacology business questions.

The workshop then broke out into five breakout groups to discuss open challenges facing the AAL community that are posed by Big Data. The breakout groups were:

  1. Privacy and Ethics
  2. Business models for sustainability
  3. Data reuse and interoperability
  4. Data quality
  5. Feedback to the users

The organisers of the workshop (Femke Ongenae and Femke De Backere) will be sharing the outcomes of the brainstorming by proposing several working groups to focus on the issues in the area of AAL.

LRMI examples for schema.org

It’s been over two years since the LRMI properties were added to schema.org. One thing that we should have done much sooner is to create simple examples of how they can be used for the schema.org website (see, for example, the bottom of the Creative Work page). We’re nearly there.

We have two examples in the final stages of preparation, so close to ready that you can see previews of what we propose to add (at the bottom of that page).

The first example is very simple, just a few lines describing a lesson plan for US second grade teachers (NB, the lesson plan itself is not included in the example):

<div>
  <h1>Designing a treasure map</h1>
  <p>Resource type: lesson plan, learning activity</p>
  <p>Target audience: teachers</p>
  <p>Educational level: US Grade 2</p>
  <p>Location: <a href="http://example.org/lessonplan">http://example.org/lessonplan</a></p>
</div>

With added microdata that becomes

<div itemscope itemtype="http://schema.org/CreativeWork">
    <h1 itemprop="name">Designing a treasure map</h1>
    <p>Resource type: 
      <span itemprop="learningResourceType">lesson plan</span>, 
      <span itemprop="learningResourceType">learning activity</span>
    </p>
    <p>Target audience: 
      <span itemprop="audience" itemscope itemtype="http://schema.org/EducationalAudience">
        <span itemprop="educationalRole">teacher</span></span>s.
    </p>
    <p itemprop="educationalAlignment" itemscope itemtype="http://schema.org/AlignmentObject">
        <span itemprop="alignmentType">Educational level</span>: 
        <span itemprop="educationalFramework">US Grade Levels</span> 
        <span itemprop="targetName">2</span>
        <link itemprop="targetUrl" href="http://purl.org/ASN/scheme/ASNEducationLevel/2" />
    </p>
    <p>Location: <a itemprop="url" href="http://example.org/lessonplan">http://example.org/lessonplan</a></p>
</div>

(Other flavours of schema.org markup are at the bottom of the AlignmentObject preview.)

This illustrates a few points:

  • free text learning resource types, which can be repeated
  • the audience for the resource (teachers) is different from the grade level of the end users (pupils)
  • educationalRole is a property of EducationalAudience, not the CreativeWork being described
  • how the AlignmentObject should be used to specify the grade level appropriateness of the resource
  • human readable grade level information is supplemented with a machine readable URI as the targetUrl

The second example is more substantial. It is based on a resource from BBC bitesize (though we have hacked the HTML around a bit). Here’s the HTML

<div>
    <h1>The Declaration of Arbroath</h1>
    <p>A lesson plan for teachers with associated video. 
       Typical length of lesson, 1 hour. 
       Recommended for children aged 10-12 years old.
    </p>
    <p>Subject: Wars of Scottish independence</p>
    <p>Alignment to curriculum:</p>
    <ul>
        <li>England 
            National Curriculum: KS 3 History: The middle ages (12th to 15th century)
        </li>
        <li>Scotland 
            SCQF: Level 2
            Curriculum for Excellence: Social studies: people past events and societies: The Wars of Independence
        </li>
    </ul>
    <p>Location: <a href="http://example.org/lessonplan">http://example.org/lessonplan</a></p>
    <video>
        <source src="http://example.org/movie.mp4" type="video/mp4" />
        Duration 03:12
    </video>
</div>

and here’s the JSON-LD:

<script type="application/ld+json">
{
  "@context":  "http://schema.org/",
  "@type": "WebPage",
  "name": "The Declaration of Arbroath",
  "about": "Wars of Scottish independence",
  "learningResourceType": "lesson plan",
  "timeRequired": "1 hour",
  "typicalAgeRange": "10-12",
  "audience": {
      "@type": "EducationalAudience",
      "educationalRole": "teacher"
  },
  "educationalAlignment": [
    {
      "@type": "AlignmentObject",
      "alignmentType": "educationalSubject",
      "educationalFramework": " Curriculum for Excellence: ",
      "targetName": "Social studies: people past events and societies",
      "targetUrl": "http://example.org/CFE/subjects/3362"      
    },
    {
      "@type": "AlignmentObject",
      "alignmentType": "educationalLevel",
      "educationalFramework": "SCQF",
      "targetName": "Level 2",
      "targetUrl":  "http://example.org/SCQF/levels/2"      
    },
    {
      "@type": "AlignmentObject",
      "alignmentType": "educationalLevel",
      "educationalFramework": "National Curriculum",
      "targetName": "KS 3",
      "targetUrl": "http://example.org/ENC/levels/KS3"
    },
    {
      "@type": "AlignmentObject",
      "alignmentType": "educationalSubject",
      "educationalFramework": "National Curriculum",
      "targetName": "History: The middle ages (12th to 15th century)",
      "targetUrl" : "http://example.org/ENC/subjects/3102"
    }
  ],
  "url" : "http://example.org/lessonplan",
  "video": {
    "@type": "VideoObject",
    "description": "Video description",
    "duration": "03:12",
    "name": "Video Title",
    "thumbnailUrl": "http://example.org/thumbnail.mp4",
    "uploadDate": "2000-01-01",
    "url" : "http://example.org/movie.mp4"
  }
}
</script>

(Again other flavours of schema.org markup are at the bottom of the AlignmentObject preview.)

The additional points to note here are that

  • we chose to mark up the lesson plan as the learning resource rather than the video. There is nothing wrong with marking up the video as a learning resource, but things like educational alignment will be more explicit for lesson plans. (Again, neither the lesson plan nor the video is included in the example)
  • the typical learning time of the lesson plan is not the same as the duration of the video
  • this example treats the English and Scottish curricula as providing subjects for study at given educational levels. It would be possible to go deeper and align to specific competence statements (e.g. to say that the resource “teaches” Curriculum for Excellence outcome SOC 2-01a, “I can use primary and secondary sources selectively to research events in the past.”)
  • the educational subject (i.e. the educational context of the resource in terms of those subjects studied at school) is different from the topic that the resource is about
  • it’s a shame that there are no real URIs to use for the targets of these alignments
  • while you can omit the url property for the described resource, we thought it safer to include it; the default value is the URI of the webpage in which the markup is included, which may be what you need, but in the case of stand-alone JSON-LD you’re lost without an explicit value for “url”
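As a quick sanity check, JSON-LD like the example above can be loaded and inspected with a few lines of Python. This is a throwaway sketch, not part of the proposed examples, and the snippet below abridges the JSON-LD to just the fields it inspects:

```python
import json

# Abridged version of the JSON-LD example above, as it would appear inside
# the <script type="application/ld+json"> element.
jsonld = """
{
  "@context": "http://schema.org/",
  "@type": "WebPage",
  "name": "The Declaration of Arbroath",
  "educationalAlignment": [
    {"@type": "AlignmentObject", "alignmentType": "educationalSubject",
     "targetName": "Social studies: people past events and societies"},
    {"@type": "AlignmentObject", "alignmentType": "educationalLevel",
     "targetName": "Level 2"}
  ],
  "url": "http://example.org/lessonplan"
}
"""

data = json.loads(jsonld)

# Group alignment target names by alignment type.
alignments = {}
for a in data.get("educationalAlignment", []):
    alignments.setdefault(a["alignmentType"], []).append(a["targetName"])

print(alignments["educationalLevel"])  # ['Level 2']
```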

As ever, comments would be welcome.

Open PHACTS wins European Linked Data Award

We are delighted to announce that Open PHACTS has been awarded first place in the Linked Open Data Award of the inaugural European Linked Data Contest (ELDC). An international jury of ambassadors from over 15 European countries elected Open PHACTS as the winner, judged by the following criteria:

Gerhard receiving ELDC award

  • shows a high degree of innovation
  • triggers network effects
  • embraces open standards
  • proves technological matureness
  • shows great potential to be utilised in multiple domains
  • achieves a high degree of comprehensibility for the users

The ELDC has been established to recognise Europe’s crème de la crème of linked data and semantic web. Prizes are awarded to stories, products, projects or persons presenting novel and innovative projects, products and industry implementations involving linked data. The ELDC also aims to build a directory of the best European projects in the domains of linked data and the semantic web. Open PHACTS is honored to be chosen as the first winner of the ELDC’s Linked Open Data Award, and to be included in this directory.

Mobile computing

Ages ago I had my Raspberry Pi set up to run headless and wireless, and had built a motor driver based on an H-bridge, all ready for it to become mobile. In fact it did become mobile by cannibalising a remote control dalek toy of my son’s, but that didn’t work too well (it didn’t all fit in). A short while ago I saw a video from @museumsinspire of some Pi-based moving robots which inspired me to try again.

@winkleink pointed me to the parts required – just £8 for the bits I didn’t already have – and then there was a few weeks’ wait while they shipped from China. [photo: the kit of parts]

No instructions, which I like: it saves all those “why don’t you read the instructions” arguments. After some help from my daughter, it went together. [photo: assembling the chassis]

Soldering connectors to the motors was a real pain. Also, the chassis came with a 4xAA battery, but it was a bit bulky and I fancied 9V for two motors. However, that PP3 doesn’t last long. [photo: the soldered motor connections]

The Pi and its power pack fit on the top in various configurations; I’m still not sure what works best. In the long run I guess I’ll drill some holes so they can be held more securely (sadly, but not surprisingly, none of the existing holes are in the right place). [photo: the assembled bot; the motor controller is in a wee box hidden on the other side of the Pi]

I use the gpio utility that comes with WiringPi for quick tests of the motor driver, to work out which GPIO pin sends which wheel forwards or backwards, and also to write short shell scripts such as

#!/bin/bash
#go forwards
gpio mode 0 out
gpio mode 1 out
gpio mode 2 out
gpio mode 3 out
gpio write 1 0 && gpio write 2 0
gpio write 0 1 && gpio write 3 1

to put this little pi bot through its paces

Not a bad start. The battery was running low, it was nippier than that at first. Good enough to build on, I think. Once I’ve worked out where best to put all the parts, and got them fixed a little more securely, there are some other enhancements I would like to make. First, the wheels came with optical encoder discs but no sensors for them, so there is scope for some more precise control of positioning. Secondly there’s a pi camera module on that pi, so some scope for a seeing robot, and it would be nice to add other senses as well.
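The pin logic in that shell script could also be wrapped in a small Python helper that generates the gpio commands for each direction. This is a hypothetical sketch (the pin-to-motor mapping below simply mirrors the shell script; your wiring may differ):

```python
# Hypothetical mapping from direction to WiringPi pin states, mirroring the
# shell script above: pins 0 and 1 drive one motor, pins 2 and 3 the other.
PIN_STATES = {
    "forwards":  {0: 1, 1: 0, 2: 0, 3: 1},
    "backwards": {0: 0, 1: 1, 2: 1, 3: 0},
    "stop":      {0: 0, 1: 0, 2: 0, 3: 0},
}

def gpio_commands(direction):
    """Return the list of `gpio` shell commands for a direction."""
    states = PIN_STATES[direction]
    # Set every pin to output mode, then write the desired values.
    cmds = ["gpio mode %d out" % pin for pin in sorted(states)]
    cmds += ["gpio write %d %d" % (pin, value)
             for pin, value in sorted(states.items())]
    return cmds

print(gpio_commands("forwards")[-1])  # gpio write 3 1
```

The commands could then be run with subprocess on the Pi itself, or just printed into a shell script like the one above.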

Data Integration in a Big Data Context

Today I had the pleasure of visiting the Urban Big Data Centre (UBDC) to give a seminar on Data Integration in a Big Data context (slides below). The idea for the seminar came about due to my collaboration with Nick Bailey (Associate Director of the UBDC) in the Administrative Research Data Centre for Scotland (ADRC-S).

In the seminar I wanted to highlight the challenges of data integration that arise in a Big Data context and show examples from my past work that would be relevant to those in the UBDC. In the presentation, I argue that RDF provides a good approach for data integration, but it does not solve the basic challenges of messy data and of generating mappings between datasets. It does, however, lay these challenges out on the table, as Frank van Harmelen highlighted in his SWAT4LS keynote in 2013.

The first use case is drawn from my work on the EU SemSorGrid4Env project where we were developing an integrated view for emergency response planning. The particular use case shown is that of coastal flooding on the south coast of England. Although this project finished in 2011, I am still involved with developing RDF and SPARQL continuous data extensions; see the W3C RDF Stream Processing Community Group for details.

The second use case is drawn from my work on the EU Open PHACTS project. I showed the approach we developed for supporting user-controlled views of the integrated data through Scientific Lenses. I also talked about the successes of the project and the fact that it is currently being actively used for pharmacology research, receiving over 20 million hits a month.

I finished the talk with an overview of the Administrative Data Research Centre for Scotland (ADRC-S) and my work on linking birth, marriage, and death records. I am hoping that we can adopt the lenses approach together with incorporating feedback on the linkages from the researchers who will use the integrated views.

In the discussions following the talk, the notion of FAIR data came up. This is the idea that data should be Findable, Accessible, Interoperable, and Reusable by both humans and machines. RDF is one approach that could lead to this. The other area of discussion was around community initiatives for converting existing open datasets into an RDF format. I advocated adopting the approach followed by the Bio2RDF community who share the tasks of creating and maintaining such scripts for biological datasets. An important part of this jigsaw is tracking the provenance of the datasets, for which the W3C Health Care and Life Sciences Community Profile for Dataset Descriptions could be beneficial (there is nothing specific to the HCLS community in the profile).

A short project on linking course data

Alasdair Gray and I have had Anna Grant working with us for the last 12 weeks on an Equate Scotland Technology Placement project looking at how we can represent course information as linked data. As I wrote at the beginning of the project, for me this was of interest in relation to work on the use of schema.org to describe courses; for the department as a whole it relates to how we can access course-related information internally to view information such as the articulation of related learning outcomes from courses at different stages of a programme, and how data could be published and linked to datasets from other organisations such as accrediting bodies or funders. We avoided any student-related data such as enrolments and grades. The objectives for Anna’s work were ambitious: survey existing HE open data and ontologies in use; design an ontology that we can use; develop an interface we can use to create and publish our course data. Anna made great progress on all three fronts. Most of what follows is lifted from her report.

(Aside: at HW we run 4-year programmes in computer science which are composed of courses; I know many other institutions run 3/4-year courses which are comprised of modules. Talking more generally, course is usefully ambiguous to cover both levels of granularity; programme and module seem unambiguous.)

A few Universities have already embarked on similar projects, notably the Open University, Oxford University and Southampton University in the UK, and Muenster and the American University of Beirut elsewhere. Southampton was one of the first Universities to take the open linked data approach, and as such they developed their own bespoke ontology. Oxford has predominantly used the XCRI ontology (see below for information on the standard education ontologies mentioned here) to represent data; additionally they have used MLO, dcterms, SKOS and a few resource types that they have defined in their own ontology. The Open University has the richest data available; the approach they took was to use many ontologies. Muenster developed the TEACH ontology, and the American University of Beirut used the CourseWare and AIISO ontologies.

The ontologies reviewed were: AIISO, Teach, CourseWare, XCRI, MLO, ECIM and CEDS. A live working draft of the summary / review for these is available for comment as a Google Doc.

AIISO (Academic Institution Internal Structure Ontology) is an excellent ontology for what it is designed for, but as the name says, it aims to describe the structure of an institution and doesn’t offer a huge amount in the way of particular properties of a course. TEACH is a better fit in terms of having the kind of properties that we wished to use to describe a course, however it doesn’t give any kind of representation of the provider of the course. CourseWare is a simple ontology with only four classes and many properties with Course as the domain; the trouble with this ontology is that it is closely related to the Aktors ontology, which is no longer defined anywhere online.

XCRI and MLO are designed for the advertising of courses, and as such they miss out some of the features of a course that would be represented in internal course descriptions, such as assessment method and learning outcomes. Neither of these ontologies shows the difference between a programme and a module. ECIM is an extension of MLO which provides a common format for representing credits awarded for completion of a learning opportunity.

CEDS (Common Education Data Standards) is an American ontology which provides a shared vocabulary for educational data from preschool right up to adult education. The benefit of this is that data can be compared and exchanged in a consistent way. It has data domains for assessment, learning standards, learning resources, authentication and authorisation. Additionally it provides domains for different stages of education, e.g. post-secondary education. CEDS is ambitious in that it represents all levels of education, and as such is a very complex and detailed ontology.

XCRI, MLO (+ ECIM) and CEDS can be grouped together in that they differentiate between a course specification and a course instance, offering or section. The specification covers the parts of a course that remain consistent from one presentation to the next, whereas the instance defines those aspects of a course that vary between presentations, for example location or start date. The advantage of this is that there is a smaller amount of data that requires updating between years/offerings.
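The specification/instance split can be sketched in a few lines of Python. The class and field names here are my own invention for illustration, not terms from any of the ontologies discussed:

```python
from dataclasses import dataclass, field

@dataclass
class CourseSpecification:
    # Stable facts that persist across presentations.
    title: str
    learning_outcomes: list = field(default_factory=list)

@dataclass
class CourseInstance:
    # Facts that vary from one presentation to the next.
    specification: CourseSpecification
    year: int
    location: str

spec = CourseSpecification("Data Structures", ["analyse algorithm complexity"])
i2014 = CourseInstance(spec, 2014, "Edinburgh")
i2015 = CourseInstance(spec, 2015, "Galashiels")

# Both instances share the one specification: updating it updates both,
# which is exactly the saving the specification/instance split buys you.
print(i2014.specification is i2015.specification)  # True
```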

An initial draft of a Heriot-Watt schema applying all the available ontologies was made. It was a mess, but it became apparent that MLO was the predominant ontology. So we chose to use MLO where possible, and other ontologies where required. This iteration resulted in a course instance becoming both an MLO learning opportunity instance and a TEACH course, in order to be able to use all the properties required. Even using this mix of ontologies we still needed to mint our own terms. This approach was a bit complex, and TEACH does not seem to be widely used, so we decided to use MLO alone and extend it to fit our data, in a similar way to that already started by ECIM.

The final draft is shown below. Key: Green = MLO; Purple = MLO extension; Blue = ECIM / previous alteration to MLO; Yellow = generic ontologies such as Dublin Core and SKOS. In brief, we used subtypes of MLO Learning Opportunities to describe both programmes and modules. The distinction between information at the course specification level and information at the course instance level was made on the basis of whether changing the information requires committee approval. So things that can be changed year on year without approval, such as location, course leader and other teaching staff, are associated with the course instance; things that are more stable and require approval, such as syllabus, learning outcomes and assessment methods, are at the course specification level.

[Diagram: the extended MLO course ontology]

We also created some instance data for Computer Science courses at Heriot-Watt. For this we used Semantic MediaWiki (with the Semantic bundle). Semantic forms were used for inputting course information; the input from the forms is then shown as a wiki page. Categories in MediaWiki are akin to classes; properties are used to link one page to another, and also to relate the subject of a page to its associated literals. An input form has the properties built in, such that each field in the form has a property related to it. Essentially the item described by the form becomes the subject of the stored triples, the property associated with a field in the form becomes the predicate, and the input to the field becomes the object of a triple. A field can be set so that multiple values can be entered, separated by commas, in which case a triple is formed for each value. I think there is a useful piece of work that could be done comparing various tools for creating linked data (e.g. Semantic MediaWiki, Callimachus, Marmotta) and evaluating whether other approaches (e.g. WordPress extensions) may improve on them. If you know of anything along such lines please let me know.
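The form-to-triple mapping can be illustrated with a short sketch. The page name and property names below are invented for illustration, not taken from our actual wiki:

```python
# Hypothetical sketch: turn a Semantic MediaWiki-style form submission into
# triples. The page being described is the subject, each form field supplies
# the predicate, and each comma-separated value becomes a separate object.
def form_to_triples(page, fields):
    triples = []
    for prop, raw_value in fields.items():
        for value in [v.strip() for v in raw_value.split(",")]:
            triples.append((page, prop, value))
    return triples

triples = form_to_triples(
    "Course:F27DS",
    {"hasLeader": "A. Lecturer",
     "hasLearningOutcome": "outcome one, outcome two"},
)
print(len(triples))  # 3
```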

We have a little more work to do in representing the ontology in Protege and creating more instance data; watch this space for updates and a more detailed description than the image above. We would also like to evaluate the ontology more fully against potential use cases and against other institutions’ data.

Anna has finished her work here now and returns to Edinburgh Napier University to finish her Master’s project. Alasdair and I think she has done a really impressive job, not least considering she had no previous experience with RDF and semantic technologies. We’ve also found her a pleasure to work with and would like to thank her for her efforts on this project.

Technology enhanced learning in HW MACS

At the beginning of the summer I was handed an internal project convening a group to put into action the strategic plan for the use of technology to enhance teaching and learning at the School of Mathematical and Computer Sciences (MACS) at Heriot-Watt. Yesterday the first phase of that came to fruition in a show and tell seminar.

Our initial approach has been to identify and share what is already happening within the School, to try to open up those pockets of innovation where one or two people are doing something that many others could also try. We started with a survey, asking people to tell us about interesting uses of technology that they had tried. We left the individual respondents to define what they felt was “interesting”, but did our best to encourage anyone who was going beyond basic PowerPoint and VLE use to let us know. We also asked what ideas for use of technology staff wanted to try in their teaching, and what (if anything) was stopping them from doing so. We had a decent number of replies, and from these identified some common themes from the two questions (i.e. there was overlap between what some people had tried and what other people were interested in trying, a bit of a win that deserves more than a parenthesis at the end of a sentence). Those themes were:

  1. managing course work in the VLE, e.g. setting up rubrics, delegated marking.
  2. online assessment
  3. promoting interaction in class
  4. use of video for short explanations, demonstrations etc.

We used this to decide what we had to include in the first event yesterday, which we billed as a show and tell seminar. We had three speakers on each of the first three themes listed (it seemed that explanations and demos of how to use video for explanations and demos could be done as videos :-) ). The speakers had 10 minutes each to show what they had done, so really just enough time to provide a taster so that others could decide whether they wanted to know more. That’s 9 speakers in 90 minutes; it was a real challenge to come up with a format which didn’t last too long (because academics are always busy, we couldn’t expect to get much more than an hour or two of their time) but did cover a wide range of topics. We had an audience of 40, which is pretty good – I’ve been to plenty of similar events covering a whole institution which have had lower turnouts. The feedback has been that, while not everything interested everyone, there was something for everyone that made the time committed worthwhile. So on the whole I think the format worked, even though it was hectic.

Of course, by packing so much into a compressed schedule, few people will have been able to get all the information they wanted, so follow-up will be crucial to the success of this work as a whole. The show and tell was a kick-off event; we want to encourage people to continue to share the ideas that are of interest to them, and for there to be other, smaller events with a tighter focus to facilitate this.

WordPress as a semantic web platform?


For the work we’ve been doing on semantic description of courses we needed a platform for creating and editing instance data flexibly and easily. We looked at Callimachus and Semantic MediaWiki; in the end we went with the latter because of Java version incompatibility problems with the former, but it has been a bit of a struggle. I’ve used WordPress for publishing information about resources on a couple of projects, for Cetis publications and for learning resources, and have been very happy with it. WordPress handles the general task of publishing stuff on the web really well; it is easily extensible through plugins and themes, and I have nearly always found plugins that allow me to do what I want and themes that, with a little customization, allow me to present the information how I want. As a piece of open source software it is used on a massive scale (about a quarter of all web domains use it) and has the development effort and user support to match. For the previous projects my approach was to have a post for each resource I wanted to describe and to set the title, publication date and author to be those of the resource. I used the main body of the post for a description and used tags and categories for classification, e.g. by topic or resource type; other metadata could be added using WordPress’s Custom Fields, more or less as free-text name-value pairs. While I had modified themes so that the semantics of some of this information was marked up with microdata or RDFa embedded in the HTML, I was aware that WordPress allowed for more than I was doing.

The possibility of using WordPress for creating and publishing semantic data hinges on two capabilities that I hadn’t used before: firstly the ability to create custom post types so that for each resource type there can be a corresponding post type; secondly the ability to create custom metadata fields that go beyond free text. I used these in conjunction with a theme which is a child theme of the current default, TwentyFifteen, which sets up the required custom types and displays them. Because I am familiar with it and it is quite general purpose, I chose the schema.org ontology to implement, but I think the ideas in this post would be applicable to any RDF vocabulary. When creating examples I had in mind a dataset describing the books that I own and the authors I am interested in.

I started by using a plugin, Custom Post Type UI, to create the post types I wanted, but eventually was doing enough in php as theme extensions (see below) that it made sense just to add a function to create the post types. This drops the dependency on the plugin (though it’s a good one) and means the theme works from the outset without requiring custom types to be set up manually.

add_action( 'init', 'create_creativework_type' );
function create_creativework_type() {
  register_post_type( 'creativework',
    array(
      'labels' => array(
        'name' => __( 'Creative Works' ),
        'singular_name' => __( 'Creative Work' )
      ),
      'public' => true,
      'has_archive' => true,
      'rewrite' => array('slug' => 'creativework'),
      'supports' => array('title', 'thumbnail', 'revisions' )
    )
  );
}

The key call here is to the WP function register_post_type() which is used to create a post type with the same name as the schema.org resource type / class; so I have one of these for each of the schema.org types I use (so far Thing, CreativeWork, Book and Person). This is hooked into the WordPress init process so it is done by the time you need those post types.

I do use a plugin to help create the custom metadata fields for every property except the name property (for which I use the title of the post). Meta Box extends the WordPress API with some functions that make creating metadata fields in php much easier. These metadata fields can be tailored for particular data types, e.g. text, dates, numbers, URLs and, crucially, links to other posts. That last one gives you what you need to create relationships between the resources you describe in WordPress, which can be expressed as triples. Several of these custom fields can be grouped together into a “meta box” and attached as a group to specific post types so that they are displayed when editing posts of those types. Here’s what declaring a custom metadata field for the author relationship between a CreativeWork and a Person looks like with Meta Box (for simplicity I’ve omitted the code I have for declaring the other properties of a Creative Work and some of the optional parameters). I’m using the author property as an example because a repeatable link to another resource is about as complicated as a property gets.

function semwp_register_creativework_meta_boxes( $meta_boxes )
{
    $prefix = 'semwp_creativework_';

    // 1st meta box
    $meta_boxes[] = array(
        'id'         => 'main_creativework_info',
        'title'      => __( 'Main properties of a schema.org Creative Work', 'semwp_creativework_' ),
        // attach this box to the following post types
        'post_types' => array( 'creativework', 'book' ),

        // List of meta fields
        'fields'     => array(
            // Author
            // Link to posts of type Person.
            array(
                'name'        => __( 'Author (person)', 'semwp_creativework_' ),
                'id'          => "{$prefix}authors",
                'type'        => 'post',
                'post_type'   => 'person',
                'placeholder' => __( 'Select an Item', 'semwp_creativework_' ),
                // set clone to true for repeatable fields
                'clone'       => true,
            ),
        ),
    );
    return $meta_boxes;
}

What this gives when editing a post of type book is this:

[Screenshot: the WordPress form for editing a post of type book, showing the custom metadata fields]

WordPress uses a series of nested templates to display content. These are defined in the theme and can be either specific to a post type or generic, the generic ones being used as a fallback if a more specific one does not exist. As I mentioned, I use a child theme of TwentyFifteen, which means that I only have to include those files that I change from the parent. To display the main content of posts of type book I need a file called content-book.php (the rest of the page is common to all types of post), which looks like this:


<article resource="?<?php the_ID() ; ?>#id" id="?<?php the_ID(); ?>" <?php post_class(); ?> vocab="http://schema.org/" typeof="Book">

<header class="entry-header">
    <?php
        if ( is_single() ) :
            the_title( '<h1 class="entry-title" property="name">', '</h1>' );
        else :
            the_title( sprintf( '<h2 class="entry-title"><a href="%s" rel="bookmark">', esc_url( get_permalink() ) ), '</a></h2>' );
        endif;
    ?>
</header>
<div class="entry-content">
    <?php semwp_print_creativework_author(); ?>
    <?php semwp_print_book_bookEdition(); ?>
    <?php semwp_print_book_numberOfPages(); ?>
    <?php semwp_print_book_isbn(); ?>
    <?php semwp_print_book_illustrator(); ?>
    <?php semwp_print_creativework_datePublished(); ?>
    <?php semwp_print_book_bookFormat(); ?>
    <?php semwp_print_creativework_sameAs(); ?></div>
<footer class="entry-footer">
    <?php twentyfifteen_entry_meta(); ?>
    <?php edit_post_link( __( 'Edit', 'twentyfifteen' ), '<span class="edit-link">', '</span>' ); ?>
    <?php semwp_print_extract_rdf_links(); ?>
</footer>

</article>

Note the RDFa in some of the html tags, for example the <article> tag includes

resource="[url]#id" vocab="http://schema.org/" typeof="Book"

and the title is output in an <h1> tag with the

property="name"

attribute. Exposing semantic data as RDFa is one (good) thing, but what about other formats? A useful web service called RDF Translator helps here. It has an API which allowed me to put a link at the foot of each resource page to the semantic data from that page in formats such as RDF/XML, N3 and JSON-LD; it’s not quite what you would want for fully fledged semantic data publishing, but it does show the different views of the data that can be extracted from what is published.
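The footer links produced by semwp_print_extract_rdf_links() simply point at that translation service, so building one is little more than URL-escaping the page address. Here is a stdlib-only Python sketch; note that the appspot host and the /convert/source/target/uri path pattern are my assumptions about the service's REST interface, so check its documentation before relying on them.

```python
# Sketch of building a request URL for the RDF Translator web service.
# Assumption: the service accepts /convert/<source>/<target>/<escaped-uri>.
# Fetching the result (e.g. with urllib.request.urlopen) would return the
# page's RDFa re-serialized as N3, JSON-LD, RDF/XML, etc.
from urllib.parse import quote

RDF_TRANSLATOR = "http://rdf-translator.appspot.com/convert"

def translator_url(page_url, source="rdfa", target="n3"):
    """Build an RDF Translator request URL for one page."""
    return "%s/%s/%s/%s" % (RDF_TRANSLATOR, source, target,
                            quote(page_url, safe=""))

url = translator_url("http://www.pjjk.net/semanticwp/book/the-day-of-the-triffids")
```

The page URL is percent-encoded in full (safe="") because it ends up as a single path segment of the API call.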

Also note that most of the content is printed through calls to php functions that I defined for each property; semwp_print_creativework_author() looks like this (again, a repeatable link to another resource is about as complex as it gets):

function semwp_print_alink($id) {
     if (get_the_title($id))       //it's an object with a title
     {
         echo sprintf('<a property="url" href="%s"><span property="name">%s</span></a>', esc_url(get_permalink($id)), get_the_title($id) );
     }
     else                          //treat it as a url
     {
         echo sprintf('<a href="%s">%s</a>', esc_url($id), $id );
     }
}
function semwp_print_creativework_author()
{
    if ( rwmb_meta( 'semwp_creativework_authors' ) )
    {
        echo '<p>By: ';
        $authors = rwmb_meta( 'semwp_creativework_authors' );
        foreach ( $authors as $author )
        {
               echo '<span property="author" typeof="Person">';
               semwp_print_alink($author);
               echo '</span>';
        }
        echo '</p>';
    }
}
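As a quick sanity check that the property= attributes these functions embed really are machine-extractable, a toy stdlib-only Python scraper will do. This is emphatically not a conformant RDFa parser (it ignores vocab and typeof scoping, and keeps only the first value seen per property), and the markup string below is merely modelled on the template's output.

```python
# Toy scraper: pair each RDFa property= attribute with its element's text.
from html.parser import HTMLParser

class PropertyScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.props = {}          # property name -> first text seen
        self._open = []          # stack of property names for open elements

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        for prop in self._open:  # credit text to every enclosing property
            if prop and data.strip():
                self.props.setdefault(prop, data.strip())

# Markup shaped like the theme's output for a book title and author link:
html = ('<h1 class="entry-title" property="name">The day of the triffids</h1>'
        '<span property="author" typeof="Person">'
        '<a property="url" href="#"><span property="name">John Wyndham</span></a>'
        '</span>')
scraper = PropertyScraper()
scraper.feed(html)
```

A real consumer would use a proper RDFa distiller, which would scope the author's nested name property to the Person rather than the Book; this only shows that the markup carries the data.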

So in summary, for each resource type I have two files of php/html code: one which sets up a custom post type, custom metadata fields for the properties of that type (and any other types which inherit them) and includes some functions that facilitate the output of instance data as HTML with RDFa; and another file which is the WordPress template for presenting that data. Apart from a few generally useful functions related to output as HTML and modifications to other theme files (mostly to remove embedded data which I found distracting) that’s all that is required.

The result looks like this:

Note: this image is linked to the page on my WordPress install that it shows; click on it if you want to explore the little data that there is there, but please be aware that it is a development site which won’t always be working properly.

And here’s the N3 rendering of the data in that page as converted by RDF Translator:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.pjjk.net/semanticwp/book/the-day-of-the-triffids> rdfa:usesVocabulary schema: .

<http://www.pjjk.net/semanticwp/book/the-day-of-the-triffids?36#id> a schema:Book ;
    schema:author [ a schema:Person ;
            schema:name "John Wyndham"@en-gb ;
            schema:url <http://www.pjjk.net/semanticwp/person/john-wyndham> ] ;
    schema:bookEdition "Popular penguins (2011)"@en-gb ;
    schema:bookFormat ""@en-gb ;
    schema:datePublished "2011-09-01"^^xsd:date ;
    schema:illustrator [ a schema:Person ;
            schema:name "John Griffiths"@en-gb ;
            schema:url <http://www.pjjk.net/semanticwp/person/john-griffiths> ] ;
    schema:isbn "0143566539"@en-gb ;
    schema:name "The day of the triffids"@en-gb ;
    schema:numberOfPages 256 ;
    schema:sameAs "http://www.amazon.co.uk/Day-Triffids-Popular-Penguins/dp/0143566539/"@en-gb,
        "https://books.google.co.uk/books?id=zlqAZwEACAAJ"@en-gb .

Further work: Ideas and a Problem

There’s a ton of other stuff that I can think of that could be done with this, from the simple, e.g. extending the range of types supported, to the challenging, e.g. exploring ways of importing data or facilitating / automating the creation of new post types from known ontologies, output in other formats, providing a SPARQL endpoint &c &c… Also, I suspect that much of what I have implemented in a theme would be better done as a plugin.

There is one big problem that I only vaguely see a way around, and it is illustrated above in the screenshot of the editing interface for the ‘about’ property. The schema.org/about property has an expected type of schema.org/Thing; schema.org types are hierarchical, which means the value for about can be a Thing or any subtype of Thing (which is to say, of any type). This sort of thing isn’t unique to schema.org. However, the Meta Box plugin I use will only allow links to be made to posts of one specific type, and I suspect that reflects something about how WordPress organises posts of different custom types. I don’t think there is any way of asking it to show posts from a range of different types, and I don’t think there is any way of saying that posts of type Person are also of type Thing, and so on. In practice this means that at the moment I can only enter data that shows books as being about unspecified Things; I cannot, for example, say that a biography is a book about a Person. I can only see clunky ways around this.
Update: I have since noticed that you can pass an array of post types as the 'post_type' parameter, e.g. 'post_type' => array( 'person', 'thing' ), so that selection can be made from any one of them.

[Aside: the big consumers of schema data (Google, Bing, Yahoo, Yandex) will also permit text values for most properties and try to make what sense of it they can, so you could say that for any property either a string literal or a link to another resource should be permitted. This, I think, is a peculiarity of schema.org. The screenshot above of the data input form shows that the about field is repeated to provide the option of a text-only value, an approach hinting at one of the clunky unscalable solutions to the general problem described above.]

What next? I might set a student project around addressing some of these extensions. If you know a way around the problem of selecting posts of different types, please do drop me a line. Other than that, I can see myself extending this work slowly if it proves useful for other stuff, like creating examples of pages with schema.org or LRMI data in them. If anyone is really interested in the source code I could put it on GitHub.

Update 02 Sep 2015:

I refactored the code so that most of the new php for creating custom post types and setting up the forms to edit their properties is in a plugin, and all the theme does is display the data entered, with embedded RDFa.

The code is now on GitHub.

I did set a student project around extending it; I’m waiting to see if any student opts for it.

Using Garmin eTrex Vista HCx with Ubuntu 14.04LTS & QLandkarte GT


I have a rather old Garmin eTrex GPS that I use on walking holidays and cycle rides, with OpenCycleMap contour maps downloaded from talkytoaster. To plan routes and manage the routes, tracks and maps on Ubuntu I use QLandkarte GT. This summer was the first time I used this combination on my new PC, and I found some of the configuration difficult because the info I could find (e.g. this from GPS Babel) related to old versions of Ubuntu (not surprising; this Garmin is from the Ubuntu Feisty era). What needs doing seems similar, but how you do it has changed.

I edited /etc/modprobe.d/blacklist to stop Ubuntu loading the garmin_gps module.  I don’t know if this is necessary, but everything I want seems to work with it there. That file now looks like:

# stop garmin_gps serial from loading for USB garmin devices

blacklist garmin_gps

Then, to make sure that the Garmin is automounted read/write for all users when plugged into a USB port, I created /etc/udev/rules.d/51-garmin.rules with the content below (new rules take effect after replugging the device, or after reloading udev with sudo udevadm control --reload-rules):

SUBSYSTEM=="usb", ATTR{idVendor}=="091e", MODE="0666", GROUP="plugdev"

I found the lsusb command and the gpsbabel utility useful in testing the connection. With gpsbabel installed and the eTrex plugged in I now see:

phil@shuttle$ lsusb
Bus 002 Device 002: ID 8087:8001 Intel Corp. 
[...]
Bus 003 Device 004: ID 091e:0003 Garmin International GPS (various models)

phil@shuttle$ gpsbabel -i garmin -f usb:-1
0 3834401962 694 eTrex Vista HCx Software Version 3.40

And then in QLandkarte GT I can go to Setup | General and, under “Device and Xfer”, select Garmin in the main drop-down and EtrexVistaHCx as the Device Type (other device options left blank), and happily transfer routes and tracks between the PC and the GPS.

[Screenshot from 2015-08-07: the QLandkarte GT device setup dialog]