
Data Mining and Machine Learning

General Notes for 2015 session

Here is the timetable of lectures and the coursework information for 2015.

Courseworks A and B [must do and pass, but no marks]; Courseworks 1 (30%), 2 (40%), 3 (30%)

Your lecturer is:

  Professor David Corne

  Office hour: Thursdays, 4:15pm to 5:15pm

A blue box means a lecture. An empty box means NO LECTURE. However, these slots may be used from time to time to cover cancellations, guest lectures, or something else. Of course, I’ll let you know well in advance.

All course material is below. Anything dated BEYOND WHATEVER TODAY’S DATE IS, including CWs, may change a little.




Lecture slots: Thursday 13:15 (EM183) and Friday 12:15 (WP108)

w/b Mon 14th Sep
  Thursday: DMML intro
  Friday: basics 1
  Coursework handout: C/W A

w/b Mon 21st Sep
  No lectures

w/b Mon 28th Sep
  Thursday: More basics & CW1
  Friday: correlation / regression
  Coursework handout: C/W 1

w/b Mon 5th Oct
  Lecture: Naïve Bayes
  Coursework handout: C/W 2
  Coursework handin: C/W A, Fri 9th Oct 23:59

w/b Mon 12th Oct
  Thursday: Feature selection
  Friday: Decision Trees
  Coursework handin: Fri 16th Oct 23:59

w/b Mon 19th Oct
  Thursday: Classifier Ensembles
  Friday: Top10DM / Overfitting / Neural Nets / SVMs
  Coursework handout: C/W 3

w/b Mon 26th Oct
  Lecture: Rules (and ROC curves)
  Coursework handin: C/W 1, Sun 1st Nov 23:59

w/b Mon 2nd Nov
  No lectures

w/b Mon 9th Nov
  No lectures

w/b Mon 16th Nov
  Thursday: The Apriori algorithm
  Friday: Introduction to Deep Learning

w/b Mon 23rd Nov
  Lecture: Text as data
  Coursework handin: C/W 2, Sun 29th Nov 23:59

w/b Mon 30th Nov
  Lecture: to be used if necessary
  Coursework handin: C/W 3, Sun 6th Dec 23:59


Where the timeslot is empty, there will NOT be a lecture. However, it would be best if you could keep those slots free: I may have to use the Friday slot if something comes up and I have to cancel the Thursday slot. I will try to give you as much advance notice as possible. Occasional Friday slots might also be used for brief coursework vivas.


The assessment of this module is entirely by coursework. My aim in setting the coursework is to get you to learn the essentials of data mining by working on real data and real issues that my colleagues and I are currently working on. You will implement and use techniques that are quite straightforward but nevertheless very important. Data analysts in science and industry often misuse these techniques, or fail to use them when they should.

Coursework Assignment 1: Worth 30%

Coursework Assignment 2: Worth 40%

Coursework Assignment 3: Worth 30%

Coursework Assignments A, B: Worth 0%, but you fail the module if you don't submit an adequate response to each. For these courseworks, if I am not happy with what you submit, I will ask you to improve it and resubmit it. This will repeat if necessary until I think you have done it properly.

In all cases: hand it in as a PDF attachment in an email to, with the subject line set to the appropriate one of these five: DMML CW1, DMML CW2, DMML CW3, DMML CWA, DMML CWB