Data Mining and Machine Learning

General Notes for 2012 session

Here is the timetable of lectures and coursework information for 2012

Courseworks A and B [must do and pass, but no marks]; Courseworks 1 (30%), 2 (40%), 3 (30%)

Your lecturer is:

  Professor David Corne, http://www.macs.hw.ac.uk/~dwcorne/

  I want to arrange my office hour to suit the majority of students I am teaching this semester, so please make your preferences known at this Doodle poll: http://doodle.com/pe9p9e2f2d8bip7u

week | date | lectures (Thursday 13:15 EM183; Friday 12:15 WP108) | coursework handout | coursework handin
1 | w/b Mon 10th Sep | Thu: DMML intro; Fri: basics 1 | C/W A, C/W B |
2 | w/b Mon 17th Sep | Thu: basics & CW1; Fri: basics 3 | C/W 1 |
3 | w/b Mon 24th Sep | The a priori algorithm | |
4 | w/b Mon 1st Oct | Thu: correlation / regression; Fri: Feature selection | C/W 2 | C/W A, Friday 5th 23:59
5 | w/b Mon 8th Oct | Clustering | | C/W B, Friday 12th 23:59
6 | w/b Mon 15th Oct | Naïve Bayes | | C/W 1, Friday 19th 23:59
7 | w/b Mon 22nd Oct | Decision Trees | C/W 3 |
8 | w/b Mon 29th Oct | Top10DM / Overfitting / Neural Nets / SVMs | | C/W 2, Sunday 4th November 23:59
9 | w/b Mon 5th Nov | Classifier Ensembles | |
10 | w/b Mon 12th Nov | Accuracy, Rules, Text | |
11 | w/b Mon 19th Nov | | | C/W 3
12 | w/b Mon 26th Nov | | |

OLD THINGS

I am revising the DM lectures and courseworks for 2012. Below is the 2011 material; the updated items will be listed in the table above as they are completed. In the meantime, you can refer to the material below to get some idea of what is coming up.

DWC Lecture 1
  slides: Overview of DM; types of data; examples
  coursework handouts: COURSEWORK A (in the slides)
  other things: a paper describing some retail basket data

DWC Lecture 2
  slides: Some basic issues: classification, 1-NN, normalisation, discretization
  coursework handouts: COURSEWORK B (in the slides)

DWC Lecture 3
  slides: Basic Statistics and Coursework 1
  coursework handouts: COURSEWORK 1
  other things: Some awk scripts: znorm.awk, cs2ss.awk, fiddlefield.awk, removefields.awk, fixcomdata.awk

DWC Lecture 4
  slides: Basket Data/Association Rules (A Priori algorithm)
  other things: the A priori paper

DWC Lecture 5
  slides: Correlation and Coursework 3
  coursework handouts: COURSEWORK 3
  other things: The naive bayes awk program for use in CW2; a paper concerned with feature selection when you want to find a very small number of features (recommended read)

DWC Lecture 6
  slides: Cluster Analysis and Clustering

DM Lecture 7
  slides: Feature selection
  other things: the original Relief paper; a paper with many variants of the Relief method; the Dash and Liu survey on feature selection; a couple of my papers with PhD students concerning feature selection (one about text features, one about genes)

DM Lecture 8
  slides: Similarity and Distance

DM Lecture 9
  slides: TBA

Assessment

This module is assessed entirely by coursework. My aim in setting the coursework is for you to learn the essentials of data mining by working on real data and real issues that my colleagues and I are currently working on. You will implement and use techniques that are quite straightforward but nevertheless very important; data analysts in science and industry often misuse them, or fail to use them when they should.