F78DS - Data Science Life Cycle

Course leader(s):

Alistair Wallis (Edinburgh, January)
Nasreddine Megrez (Dubai, January)
Ian Tan (Malaysia, January)

Aims

This course aims to examine the processes involved in a standard data science lifecycle. It uses appropriately sized case studies to explore the different facets of working with data, from the source of the data through to obtaining insights from it. Significant effort will be placed on coding to handle the data, understanding the characteristics of the data, conducting exploratory data analysis, preparing the data, conducting predictive analysis, operationalising the data, and visualising the data for meaningful insights. The course will also cover data storage variations, policies with regard to storage and usage, legal and ethical requirements, and the challenges these present.

Syllabus

1. Introduction to Data Science: Definition, usage areas, opportunities

2. The Data Science Lifecycle

3. Python Notebooks (3.1 Installation, configuration and introduction to Python notebooks)

4. Sources of Data

5. Data formats and challenges

6. Introduction to Python, data types, iterations, conditions (6.1 Python programming language for Data Science will be introduced here.)

7. Exploratory Data Analysis

8. Descriptive Statistics

9. Exploratory analysis (9.1 Read data from case studies and conduct exploratory analysis with basic statistics.)

10. Data Wrangling

11. Data Quality

12. Handling and preparing Data (12.1 Explore, understand, and prepare data from various sources by ensuring data quality and readying it for further analysis.)

13. Tidy Data

14. Data Visualisation

15. Preparing and visualising tidy data (15.1 Prepare the data into a suitable computer-readable format that can be used for analysis. Visualise the resultant data prior to analysis.)

16. Models, Bias and Variance

17. Training, validation and testing

18. Preparing datasets (18.1 Preparing input data for data mining and machine learning algorithms.)

19. Association Rule Mining

20. Linear Regression

21. Market-basket analysis & Linear Regression (21.1 Use case studies for implementing association rule mining and linear regression.)

22. Logistic Regression / Classification

23. Clustering

24. Classification and Clustering (24.1 Use case studies for implementing classification and clustering.)

25. Model deployment, MLOps

26. Delivering via API

27. Building an API to deliver models (27.1 Using a web service to deliver an API for external programs.)

28. Data and Model maintenance, MLOps

29. API, Models and Data (29.1 Introductory sample for MLOps)

30. Data Management and Governance

31. Exploratory data analysis report using Markdown (31.1 Coursework 1)

32. Model Building and Validation (32.1 Coursework 2)

33. Exam preparation (33.1 Practice questions)

Learning outcomes

By the end of the course, students should be able to do the following:

understand and explain the roles of data and its governance from a business perspective.
identify and implement the processes required for organising and maintaining data in an organisation.
classify participants and analyse suitable resources and tools for a data science project throughout its lifecycle.
demonstrate, through the use of tools, techniques to determine the size, quality, storage and scope of data and to classify it for processing.
conduct various kinds of data analysis and apply statistical methods in a data science project.

Further details

SCQF Level: 8

Credits: 15