This course examines the processes involved in a standard data science lifecycle. It uses appropriately sized case studies to explore the different facets of working with data, from sourcing the data to obtaining insights from it. Significant effort will go into writing code to handle the data, understanding its characteristics, conducting exploratory data analysis, preparing the data, conducting predictive analysis, operationalising the data, and visualising it for meaningful insights. The course also covers data storage variations, policies on storage and usage, legal and ethical requirements, and the challenges these present.
1. Introduction to Data Science: Definition, usage areas, opportunities
2. The Data Science Lifecycle
3. Python Notebooks (3.1 Installation, configuration and introduction to Python notebooks)
4. Sources of Data
5. Data formats and challenges
6. Introduction to Python, data types, iteration, conditionals (6.1 The Python programming language for Data Science will be introduced here.)
7. Exploratory Data Analysis
8. Descriptive Statistics
9. Exploratory analysis (9.1 Read data from case studies and conduct exploratory analysis with basic statistics.)
10. Data Wrangling
11. Data Quality
12. Handling and preparing Data (12.1 Explore, understand, and prepare data from various sources, ensuring data quality and readiness for further analysis.)
13. Tidy Data
14. Data Visualisation
15. Preparing and visualising tidy data (15.1 Prepare the data in a suitable machine-readable format that can be used for analysis. Visualise the resulting data prior to analysis.)
16. Models, Bias and Variance
17. Training, validation and testing
18. Preparing datasets (18.1 Preparing input data for data mining and machine learning algorithms.)
19. Association Rule Mining
20. Linear Regression
21. Market-basket analysis & Linear Regression (21.1 Use case studies for implementing association rule mining and linear regression.)
22. Logistic Regression / Classification
23. Clustering
24. Classification and Clustering (24.1 Use case studies for implementing classification and clustering.)
25. Model deployment, MLOps
26. Delivering via API
27. Building an API to deliver models (27.1 Using a web service to deliver an API for external programs.)
28. Data and Model maintenance, MLOps
29. API, Models and Data (29.1 Introductory sample for MLOps)
30. Data Management and Governance
31. Exploratory data analysis report using Markdown (31.1 Coursework 1)
32. Model Building and Validation (32.1 Coursework 2)
33. Exam preparation (33.1 Practice questions)
By the end of the course, students should be able to do the following:
SCQF Level: 8
Credits: 15