RFC – Data Science Essentials Course

Intro

Hey all, thanks for checking this out.

This is an RFC (request for comments) where I am seeking feedback on my idea.

The purpose is to adapt to feedback as quickly as possible and fail fast as needed.

It plays into 2 key beliefs I have seen in successful entrepreneurs:

Build in public
Seek feedback as quickly as possible

Where to give feedback

I would ideally like feedback to focus on the following, but would also welcome any feedback you feel is relevant.

Indicative content – Have I missed anything? What would you like to see here?
Overall, what is your feel for the idea? What have I missed?

Please reach out to me to give feedback on any of the channels below. I have also set up a Discord server (no one is on there right now – so you could be the first!).

LinkedIn – https://www.linkedin.com/in/johnstamford/

Messenger – https://m.me/john.stamford

Discord – https://discord.gg/EKdeRsw4

Overview

The aim of my first business idea is to launch a self-paced online Data Science course.

The course aims to:

Equip students with the essential skills and knowledge required to start a career in data science
Introduce and demonstrate key techniques, and give students opportunities to write code and solve data-related problems using statistics, regression and machine learning
Remove barriers by focusing on the core skills and competencies required (the most direct route, with no bloat)
Be online and self-paced
Be open to both technical and semi-technical individuals

It is aimed at a broad target audience, which can be broken down into two specific groups:

Semi-technical people who wish to learn new skills and start to consider transitioning across to data science roles (see background for examples)
Technical people who want to upskill quickly to be able to apply data science and machine learning

The entry requirements for the course are:

Have access to a Mac/PC and be able to install software
Understand what a variable is
Have some (even limited) experience in coding (any language)

Background

I have worked on and led projects in data science and machine learning for over 10 years, and I’ve learned there is typically a common set of skills that will cover 90%+ of the projects you’ll work on.

I also spent several years teaching computer science at college and university, where I taught thousands of students and supervised a broad range of postgraduate students and projects in data analytics, data science and machine learning.

My most relevant experience for this idea was teaching a postgraduate module called “Statistical Programming in R”. The course was taken by postgraduate students mainly from non-computer science courses and backgrounds (biology, sociology, etc.). It introduced them to R and its environment and, over 3 days, ran them through a set of examples demonstrating how to apply statistical techniques in R.

I plan to create a similar course with 3 key differences:

It will be done in Python, not R
It will extend beyond statistics and include machine learning
It will be online and self-paced

Competition and the market

This section is not within the scope of this RFC and I still need to plan my go-to-market strategy.

However, I acknowledge there is a lot of competition out there. This indicates 2 key things: there are a lot of competing courses, but there is also large demand for data science courses.

My Unique Selling Point

The course is built around my experience, with the aim of giving students the most direct and relevant skills. That experience includes:

MSc in Machine Learning
PhD in Data Science
10 years' experience, including 4 years in big tech
Former Director of Machine Learning
Former senior machine learning engineer at Meta
Former Data Science manager at Meta
Advised multiple startups on data science and machine learning
Experienced and qualified teacher/lecturer

Indicative Content

[subject to change and refinement]

Section	Title	By the end of the session
Intro	Aims and Objectives	Understand what the purpose of the course is and what you will learn
Intro	About Me	Have faith that this is taught by a suitably qualified person
Intro	Structure	Understand the structure of the course
Intro	Project Files	Be able to find the data and files needed
Intro	Free updates
Environment	Mac / Windows / Linux	Understand how businesses use each (Mac seems to be a preference, but not essential)
Environment	Installing Anaconda Python
Environment	pip	How to install packages using pip
Environment	Sample File / Folder Structure
Environment	Intro to notebooks
Pandas & Numpy	Pandas and NumPy Intro	Understand what Pandas and NumPy are
Loading Data	CSV Demo and explore	See how we load in data
Loading Data	MySQL Discussion	Be aware that you need to develop your skills, and have seen different ways to import data
Pandas & Numpy	Pandas Basics	Understand top (n) functions – TODO: Plan
Pandas & Numpy	Slicing, extracting and copying	Understand how to do basic manipulation and filtering
Pandas & Numpy	Lambda and Apply
Pandas & Numpy	Apply some basic Numpy functions
Plotting and visualisation	Intro to Matplotlib	Understand the main plotting package
Plotting and visualisation	Basic Plots	Line, Scatter, Box
Plotting and visualisation	Make plots nicer	Labels, Legends, Subplots
Plotting and visualisation	Seaborn, Plotly	Know other plots
Basic Statistics	Types of data	Understand continuous, ordinal and nominal
Basic Statistics	Mean, Median, Std, IQR
Basic Statistics	Compare between groups
Basic Statistics	Normal Distribution	Understand Normal, use Log transforms, KDE Plots, QQ Plots
Basic Statistics	Normal Distribution Stats Tests	Be able to apply a test to see if it is normal or not
Basic Statistics	Bonus – kurtosis and skewness	Be able to calculate, with demos of what that means
Basic Statistics	Compare distributions	Be able to check for statistical differences, Normal, Skewed and Count
Regression	Intro to Linear Regression	SciPy, scikit-learn and statsmodels
Regression	Types of data to use	Understand it works best with normalised data, not ordinal or nominal
Regression	Linear Regression Demo	Be able to build our own LR model
Regression	Discuss the results from Linear Regression	Be able to understand and interpret the results
Regression	Make the model better	Manually applying Stepwise
Regression	Limitations	Understand confidence
Regression	Challenge	To apply LR independently to your own problem
Regression 2	Intro to Logistic Regression	Understand the difference between linear and logistic regression
Regression 2	Prepare Data
Regression 2	Apply Logistic Regression to Data
Regression 2	Interpret the results
Regression 2	Make it better
Machine Learning	Intro to a range of different techniques	scikit-learn
Machine Learning	Intro to non-linear problems	BMI
Machine Learning	Intro to NNs	TensorFlow Playground
Machine Learning	A basic NN	In scikit-learn
Machine Learning	Intro to PyTorch	Just discuss, GPU, etc.
Machine Learning	Split data into test and training	And rebuild NN, with testing
Machine Learning	How to evaluate	Confusion Matrix
Machine Learning	Play with the topology	See if we can improve the model
Machine Learning	Try other techniques	I’ll demo one, but you try others
Machine Learning 2	Handling Text	Demo on how to convert data
Machine Learning 2	PCA Visualisation
Machine Learning 2	Decision Tree
Machine Learning 2	Random Forest
Finalisation	Summary
Finalisation	Where to go next

Pricing

The estimated price will be between $150 and $500, and will be decided closer to the launch.

This will include access to future updates and extensions to the course.