Intro
Hey all, thanks for checking this out.
This is an RFC (request for comments): I am looking for feedback on my idea.
The purpose is to adapt to feedback as quickly as possible and fail fast as needed.
It plays into 2 key beliefs I have seen in successful entrepreneurs:
- Build in public
- Seek feedback as quickly as possible
Where to give feedback
I would ideally like feedback to focus on the following, but would also welcome any feedback you feel is relevant.
- Indicative content – Have I missed anything? What would you like to see here?
- Overall, what is your feel for the idea? What have I missed?
Please reach out to me on any of the channels below to give feedback. I have also set up a Discord server (no one is on there right now – so you could be the first!).
LinkedIn – https://www.linkedin.com/in/johnstamford/
Messenger – https://m.me/john.stamford
Discord – https://discord.gg/EKdeRsw4
Overview
The aim of my first business idea is to launch a self-paced online Data Science course.
The course aims to:
- Equip students with the essential skills and knowledge required to start a career in data science
- Introduce and demonstrate how to solve data-related problems using statistics, regression and machine learning, with opportunities to write code along the way
- Remove barriers by focusing on the core skills and competencies required (the most direct route – no-bloat)
- Be online and self-paced
- Be open to both technical and semi-technical individuals
It is aimed at a broad audience, which can be broken down into two specific groups:
- Semi-technical people who wish to learn new skills and start to consider transitioning across to data science roles (see background for examples)
- Technical people who want to upskill quickly to be able to apply data science and machine learning
The entry requirements for the course are:
- Have access to a Mac/PC and be able to install software
- Understand what a variable is
- Have some (even limited) experience in coding (any language)
Background
I have worked on and led projects in data science and machine learning for over 10 years, and I’ve learned there is typically a common set of skills that will cover 90%+ of the projects you’ll work on.
I also spent several years teaching computer science at college and university, where I taught thousands of students and supervised a broad range of postgraduate students and projects in data analytics, data science and machine learning.
My most relevant experience for this was teaching a postgraduate module called “Statistical Programming in R”. The module was taken by postgraduate students, mainly from non-computer-science courses and backgrounds (biology, sociology, etc.). Over three days it introduced them to R and its environment, then ran them through a series of examples demonstrating how to apply statistical techniques in R.
I plan to create a similar course with 3 key differences:
- It will be done in Python, not R
- It will extend beyond statistics and include machine learning
- It will be online and self-paced
Competition and the market
This section is not within the scope of this RFC and I still need to plan my go-to-market strategy.
However, I acknowledge that there is a lot of competition out there. This indicates two things: there are many competing courses, but there is also large demand for data science (DS) courses.
My Unique Selling Point
The course is built around my experience, with the aim of giving students the most direct and relevant skills. That experience includes:
- MSc in Machine Learning
- PhD in Data Science
- 10 years' experience, including 4 years in big tech
- Former Director of Machine Learning
- Former senior machine learning engineer at Meta
- Former Data Science manager at Meta
- Advised multiple startups on data science and machine learning
- Experienced and qualified teacher/lecturer
Indicative Content
[subject to change and refinement]
Section | Title | By the end of the session |
Intro | Aims and Objectives | Understand what the purpose of the course is and what you will learn |
Intro | About Me | Have faith that this is taught by a suitably qualified person |
Intro | Structure | Understand the structure of the course |
Intro | Project Files | Be able to find the data and files needed |
Intro | Free updates | |
Environment | Mac / Windows / Linux | Understand how businesses use them (Mac seems to be the preference, but is not essential) |
Environment | Installing Anaconda Python | |
Environment | PIP | How to install packages using PIP |
Environment | Sample File / Folder Structure | |
Environment | Intro to notebooks | |
Pandas & Numpy | Pandas and Numpy Intro | Understand what Pandas and Numpy are |
Loading Data | CSV Demo and explore | See how we load in data |
Loading Data | MySQL Discussion | Be aware that you need to develop your skills, and have seen different ways to import data |
Pandas & Numpy | Pandas Basics | Understand top (n) functions – TODO: Plan |
Pandas & Numpy | Slicing, extracting and copying | Understand how to do basic manipulation and filtering |
Pandas & Numpy | Lambda and Apply | |
Pandas & Numpy | Apply some basic Numpy functions | |
Plotting and visualisation | Intro to Matplotlib | Understand the main plotting package |
Plotting and visualisation | Basic Plots | Line, Scatter, Box |
Plotting and visualisation | Make plots nicer | Labels, Legends, Subplots |
Plotting and visualisation | Seaborn, Plotly | Know other plots |
Basic Statistics | Types of data | Understand continuous, ordinal and nominal |
Basic Statistics | Mean, Median, Std, IQR | |
Basic Statistics | Compare between groups | |
Basic Statistics | Normal Distribution | Understand Normal, use Log transforms, KDE Plots, QQ Plots |
Basic Statistics | Normal Distribution Stats Tests | Be able to apply a test to see whether data is normally distributed or not |
Basic Statistics | Bonus – kurtosis and skewness | Be able to calculate, with demos of what that means |
Basic Statistics | Compare distributions | Be able to check for statistical differences (Normal, Skewed and Count data) |
Regression | Intro to Linear Regression | SciPy, scikit-learn and statsmodels |
Regression | Types of data to use | Understand it works best with normalised data, not ordinal or nominal |
Regression | Linear Regression Demo | Be able to build our own LR model |
Regression | Discuss the results from Linear Regression | Be able to understand and interpret the results |
Regression | Make the model better | Manually applying Stepwise |
Regression | Limitations | Understand confidence |
Regression | Challenge | To apply LR independently to your own problem |
Regression 2 | Intro to Logistic Regression | Understand the difference between linear and logistic regression |
Regression 2 | Prepare Data | |
Regression 2 | Apply Logistic Regression to Data | |
Regression 2 | Interpret the results | |
Regression 2 | Make it better | |
Machine Learning | Intro to a range of different techniques | scikit-learn |
Machine Learning | Intro to non-linear problems | BMI |
Machine Learning | Intro to NNs | TensorFlow Playground |
Machine Learning | A basic NN | In scikit-learn |
Machine Learning | Intro to PyTorch | Just discuss, GPU, etc |
Machine Learning | Split data into test and training | And rebuild NN, with testing |
Machine Learning | How to evaluate | Confusion Matrix |
Machine Learning | Play with the topology | See if we can improve the model |
Machine Learning | Try other techniques | I’ll demo one, but you try others |
Machine Learning 2 | Handling Text | Demo on how to convert data |
Machine Learning 2 | PCA Visualisation | |
Machine Learning 2 | Decision Tree | |
Machine Learning 2 | Random Forest | |
Finalisation | Summary | |
Finalisation | Where to go next | |
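As a rough illustration of the level the Basic Statistics and Regression sections are pitched at, here is a minimal sketch. It is not course material: the data is invented, the function names `describe` and `fit_line` are hypothetical, and it uses only the Python standard library rather than the Pandas, SciPy or scikit-learn tools listed above, so that it runs without installing anything.

```python
import statistics

# Hypothetical toy data, invented for illustration: hours studied vs. test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]


def describe(values):
    """Return the summary statistics named in the Basic Statistics section."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }


def fit_line(x, y):
    """Ordinary least squares with one predictor: y = slope * x + intercept."""
    x_mean = statistics.mean(x)
    y_mean = statistics.mean(y)
    # Slope is the sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


summary = describe(scores)
slope, intercept = fit_line(hours, scores)

print(summary)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

In the course itself the same steps would more likely be done with a Pandas DataFrame and a library model, but the idea is the same: summarise the data first, then fit and interpret a simple model.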
Pricing
The estimated price will be between $150 and $500, and will be decided closer to the launch.
This will include access to future updates and extensions to the course.