Intro

Hey all, thanks for checking this out.

This is an RFC (request for comments): I am looking for feedback on my idea.

The purpose is to adapt to feedback as quickly as possible and fail fast as needed.

It plays into 2 key beliefs I have seen in successful entrepreneurs:

  • Build in public
  • Seek feedback as quickly as possible

Where to give feedback

I would ideally like feedback to focus on the following, but would also welcome any feedback you feel is relevant.

  • Indicative content – Have I missed anything? What would you like to see here?
  • Overall, what is your feel for the idea? What have I missed?

Please reach out with feedback on any of the channels below. I have also set up a Discord server (no one is on there right now – so you could be the first!).

LinkedIn – https://www.linkedin.com/in/johnstamford/

Messenger – https://m.me/john.stamford

Discord – https://discord.gg/EKdeRsw4

Overview

The aim of my first business idea is to launch a self-paced online Data Science course.

The course aims to:

  • Equip students with the essential skills and knowledge required to start a career in data science
  • Introduce and demonstrate how to write code and solve data-related problems using statistics, regression and machine learning, with opportunities to practise
  • Remove barriers by focusing on the core skills and competencies required (the most direct route – no bloat)
  • Be online and self-paced
  • Be open to both technical and semi-technical individuals

It is aimed at a broad target audience, which can be broken down into two specific groups:

  • Semi-technical people who wish to learn new skills and start to consider transitioning across to data science roles (see background for examples)
  • Technical people who want to upskill quickly to be able to apply data science and machine learning

The entry requirements for the course are:

  • Have access to a Mac/PC and be able to install software
  • Understand what a variable is
  • Have some (even limited) experience in coding (any language)

Background

I have worked on and led projects in data science and machine learning for over 10 years, and I’ve learned there is typically a common set of skills that will cover 90%+ of the projects you’ll work on.

I also spent several years teaching computer science at college and university, where I taught thousands of students and supervised a broad range of postgraduate students and projects in data analytics, data science and machine learning.

My most relevant experience is teaching a postgraduate module called “Statistical Programming in R”. The course was taken by postgraduate students mainly from non-computer science courses and backgrounds (biology, sociology, etc). Over 3 days, it introduced them to R and its environment, then ran them through a series of examples demonstrating how to apply statistical techniques in R.

I plan to create a similar course with 3 key differences:

  • It will be done in Python, not R
  • It will extend beyond statistics and include machine learning
  • It will be online and self-paced

Competition and the market

This section is not within the scope of this RFC and I still need to plan my go-to-market strategy.

However, I acknowledge there is a lot of competition out there. This indicates 2 key things: there are many competing courses, but there is also large demand for data science (DS) courses.

My Unique Selling Point

The course is built around my experience, with the aim of giving students the most direct and relevant skills. That experience includes:

  • MSc in Machine Learning
  • PhD in Data Science
  • 10 years’ experience, including 4 years in big tech
  • Former Director of Machine Learning
  • Former senior machine learning engineer at Meta
  • Former Data Science manager at Meta
  • Advised multiple startups on data science and machine learning
  • Experienced and qualified teacher/lecturer

Indicative Content

[subject to change and refinement]

Section | Title | By the end of the session
Intro | Aims and Objectives | Understand what the purpose of the course is and what you will learn
Intro | About Me | Have faith that this is taught by a suitably qualified person
Intro | Structure | Understand the structure of the course
Intro | Project Files | Be able to find the data and files needed
Intro | Free updates |
Environment | Mac / Windows / Linux | Understand how businesses use them (Mac seems to be a preference, but not essential)
Environment | Installing Anaconda Python |
Environment | pip | How to install packages using pip
Environment | Sample File / Folder Structure |
Environment | Intro to notebooks |
Pandas & NumPy | Pandas and NumPy Intro | Understand what Pandas and NumPy are
Loading Data | CSV Demo and explore | See how we load in data
Loading Data | MySQL Discussion | Be aware that you will need to develop your skills, and have seen different ways to import data
Pandas & NumPy | Pandas Basics | Understand top (n) functions – TODO: Plan
Pandas & NumPy | Slicing, extracting and copying | Understand how to do basic manipulation and filtering
Pandas & NumPy | Lambda and Apply |
Pandas & NumPy | Apply some basic NumPy functions |
Plotting and visualisation | Intro to Matplotlib | Understand the main plotting package
Plotting and visualisation | Basic Plots | Line, Scatter, Box
Plotting and visualisation | Make plots nicer | Labels, Legends, Subplots
Plotting and visualisation | Seaborn, Plotly | Know other plots
Basic Statistics | Types of data | Understand continuous, ordinal and nominal
Basic Statistics | Mean, Median, Std, IQR |
Basic Statistics | Compare between groups |
Basic Statistics | Normal Distribution | Understand Normal, use Log transforms, KDE Plots, QQ Plots
Basic Statistics | Normal Distribution Stats Tests | Be able to apply a test to see if it is normal or not
Basic Statistics | Bonus – kurtosis and skewness | Be able to calculate, with demos of what that means
Basic Statistics | Compare distributions | Be able to check for statistical differences, Normal, Skewed and Count
Regression | Intro to Linear Regression | SciPy, scikit-learn and statsmodels
Regression | Types of data to use | Understand it works best with normalised data, not ordinal or nominal
Regression | Linear Regression Demo | Be able to build your own LR model
Regression | Discuss the results from Linear Regression | Be able to understand and interpret the results
Regression | Make the model better | Manually applying Stepwise
Regression | Limitations | Understand confidence
Regression | Challenge | To apply LR independently to your own problem
Regression 2 | Intro to Logistic Regression | Understand the difference between linear and logistic regression
Regression 2 | Prepare Data |
Regression 2 | Apply Logistic Regression to Data |
Regression 2 | Interpret the results |
Regression 2 | Make it better |
Machine Learning | Intro to a range of different techniques | scikit-learn
Machine Learning | Intro to non-linear problems | BMI
Machine Learning | Intro to NNs | TensorFlow Playground
Machine Learning | A basic NN | In scikit-learn
Machine Learning | Intro to PyTorch | Just discuss, GPU, etc
Machine Learning | Split data into test and training | And rebuild NN, with testing
Machine Learning | How to evaluate | Confusion Matrix
Machine Learning | Play with the topology | See if we can improve the model
Machine Learning | Try other techniques | I’ll demo one, but you try others
Machine Learning 2 | Handling Text | Demo on how to convert data
Machine Learning 2 | PCA Visualisation |
Machine Learning 2 | Decision Tree |
Machine Learning 2 | Random Forest |
Finalisation | Summary |
Finalisation | Where to go next |

Pricing

The estimated price will be between $150 and $500, and will be decided closer to the launch.

This will include access to future updates and extensions to the course.