Free on Google Colab — Time series forecast with Prophet

Open in Colab: https://colab.research.google.com/github/gumdropsteve/intro_to_prophet/blob/master/prophet_yfinance.ipynb

Install yfinance

With the Yahoo! Finance market data downloader (yfinance), we can pull historical data on virtually any stock with a single line of code.

You can install yfinance with pip;
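In a terminal, or prefixed with ! in a Colab cell;

    pip install yfinance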

From there, simply import the library, and pull a .Ticker().

Pull & Prep Data

Let’s do the New York Times Company;
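A sketch of the import and ticker pull (the nyt variable name is just my pick);

    import yfinance as yf

    # pull a Ticker object for the New York Times Company ($NYT)
    nyt = yf.Ticker('NYT')
    nyt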

That outputs a yfinance.Ticker object, which holds historical $NYT data accessible with .history(). Setting period='max' will return all data;
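Roughly;

    # all available daily history for $NYT, returned as a pandas DataFrame
    hist = nyt.history(period='max')
    hist.tail()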

Prophet expects input data to have 2 columns, ds and y, so let’s just copy the historical dates (hist.index) and adjusted closing prices (hist['Close']) to a new DataFrame.
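A minimal version, assuming hist from above (the timezone check is there because some yfinance versions return timezone-aware dates, which Prophet rejects);

    import pandas as pd

    # Prophet wants exactly 2 columns: ds (date) and y (value to forecast)
    dates = hist.index.tz_localize(None) if hist.index.tz is not None else hist.index
    df = pd.DataFrame({'ds': dates, 'y': hist['Close'].values})
    df.tail()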

Forecast with Prophet
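The fit-and-forecast step follows the standard Prophet pattern; a sketch (the 365-day horizon is just an example value);

    from fbprophet import Prophet   # on newer installs: from prophet import Prophet

    m = Prophet()
    m.fit(df)

    # extend 365 days past the last observed date, then predict
    future = m.make_future_dataframe(periods=365)
    forecast = m.predict(future)

    m.plot(forecast)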


Getting started with Facebook Prophet’s R API (real data + code)


“Time series forecasting is the use of a model to predict future values based on previously observed values.”

— Wikipedia

In this story, we’ll break down and examine the R API of Prophet, a procedure for forecasting time series data open-sourced by Facebook in February 2017 with v0.6 released in March 2020.

Outline

  1. What is Facebook Prophet?
  2. How does Prophet work?
  3. Practice with Prophet
  • 3.1 Installation & Imports
  • 3.2 Data & Prep
  • 3.3 Making a Forecast
  • 3.4 Breaking Down a Forecast
  • 3.5 Forecast Quality Evaluation

What is Facebook Prophet?

While advancements in data science often increase the infamous “skills…


No camera required. (Built on Jetson Nano.)

Code to Reproduce this Display (original video source)

cv.cuda

OpenCV’s CUDA python module is a lot of fun, but it’s a work in progress.

For starters, we have to load in the video on CPU before passing it (frame-by-frame) to GPU. cv.cuda.imread() has not been built yet.

Step 1 — .upload()

cv.VideoCapture() can be used to load and iterate through video frames on CPU. Let’s read the corn.mp4 file with it;
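Something like this (adjust the path to wherever your copy of corn.mp4 lives);

    import cv2 as cv

    # read the video frame-by-frame on CPU; there's no cv.cuda equivalent of VideoCapture yet
    vod = cv.VideoCapture('corn.mp4')
    ret, frame = vod.read()   # ret: did a frame come back?  frame: the image itself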

After .read()ing the 1st image, we’re ready to make a GPU matrix (picture frame) so that image can be .upload()ed to our GPU.
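Roughly;

    gpu_frame = cv.cuda_GpuMat()   # empty GPU matrix, i.e. the "picture frame"
    gpu_frame.upload(frame)        # move the first frame onto the GPU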

Great! But what about the 2nd image?

Well, you probably noticed .read() output 2 variables, ret…


From single image to Dask Delayed (Python)

Looks like we’re stuck in RGB.

Outline

  • On a Single Image
  • On a Series of Images
  • On a Series of Images in Parallel with Dask Delayed

On a Single Image

First, we need to create GPU space (gpu_frame) to hold an image (as a picture frame holds a picture) before we can upload our image to the GPU.
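Something like;

    import cv2

    gpu_frame = cv2.cuda_GpuMat()   # empty GPU matrix waiting for a picture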

Step 1: Upload

Next, load the image into memory with CPU (screenshot), and .upload() it to the gpu_frame (frame the image);
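A sketch, with a hypothetical filename;

    # load the image on CPU, then frame it on the GPU
    screenshot = cv2.imread('screenshot.png')
    gpu_frame.upload(screenshot)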

With the image now in frame, we can start having fun.

Step 2: Have Fun

OpenCV CUDA functions return cv2.cuda_GpuMat (GPU matrices), so each result can be operated on without the user having to re-.upload().

Let’s convert the image…
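A sketch of what that chaining looks like (the exact conversions here are just examples);

    # color conversion runs on the GPU and hands back another cv2.cuda_GpuMat
    gpu_rgb = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2RGB)
    gpu_gray = cv2.cuda.cvtColor(gpu_rgb, cv2.COLOR_RGB2GRAY)

    # .download() brings a result back to CPU memory whenever we want to look at it
    rgb = gpu_rgb.download()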


3-Step Setup and Getting Started (+ test code)

Step 1 — Install PyCUDA

Install PyCUDA with pip;
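In a terminal (use pip3 if that is where your Python 3 lives);

    pip3 install pycuda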

If you don’t have pip, get pip.

Step 2 — Set nvcc Path

Nvcc comes preinstalled, but your Nano isn’t exactly told about it. Use sudo to open your bashrc file;
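For example with gedit (any editor works);

    sudo gedit ~/.bashrc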

Add a blank line, then these 2 lines (letting your Nano know where CUDA is) to the bottom of the file;
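Along these lines (swap in the versioned path, e.g. /usr/local/cuda-10.0, if your install differs);

    export PATH=/usr/local/cuda/bin:${PATH}
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}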

Save, close, then (back in Terminal) source the bashrc file;
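One line;

    source ~/.bashrc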

You can now check your nvcc version with;
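    nvcc --version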

Step 3 — Test with Code

By using PyCUDA’s SourceModule to create a function (add_them) with CUDA C code, we can simply .get_function() it back and launch it on the GPU;
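A minimal test along those lines (the array size and block shape are arbitrary choices);

    import numpy as np
    import pycuda.autoinit            # initializes the CUDA context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # compile a tiny CUDA C kernel that adds two arrays element-wise
    mod = SourceModule("""
    __global__ void add_them(float *dest, float *a, float *b)
    {
        const int i = threadIdx.x;
        dest[i] = a[i] + b[i];
    }
    """)

    add_them = mod.get_function("add_them")

    a = np.random.randn(400).astype(np.float32)
    b = np.random.randn(400).astype(np.float32)
    dest = np.zeros_like(a)

    add_them(drv.Out(dest), drv.In(a), drv.In(b), block=(400, 1, 1), grid=(1, 1))

    print(np.allclose(dest, a + b))   # True if the GPU did the math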


(Python) Set up Chromium Chromedriver and get started. (+ sample code)


Step 1 — Install Selenium with pip

Assuming you have pip on your system, you can install or upgrade Selenium’s Python bindings with one of the commands below (the latter worked for me). If you don’t have pip, get pip.
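For example (the -U variant simply upgrades an existing install);

    pip install selenium

or

    pip install -U selenium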

Step 2 — Install Chromium Webdriver

To interface with a browser (Chromium in our case), Selenium requires a driver (Chromium Chromedriver in our case).
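On Debian/Ubuntu-based systems (the Nano included) the driver usually comes as an apt package;

    sudo apt-get install chromium-chromedriver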

Step 3 — Simple Test

Paste the following into your favorite editor or Python terminal, and if it runs you’re good to go!
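A sketch of such a test, using the Selenium 3-style call (newer Selenium versions move the driver path into a Service object); the chromedriver path below is where the chromium-chromedriver package drops it, so adjust if yours lives elsewhere;

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')    # no display needed
    options.add_argument('--no-sandbox')

    # path used by the chromium-chromedriver apt package
    driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver', options=options)

    driver.get('https://www.google.com')
    print(driver.title)                   # should print "Google"

    driver.quit()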

Fin

Thanks for reading! Please feel free to respond with any questions.

Continued Reading


Parallelizing Time Series Cross-Validation and Hyperparameter Optimization with Dask (Applied Example w/ Code)

Cross-validating Prophet with Dask is done the same as cross-validating Prophet without Dask, but you pass parallel='dask' into the cross_validation() function like;
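Roughly, assuming m is an already-fit Prophet model and the window sizes are just example values (a Dask client has to be up for parallel='dask' to have workers to use);

    from dask.distributed import Client
    from fbprophet.diagnostics import cross_validation   # prophet.diagnostics on newer installs

    client = Client()   # local cluster: one worker per CPU core by default

    df_cv = cross_validation(m,
                             initial='90 days',
                             period='30 days',
                             horizon='30 days',
                             parallel='dask')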

In this story, we’ll use Prophet to forecast the average distance of a NYC yellow cab trip by day. To quickly judge our model’s performance, we’ll call on Dask to parallelize cross-validation across your system’s CPUs.

Afterwards, we’ll apply this parallelized cross_validation() to perform hyperparameter optimization (HPO) and fine-tune that model.

Outline

  • The Dataset
  • Basic Model (Running a Default Prophet)
  • Parallelizing Cross-Validation with Dask
  • Hyperparameter Optimization with Dask (Applying Cross-Validation)

The Dataset

Our data runs from January…


What is Logistic Regression? And how to implement it in Python with RAPIDS cuML

Logistic regression is an algorithm used to predict the probability of events, given some other measurements. It is used when the dependent variable (“target”) is categorical.

For example,

  • Will the team win (1) or lose (0) this game?
  • Are users going to stop using our app (1) or not (0)?

Logistic regression can also be used in non-binary situations, but let’s cover that in a later post and stick to binary logistic regression for now.

How does Logistic Regression work?

Essentially, the logistic regression function takes examples with known classes (e.g. cake (1) or pie (0)), fits a (Sigmoid) line to their distribution, and…
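On the cuML side, the interface mirrors scikit-learn; a toy sketch with made-up numbers;

    import cudf
    from cuml.linear_model import LogisticRegression

    # hypothetical measurements with known classes (0 = pie, 1 = cake)
    X = cudf.DataFrame({'sweetness': [0.1, 0.3, 0.4, 0.8, 0.9, 1.0],
                        'layers':    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]}).astype('float32')
    y = cudf.Series([0, 0, 0, 1, 1, 1], dtype='float32')

    model = LogisticRegression()
    model.fit(X, y)

    print(model.predict(X))         # predicted class for each example
    print(model.predict_proba(X))   # predicted probability of each class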


What is K-Means? And how to implement it in Python with RAPIDS cuML

K-Means is an easy way to cluster data. It randomly selects K points in a given dataset, then computes which of the dataset’s instances are closest to each point (making clusters).


For every cluster, the mean of its values (instances) is computed, and this mean becomes that cluster’s new point (centroid).

Once a cluster’s centroid has moved, its distance from the dataset’s instances has changed, and instances may be added to or removed from that cluster. The mean is recalculated & replaced until it stops moving or hits a given maximum number of iterations (max_iter), whichever comes first.
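In cuML that whole loop is a couple of lines; a toy sketch with made-up points;

    import cudf
    from cuml.cluster import KMeans

    # hypothetical 2-D points that form two obvious blobs
    df = cudf.DataFrame({'x': [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],
                         'y': [1.0, 0.9, 1.1, 5.0, 5.1, 4.9]}).astype('float32')

    # K = 2 clusters, stopping after at most 300 iterations if the centroids keep moving
    kmeans = KMeans(n_clusters=2, max_iter=300)
    kmeans.fit(df)

    print(kmeans.cluster_centers_)   # final centroid of each cluster
    print(kmeans.labels_)            # which cluster each point landed in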


What is K-Nearest Neighbors? And how to implement it in Python with RAPIDS cuML

K-Nearest Neighbors (KNN) is a simple way to determine the value of something by asking what the values of the K nearest things to it are.

So if K=3, what do the 3 nearest things look like?

KNN Classification, K=3 | 1 blue < 2 red | new point == red

How does the KNN algorithm work?

KNN is a non-parametric, lazy learning algorithm.

Non-parametric

Non-parametric models differ from parametric models in that the model structure is not specified a priori [(beforehand)] but is instead determined from data.

Basically, KNN makes no assumption on the data’s underlying distribution, and builds itself by analyzing the data. The one set “parameter” is K.

The term non-parametric is not meant to imply that such…
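For the cuML side of the title, a toy classification sketch (the numbers are made up to mirror the 1-blue-vs-2-red picture above);

    import cudf
    from cuml.neighbors import KNeighborsClassifier

    # hypothetical 2-D points with known classes (0 = blue, 1 = red)
    X = cudf.DataFrame({'x': [1.0, 1.5, 2.0, 4.0, 4.5],
                        'y': [1.0, 1.2, 0.8, 4.0, 4.2]}).astype('float32')
    y = cudf.Series([0, 0, 0, 1, 1], dtype='int32')

    # ask the K = 3 nearest neighbors what they look like
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)

    new_point = cudf.DataFrame({'x': [3.8], 'y': [3.9]}).astype('float32')
    print(knn.predict(new_point))    # majority vote of the 3 nearest points -> red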

Winston Robson

Friend links: https://gumdropsteve.github.io/blog — “Energy may be likened to the bending of a crossbow, decision to the releasing of a trigger.”
