Super demo for nbdev. We'll write a discretizer.

Normally this bit would describe the package, give Install instructions, and then some Examples.

But since this is an nbdev demo, I'll start by talking about using nbdev. I'll assume by this time you have:

  • Switched to a suitable Python virtual environment
  • Installed jupyter
  • Installed nbdev
  • Found and maybe cloned this repo.

This README is written in index.ipynb, which generates both README.md and the index.html page in docs/. Yes, you get to write your README in Jupyter!

  • GitHub renders README.md as the main package description.
  • It also creates and hosts full package documentation via GitHub Pages.
  • Or you can see those locally if you install & run Jekyll.

In index.ipynb, assuming you've imported your module(s) up top (from mydemo.core import *), you can generate the mydemo Python package & docs via these two commands:

$ nbdev_build_lib && nbdev_build_docs

(If you have make installed, just type make!)

Caveats

⚠️ The first cell should import your module(s). We have just one, named core. It is defined in 00_core.ipynb, which generates mydemo/core.py; the first cell of this notebook imports it with from mydemo.core import *.

⚠️ The "MyDemo" title cell should be (roughly) second, and has a specific format. Follow the template:

# ModuleName

   > One-line module description

⚠️ The settings.ini can be tricky. Remember to list the packages your module requires there, or the package may not work in a new environment (and GitHub's Continuous Integration will fail).

Install

Write your install instructions here. Typically something like:

pip install your_project_name <-- replace with mydemo...

Note: nbdev makes it easy to publish your package to PyPI (for pip) or conda, but if you are doing this for work, check with work first! The same goes for GitHub etc. (though you can configure nbdev to use private repositories).

How to use

Fill me in please! Don't forget code examples.

OK, let's use our module's data-grabbing function to get car crash data.

df = getCrashes()
df.sample(5)
    total  speeding  alcohol  not_distracted  no_previous  ins_premium  ins_losses  abbrev
39   11.1     3.774    4.218          10.212        8.769      1148.99      148.58      RI
 2   18.6     6.510    5.208          15.624       17.856       899.47      110.35      AZ
13   12.8     4.608    4.352          12.032       12.288       803.11      139.15      IL
44   11.3     4.859    1.808           9.944       10.848       809.38      109.48      UT
41   19.4     6.014    6.402          19.012       16.684       669.31       96.87      SD

Test our super demo function

Core defines a few handy functions like is_numeric(). Try it:

df.apply(is_numeric)
total              True
speeding           True
alcohol            True
not_distracted     True
no_previous        True
ins_premium        True
ins_losses         True
abbrev            False
dtype: bool

It's common to put assert statements in your notebooks as tests, so nbdev can check them during the build. (This is more common in the module notebooks than in the index/README.) Here:

assert is_numeric(df['speeding'])
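The source of is_numeric() is not shown in this README, so the following is only a guess at how such a helper could work. It assumes the check boils down to pandas' own is_numeric_dtype; the name is_numeric_sketch and the tiny demo frame (two rows copied from the sample above, trimmed to three columns) are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def is_numeric_sketch(col: pd.Series) -> bool:
    """Hypothetical stand-in for mydemo.core.is_numeric (assumption, not the real code)."""
    return is_numeric_dtype(col)


# Two rows from the sample output, trimmed to three columns.
demo = pd.DataFrame(
    {"total": [11.1, 18.6], "speeding": [3.774, 6.510], "abbrev": ["RI", "AZ"]}
)

# DataFrame.apply calls the function once per column.
result = demo.apply(is_numeric_sketch)

# Asserts like these act as lightweight tests when the notebook is run.
assert result["speeding"]
assert not result["abbrev"]
```

If an assert like this fails, the cell raises an AssertionError, which is what lets a notebook run flag the problem.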

Test the fancy discretize() function.

It will report its actions, and then return the discretized dataframe, suitable for passing on to your Bayes Net learning algorithm, etc.

help(discretize)
Help on function discretize in module mydemo.core:

discretize(df, nbins=10, cut=<function qcut at 0x7ff4a04a6440>, verbose=2, drop_useless=True)
    Discretize columns in {df} to have at most {nbins} categories.
      * Categorical columns: take the Top n-1 plus "Other"
      * Continuous columns: cut into {nbins} using {cut}.
    
    Returns a new discretized dataframe with the same column names.
    Promotes discrete columns to categories.
    
    Parameters
    -----------
    df: Dataframe to discretize
    nbins: Max number of bins to use. May return fewer.
    cut: Cutting method. Default `pd.qcut`. Consider pd.cut, or write your own.
    verbose: 0: silent, 1: colnames, 2: (Default) top N for each column
    drop_useless: Removes columns that have < 2 unique values.
    
    Replaces numerical NA values with 'NA'.
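
To make the help text above more concrete, here is a rough sketch of the two rules it describes: cut continuous columns into bins, and keep the top n-1 categories plus "Other". This is not the code in mydemo.core. The name discretize_sketch and the toy dataframe are invented for illustration, and the sketch leaves out the verbose reporting and the NA handling.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def discretize_sketch(df, nbins=10, cut=pd.qcut, drop_useless=True):
    """Hypothetical sketch of the documented behaviour; not mydemo's actual code."""
    out = {}
    for name, col in df.items():
        # Optionally skip columns that carry no information.
        if drop_useless and col.nunique() < 2:
            continue
        if is_numeric_dtype(col):
            # Continuous column: cut into nbins intervals with the chosen method.
            out[name] = cut(col, nbins)
        else:
            # Categorical column: keep the nbins - 1 most frequent values
            # and lump everything else into "Other".
            top = col.value_counts().index[: nbins - 1]
            kept = col.astype(object).where(col.isin(top), "Other")
            out[name] = kept.astype("category")
    return pd.DataFrame(out, index=df.index)


# Toy data (made up): one continuous, one categorical, one constant column.
toy = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
        "label": list("aaabbcde"),
        "constant": [1] * 8,
    }
)
print(discretize_sketch(toy, nbins=3))
```

Note that pd.qcut raises an error when quantile edges coincide, so real data with many repeated values may need pd.cut or a custom cutting function instead.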

discretize(df, nbins=4)
total:
	(5.899, 12.75]    13
	(12.75, 15.6]     13
	(15.6, 18.5]      12
	(18.5, 23.9]      13
speeding:
	(1.7910000000000001, 3.766]    13
	(3.766, 4.608]                 13
	(4.608, 6.439]                 12
	(6.439, 9.45]                  13
alcohol:
	(1.592, 3.894]     13
	(3.894, 4.554]     13
	(4.554, 5.604]     12
	(5.604, 10.038]    13
not_distracted:
	(1.7590000000000001, 10.478]    13
	(10.478, 13.857]                13
	(13.857, 16.14]                 12
	(16.14, 23.661]                 13
no_previous:
	(5.899, 11.348]     13
	(11.348, 13.775]    13
	(13.775, 16.755]    12
	(16.755, 21.28]     13
ins_premium:
	(641.9590000000001, 768.43]    13
	(768.43, 858.97]               13
	(858.97, 1007.945]             12
	(1007.945, 1301.52]            13
ins_losses:
	(82.749, 114.645]    13
	(114.645, 136.05]    13
	(136.05, 151.87]     12
	(151.87, 194.78]     13
abbrev:
	LA        1
	MI        1
	MO        1
	WY        1
	Other    47
  DROPPED [] because < 2 vals each.
total speeding alcohol not_distracted no_previous ins_premium ins_losses abbrev
0 (18.5, 23.9] (6.439, 9.45] (5.604, 10.038] (16.14, 23.661] (13.775, 16.755] (768.43, 858.97] (136.05, 151.87] Other
1 (15.6, 18.5] (6.439, 9.45] (3.894, 4.554] (16.14, 23.661] (16.755, 21.28] (1007.945, 1301.52] (114.645, 136.05] Other
2 (18.5, 23.9] (6.439, 9.45] (4.554, 5.604] (13.857, 16.14] (16.755, 21.28] (858.97, 1007.945] (82.749, 114.645] Other
3 (18.5, 23.9] (3.766, 4.608] (5.604, 10.038] (16.14, 23.661] (16.755, 21.28] (768.43, 858.97] (136.05, 151.87] Other
4 (5.899, 12.75] (3.766, 4.608] (1.592, 3.894] (10.478, 13.857] (5.899, 11.348] (858.97, 1007.945] (151.87, 194.78] Other
5 (12.75, 15.6] (4.608, 6.439] (1.592, 3.894] (10.478, 13.857] (11.348, 13.775] (768.43, 858.97] (136.05, 151.87] Other
6 (5.899, 12.75] (4.608, 6.439] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (1007.945, 1301.52] (151.87, 194.78] Other
7 (15.6, 18.5] (4.608, 6.439] (4.554, 5.604] (13.857, 16.14] (13.775, 16.755] (1007.945, 1301.52] (136.05, 151.87] Other
8 (5.899, 12.75] (1.7910000000000001, 3.766] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (1007.945, 1301.52] (114.645, 136.05] Other
9 (15.6, 18.5] (1.7910000000000001, 3.766] (4.554, 5.604] (16.14, 23.661] (16.755, 21.28] (1007.945, 1301.52] (136.05, 151.87] Other
10 (12.75, 15.6] (1.7910000000000001, 3.766] (3.894, 4.554] (13.857, 16.14] (13.775, 16.755] (858.97, 1007.945] (136.05, 151.87] Other
11 (15.6, 18.5] (6.439, 9.45] (5.604, 10.038] (13.857, 16.14] (13.775, 16.755] (858.97, 1007.945] (114.645, 136.05] Other
12 (12.75, 15.6] (4.608, 6.439] (3.894, 4.554] (10.478, 13.857] (13.775, 16.755] (641.9590000000001, 768.43] (82.749, 114.645] Other
13 (12.75, 15.6] (3.766, 4.608] (3.894, 4.554] (10.478, 13.857] (11.348, 13.775] (768.43, 858.97] (136.05, 151.87] Other
14 (12.75, 15.6] (1.7910000000000001, 3.766] (3.894, 4.554] (10.478, 13.857] (11.348, 13.775] (641.9590000000001, 768.43] (82.749, 114.645] Other
15 (15.6, 18.5] (1.7910000000000001, 3.766] (3.894, 4.554] (13.857, 16.14] (11.348, 13.775] (641.9590000000001, 768.43] (82.749, 114.645] Other
16 (15.6, 18.5] (4.608, 6.439] (3.894, 4.554] (10.478, 13.857] (13.775, 16.755] (768.43, 858.97] (114.645, 136.05] Other
17 (18.5, 23.9] (3.766, 4.608] (4.554, 5.604] (16.14, 23.661] (13.775, 16.755] (858.97, 1007.945] (136.05, 151.87] Other
18 (18.5, 23.9] (6.439, 9.45] (5.604, 10.038] (13.857, 16.14] (16.755, 21.28] (1007.945, 1301.52] (151.87, 194.78] LA
19 (12.75, 15.6] (4.608, 6.439] (3.894, 4.554] (10.478, 13.857] (11.348, 13.775] (641.9590000000001, 768.43] (82.749, 114.645] Other
20 (5.899, 12.75] (3.766, 4.608] (3.894, 4.554] (1.7590000000000001, 10.478] (11.348, 13.775] (1007.945, 1301.52] (151.87, 194.78] Other
21 (5.899, 12.75] (1.7910000000000001, 3.766] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (1007.945, 1301.52] (114.645, 136.05] Other
22 (12.75, 15.6] (1.7910000000000001, 3.766] (3.894, 4.554] (10.478, 13.857] (5.899, 11.348] (1007.945, 1301.52] (151.87, 194.78] MI
23 (5.899, 12.75] (1.7910000000000001, 3.766] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (768.43, 858.97] (114.645, 136.05] Other
24 (15.6, 18.5] (1.7910000000000001, 3.766] (4.554, 5.604] (1.7590000000000001, 10.478] (16.755, 21.28] (858.97, 1007.945] (151.87, 194.78] Other
25 (15.6, 18.5] (6.439, 9.45] (4.554, 5.604] (13.857, 16.14] (11.348, 13.775] (768.43, 858.97] (136.05, 151.87] MO
26 (18.5, 23.9] (6.439, 9.45] (5.604, 10.038] (16.14, 23.661] (16.755, 21.28] (768.43, 858.97] (82.749, 114.645] Other
27 (12.75, 15.6] (1.7910000000000001, 3.766] (4.554, 5.604] (10.478, 13.857] (11.348, 13.775] (641.9590000000001, 768.43] (114.645, 136.05] Other
28 (12.75, 15.6] (4.608, 6.439] (4.554, 5.604] (13.857, 16.14] (13.775, 16.755] (1007.945, 1301.52] (136.05, 151.87] Other
29 (5.899, 12.75] (3.766, 4.608] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (641.9590000000001, 768.43] (114.645, 136.05] Other
30 (5.899, 12.75] (1.7910000000000001, 3.766] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (1007.945, 1301.52] (151.87, 194.78] Other
31 (15.6, 18.5] (1.7910000000000001, 3.766] (4.554, 5.604] (10.478, 13.857] (16.755, 21.28] (858.97, 1007.945] (114.645, 136.05] Other
32 (5.899, 12.75] (3.766, 4.608] (1.592, 3.894] (10.478, 13.857] (5.899, 11.348] (1007.945, 1301.52] (136.05, 151.87] Other
33 (15.6, 18.5] (6.439, 9.45] (4.554, 5.604] (13.857, 16.14] (11.348, 13.775] (641.9590000000001, 768.43] (114.645, 136.05] Other
34 (18.5, 23.9] (4.608, 6.439] (5.604, 10.038] (16.14, 23.661] (16.755, 21.28] (641.9590000000001, 768.43] (82.749, 114.645] Other
35 (12.75, 15.6] (3.766, 4.608] (4.554, 5.604] (13.857, 16.14] (11.348, 13.775] (641.9590000000001, 768.43] (114.645, 136.05] Other
36 (18.5, 23.9] (4.608, 6.439] (5.604, 10.038] (16.14, 23.661] (16.755, 21.28] (858.97, 1007.945] (151.87, 194.78] Other
37 (12.75, 15.6] (3.766, 4.608] (1.592, 3.894] (1.7590000000000001, 10.478] (11.348, 13.775] (768.43, 858.97] (82.749, 114.645] Other
38 (15.6, 18.5] (6.439, 9.45] (5.604, 10.038] (16.14, 23.661] (13.775, 16.755] (858.97, 1007.945] (151.87, 194.78] Other
39 (5.899, 12.75] (3.766, 4.608] (3.894, 4.554] (1.7590000000000001, 10.478] (5.899, 11.348] (1007.945, 1301.52] (136.05, 151.87] Other
40 (18.5, 23.9] (6.439, 9.45] (5.604, 10.038] (16.14, 23.661] (16.755, 21.28] (768.43, 858.97] (114.645, 136.05] Other
41 (18.5, 23.9] (4.608, 6.439] (5.604, 10.038] (16.14, 23.661] (13.775, 16.755] (641.9590000000001, 768.43] (82.749, 114.645] Other
42 (18.5, 23.9] (3.766, 4.608] (5.604, 10.038] (13.857, 16.14] (13.775, 16.755] (641.9590000000001, 768.43] (151.87, 194.78] Other
43 (18.5, 23.9] (6.439, 9.45] (5.604, 10.038] (16.14, 23.661] (16.755, 21.28] (858.97, 1007.945] (151.87, 194.78] Other
44 (5.899, 12.75] (4.608, 6.439] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (768.43, 858.97] (82.749, 114.645] Other
45 (12.75, 15.6] (3.766, 4.608] (3.894, 4.554] (10.478, 13.857] (11.348, 13.775] (641.9590000000001, 768.43] (82.749, 114.645] Other
46 (5.899, 12.75] (1.7910000000000001, 3.766] (1.592, 3.894] (10.478, 13.857] (5.899, 11.348] (768.43, 858.97] (151.87, 194.78] Other
47 (5.899, 12.75] (3.766, 4.608] (1.592, 3.894] (1.7590000000000001, 10.478] (5.899, 11.348] (858.97, 1007.945] (82.749, 114.645] Other
48 (18.5, 23.9] (6.439, 9.45] (5.604, 10.038] (16.14, 23.661] (16.755, 21.28] (858.97, 1007.945] (151.87, 194.78] Other
49 (12.75, 15.6] (4.608, 6.439] (3.894, 4.554] (1.7590000000000001, 10.478] (11.348, 13.775] (641.9590000000001, 768.43] (82.749, 114.645] Other
50 (15.6, 18.5] (6.439, 9.45] (4.554, 5.604] (13.857, 16.14] (13.775, 16.755] (768.43, 858.97] (114.645, 136.05] WY