PandaPost 0.2.0

This Release
PandaPost 0.2.0
Date
Status
Unstable
Abstract
Python NumPy ndarray data type for Postgres
Released By
decibel
License
FreeBSD
Resources
Special Files
Tags

Extensions

PandaPost 0.2.0
Python NumPy ndarray data type for Postgres

README

PandaPost

These extensions allow you to represent Python NumPy/Pandas objects in Postgres.

Warning
This is very much a work in progress. Expect the APIs and what is in each extension to change!

Currently you can think of PandaPost more as an experiment than anything else. It seems like it would be useful to be able to natively store ndarrays, DataFrames, etc in a Postgres column. Then again, maybe not. :)

If you can make use of any of this, please drop a line to our Google Group. We’d love any feedback!

Installation

I haven’t released this on PGXN yet, so for now you need to clone the git repo and make install. After that you can CREATE EXTENSION "PandaPost"; in your database. Currently it installs everything in your default schema; I suggest creating a panda schema and installing there. Note that it also requires plpythonu.

To wrap all that together:

git clone https://github.com/PandaPost/PandaPost.git
# Or, download https://github.com/PandaPost/PandaPost/archive/master.zip and unzip
cd PandaPost
make install
psql <database name>
CREATE EXTENSION plpythonu;
CREATE SCHEMA panda;
CREATE EXTENSION "PandaPost" WITH SCHEMA panda;

Current Status

There are now casts between ndarray and all the plpython supported types (boolean, int*, float, real, numeric and text). This allows things like SELECT panda.str(array[1,1,2,2]::panda.ndarray);.

Table 1. Table General Functions

Function

Description

T

Apply ndarray T() function to input

eval

Returns the results of running python’s eval() on the input, as an ndarray. IE: panda.repr(panda.eval('range(4)'))

repr

Returns the python repr() of the input.

str

Returns the python str() of the input.

Table 2. Table Set Functions

Function

Description

ediff1d

Returns difference between elements in array

in1d

Returns boolean array of whether each element in i is present in ar2. See numpy.in1d.

intersect1d

Return unique list of values that are present in both i and ar2, similar to INTERSECT in SQL. See numpy.intersect1d.

ndunique

Return unique elements in ar, plus other potential output. See numpy.unique.

ndunique1

Same as ndunique() but only return the array of unique elements.

setdiff1d

Return the difference between i and ar2, similar to EXCEPT in SQL. See numpy.setdiff1d.

setxor1d

Return the values that are in only one (but not both) i and ar2. See numpy.setxor1d.

union1d

Return the unique values that are in either i or ar2. See numpy.union1d.

Table 3. Table Special Functions

Function

Description

create_cast

Creates casts to and from a Postgres data type and ndarray

Next up: porting the more common ndarray functions:

np.ndarray.byteswap      np.ndarray.cumsum        np.ndarray.flat          np.ndarray.min           np.ndarray.ravel         np.ndarray.shape         np.ndarray.tobytes
np.ndarray.all           np.ndarray.choose        np.ndarray.data          np.ndarray.flatten       np.ndarray.nbytes        np.ndarray.real          np.ndarray.size          np.ndarray.tofile
np.ndarray.any           np.ndarray.clip          np.ndarray.diagonal      np.ndarray.getfield      np.ndarray.ndim          np.ndarray.repeat        np.ndarray.sort          np.ndarray.tolist
np.ndarray.argmax        np.ndarray.compress      np.ndarray.dot           np.ndarray.imag          np.ndarray.newbyteorder  np.ndarray.reshape       np.ndarray.squeeze       np.ndarray.tostring
np.ndarray.argmin        np.ndarray.conj          np.ndarray.dtype         np.ndarray.item          np.ndarray.nonzero       np.ndarray.resize        np.ndarray.std           np.ndarray.trace
np.ndarray.argpartition  np.ndarray.conjugate     np.ndarray.dump          np.ndarray.itemset       np.ndarray.partition     np.ndarray.round         np.ndarray.strides       np.ndarray.transpose
np.ndarray.argsort       np.ndarray.copy          np.ndarray.dumps         np.ndarray.itemsize      np.ndarray.prod          np.ndarray.searchsorted  np.ndarray.sum           np.ndarray.var
np.ndarray.astype        np.ndarray.ctypes        np.ndarray.fill          np.ndarray.max           np.ndarray.ptp           np.ndarray.setfield      np.ndarray.swapaxes      np.ndarray.view
np.ndarray.base          np.ndarray.cumprod       np.ndarray.flags         np.ndarray.mean          np.ndarray.put           np.ndarray.setflags      np.ndarray.take

Examples

Basic examples
CREATE EXTENSION IF NOT EXISTS plpythonu;
CREATE EXTENSION PandaPost;

CREATE TEMP TABLE s AS SELECT array['a','b','c']::ndarray AS s1, array['c','d']::ndarray AS s2;

-- python repr() of an ndarray of strings
SELECT repr(s1) FROM s;
          repr
-------------------------
 array(['a', 'b', 'c'], +
       dtype='|S1')
(1 row)

-- python str() of same array
SELECT str(s1) FROM s;
      str
---------------
 ['a' 'b' 'c']
(1 row)

-- exclusive-or of two ndarrays
SELECT str(setxor1d(s1, s2)) FROM s;
      str
---------------
 ['a' 'b' 'd']
(1 row)

-- Intersection
SELECT str(intersect1d(s1, s2)) FROM s;
  str
-------
 ['c']
(1 row)

This more complicated example uses the lambad extension to create a dataframe, which can currently be returned as an ndarray.

Note
Eventually there will be an actual DataFrame Postgres data type
Basic ndarray type storing a subclass (in this case, a Pandas DataFrame)
CREATE EXTENSION IF NOT EXISTS lambda;
\set df pd.DataFrame.from_dict([{"a":1,"b":"a"},{"a":2,"b":"b"}])
SELECT repr(
    lambda(
      $l$(
        ndarray
      ) RETURNS ndarray
      LANGUAGE plpythonu
      TRANSFORM FOR TYPE ndarray
      AS $body$
        import pandas as pd

        return $l$ || :'df' || $l$
      $body$
      $l$
      , NULL::ndarray
    )
  );
  repr
---------
    a  b+
 0  1  a+
 1  2  b
(1 row)

Copyright and License

PandaPost is released under a BSD liscense.

Copyright (c) 2016 Jim Nasby <Jim.Nasby@BlueTreble.com>.