Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Python and Q: PyQ, qPython, and Getting Along

There are two serious Python-Q integration libraries, and they have different philosophies, different trade-offs, and different ways of making you feel like you chose wrong. Let’s save you the afternoon of discovery.

PyQ: KX’s official library. Deep integration — runs Python inside the q process, shares memory, no serialization overhead for in-process calls. Requires a real KX installation. Heavyweight but serious.

qPython: Pure Python, pure IPC. Connects to a running kdb+ process over the network. No special installation, no KX runtime dependency. The right choice for most external consumers of kdb+ data.

pykx: KX’s newer, actively-developed library. Wraps the KX C API, can run embedded or in IPC mode, has a pandas-compatible API, and is where KX is putting its energy. If you’re starting fresh, this is the one.

We’ll cover all three, but spend the most time on qPython (because you probably already have a kdb+ process and just want to talk to it) and pykx (because it’s the future).

qPython: The Pragmatic Choice

Installation

pip install qpython

That’s it. No KX runtime. Works against any kdb+ 3.x or 4.x process.

Connecting and Basic Queries

from qpython import qconnection
from qpython.qtype import QException

# Connect to a running kdb+ process
q = qconnection.QConnection(host='localhost', port=5001)
q.open()

# Execute a simple expression
result = q('2 + 2')
print(result)  # 4

# Execute a function call
result = q('til 10')
print(result)  # [0 1 2 3 4 5 6 7 8 9]

q.close()

Use it as a context manager to avoid forgetting to close:

from qpython import qconnection

with qconnection.QConnection(host='localhost', port=5001) as q:
    result = q('select from trade where date=2024.01.15, sym=`AAPL')
    print(result)

Type Mapping: Where Things Get Interesting

Q’s type system is richer than Python’s. When qPython receives data, it maps Q types to numpy arrays and pandas objects. Understanding this mapping saves debugging sessions.

import numpy as np
from qpython import qconnection
from qpython.qtype import (
    QSymbolList, QTimestampList, QFloat64List, QLONG_LIST
)

with qconnection.QConnection(host='localhost', port=5001, pandas=True) as q:
    # With pandas=True, tables come back as DataFrames
    df = q('select time, sym, price, size from trade where date=2024.01.15')
    print(df.dtypes)
    # time     datetime64[ns]
    # sym      object          <- Q symbols become Python strings
    # price    float64
    # size     int64

    print(df.head())

The pandas=True flag is almost always what you want for tabular data. Without it, you get a QTable object that you then have to convert anyway.

Sending Data to Q

Pushing data from Python to Q is where type handling gets tricky. Q is strict about types; Python is not.

import numpy as np
import pandas as pd
from qpython import qconnection
from qpython.qtype import QSymbolList, QLONG, QFLOAT64

with qconnection.QConnection(host='localhost', port=5001, pandas=True) as q:
    # Create a DataFrame to push
    df = pd.DataFrame({
        'time': pd.to_datetime(['2024-01-15 09:30:00', '2024-01-15 09:30:01']),
        'sym': ['AAPL', 'MSFT'],
        'price': [185.50, 375.20],
        'size': [100, 200]
    })

    # Push a variable into Q's namespace
    q('set', np.string_('mydata'), df)

    # Verify it's there
    result = q('count mydata')
    print(result)  # 2

    # Now query it
    result = q('select from mydata where sym=`AAPL')
    print(result)

The Symbol Problem

Q symbols and Python strings are conceptually similar but mechanically different. When you query Q and get back symbol data, qPython gives you Python bytes objects by default (without pandas=True) or strings (with it). When you send symbols to Q, you need to be explicit:

from qpython.qtype import QSymbolList
import numpy as np

with qconnection.QConnection(host='localhost', port=5001) as q:
    # Wrong: Python list of strings won't be interpreted as Q symbols
    # q('insert', np.string_('mytable'), [['AAPL', 'MSFT'], [100, 200]])

    # Right: use QSymbolList for symbol columns
    syms = QSymbolList(np.array(['AAPL', 'MSFT'], dtype='S'))
    sizes = np.array([100, 200], dtype=np.int64)

    q('`mytable insert', (syms, sizes))

The difference between bytes and symbols matters in Q. If sym in your trade table is a symbol and you send bytes, you’ll get a type error on join operations that will be deeply confusing for ten minutes.

Asynchronous Calls

For fire-and-forget operations (publishing to a tickerplant, for instance):

from qpython import qconnection

with qconnection.QConnection(host='localhost', port=5001) as q:
    # Synchronous — waits for response
    result = q('.u.upd', np.string_('trade'), data)

    # Asynchronous — returns immediately
    q('.u.upd', np.string_('trade'), data, sync=False)

Be careful with async calls in tight loops — you can flood a Q process faster than it can handle messages. Add a small sleep or batch your updates if you’re publishing high-frequency data.

Error Handling

Q errors propagate back as QException:

from qpython.qtype import QException

with qconnection.QConnection(host='localhost', port=5001) as q:
    try:
        result = q('1 + `sym')  # type error
    except QException as e:
        print(f"Q error: {e}")  # type
    except Exception as e:
        print(f"Connection error: {e}")

pykx: The Modern Choice

pykx is KX’s actively-maintained library that has both an embedded mode (runs kdb+ in-process) and an IPC mode (connects to a running process). The API is cleaner than qPython and it has better pandas and numpy integration.

Installation

pip install pykx

For IPC-only mode (no KX license needed):

# Set this before importing pykx to use IPC-only mode
export PYKX_LICENSED_LIBRARIES=false
# Or in Python:
import os
os.environ['PYKX_LICENSED_LIBRARIES'] = 'false'

IPC Mode

import pykx as kx

# Connect to a running kdb+ process
with kx.SyncQConnection(host='localhost', port=5001) as q:
    # Execute q code
    result = q('select from trade where date=2024.01.15')

    # Convert to pandas
    df = result.pd()
    print(df)

    # Or arrow
    table = result.pa()

pykx Type System

pykx has a clean type hierarchy that mirrors Q’s:

import pykx as kx

with kx.SyncQConnection(host='localhost', port=5001) as q:
    # Atoms
    x = q('42')
    print(type(x))         # pykx.LongAtom
    print(x.py())          # 42 (Python int)

    # Lists
    xs = q('1 2 3f')
    print(type(xs))        # pykx.FloatVector
    print(xs.np())         # numpy array([1., 2., 3.])

    # Tables come back as pykx.Table
    tbl = q('([] a:1 2 3; b:`x`y`z)')
    print(tbl.pd())        # pandas DataFrame

Sending Typed Data

import pykx as kx
import pandas as pd
import numpy as np

with kx.SyncQConnection(host='localhost', port=5001) as q:
    # pykx can convert pandas DataFrames directly
    df = pd.DataFrame({
        'sym': ['AAPL', 'MSFT', 'GOOG'],
        'price': [185.5, 375.2, 140.8],
        'size': [100, 200, 150]
    })

    # Set a variable in Q
    q['mydata'] = df

    # Call a Q function with a Python argument
    result = q('{[t] select from t where price > 200}', df)
    print(result.pd())

Async Connections

import pykx as kx
import asyncio

async def publish_data():
    async with kx.AsyncQConnection(host='localhost', port=5001) as q:
        # Async query
        result = await q('select from trade')
        df = result.pd()

        # Fire-and-forget publish
        await q('.u.upd', 'trade', df, wait=False)

PyQ: The Deep Integration Option

PyQ is for when you want Python to run inside the q process — sharing memory, zero serialization overhead, using Q data structures directly from Python. It’s powerful and it requires a full KX installation.

pip install pyq
# Run with q's Python interpreter:
pyq my_script.py

Inside a PyQ session:

from pyq import q, K

# Q functions are accessible as attributes
q.til(10)            # calls q's til function
q('.', 'mylib.q')   # loads a Q script

# K objects are Q values
x = K(3.14)
y = q.sqrt(x)

# Create Q tables from Python
import numpy as np
data = {
    'price': np.array([100.5, 101.2, 99.8]),
    'size': np.array([100, 200, 150], dtype=np.int64)
}
q['mytrade'] = data

PyQ’s strength is in embedded analytics — Python ML libraries operating on Q data without copying it. If you’re running TensorFlow or scikit-learn against live kdb+ data and latency matters, PyQ is the right architecture.

Which One Should You Use?

ScenarioLibrary
External service consuming kdb+ dataqPython or pykx IPC
New project, greenfield setuppykx
High-frequency publisher to tickerplantqPython (async) or pykx
ML/analytics running inside q processPyQ
Don’t have KX license, just want dataqPython
Need pandas/arrow integrationpykx

The honest answer: start with pykx. Its API is cleaner, it’s actively maintained, and it handles the type system better than qPython. Use qPython if you need a lighter dependency or you’re working with legacy code that already uses it. Use PyQ only if you genuinely need in-process Python execution.

A Complete Example: Time-Series Analysis Pipeline

This is the pattern that actually comes up in production — Python consuming kdb+ data for analysis, then writing results back.

import pykx as kx
import pandas as pd
import numpy as np
from scipy import stats

def compute_vwap(df: pd.DataFrame) -> pd.DataFrame:
    """Compute VWAP from trade data."""
    df['notional'] = df['price'] * df['size']
    vwap = df.groupby('sym').apply(
        lambda x: x['notional'].sum() / x['size'].sum()
    ).reset_index()
    vwap.columns = ['sym', 'vwap']
    return vwap

def compute_volatility(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Rolling annualized volatility."""
    df = df.sort_values(['sym', 'time'])
    df['log_ret'] = df.groupby('sym')['price'].transform(
        lambda x: np.log(x / x.shift(1))
    )
    df['vol'] = df.groupby('sym')['log_ret'].transform(
        lambda x: x.rolling(window).std() * np.sqrt(252)
    )
    return df[['time', 'sym', 'vol']].dropna()

def main():
    with kx.SyncQConnection(host='localhost', port=5001) as q:
        # Pull today's trade data
        trades = q('''
            select time, sym, price, size
            from trade
            where date=.z.d,
                  sym in `AAPL`MSFT`GOOG`AMZN`META
        ''').pd()

        print(f"Loaded {len(trades):,} trades")

        # Compute analytics in Python
        vwap = compute_vwap(trades)
        vol = compute_volatility(trades)

        # Write results back to Q
        q['analytics_vwap'] = vwap
        q['analytics_vol'] = vol

        # Or insert into a Q table
        q('{`results_vwap upsert x}', vwap)

        print("Analytics written to Q")
        print(vwap)

if __name__ == '__main__':
    main()

Timestamp Handling (The Thing That Will Burn You Once)

Q timestamps are nanoseconds since 2000.01.01. Python datetimes are microsecond-resolution since 1970-01-01. Every library does the conversion for you, but sometimes it goes wrong.

import pykx as kx
import pandas as pd

with kx.SyncQConnection(host='localhost', port=5001) as q:
    # Q timestamp
    ts = q('2024.01.15D09:30:00.123456789')

    # pykx converts correctly
    print(ts.py())  # datetime.datetime(2024, 1, 15, 9, 30, 0, 123456)
                    # Note: microseconds, nanoseconds truncated

    # For nanosecond precision, use numpy
    print(ts.np())  # numpy.datetime64('2024-01-15T09:30:00.123456789')

    # pandas preserves nanoseconds
    df = q('([] t:enlist 2024.01.15D09:30:00.123456789)').pd()
    print(df['t'].dtype)      # datetime64[ns]
    print(df['t'].iloc[0])    # 2024-01-15 09:30:00.123456789

The gotcha: if you use ts.py() on a timestamp column, you lose the nanoseconds silently. Use .np() or work through pandas for nanosecond-resolution data.

Authentication

Both libraries support authenticated connections:

# qPython
q = qconnection.QConnection(
    host='localhost',
    port=5001,
    username='myuser',
    password='mypass'
)

# pykx
q = kx.SyncQConnection(
    host='localhost',
    port=5001,
    username='myuser',
    password='mypass'
)

Q’s authentication is username:password in the connection handshake. If the server has .z.pw defined, it uses that for validation; otherwise it uses the OS password file. For production, use .z.pw with proper credential management — don’t hardcode passwords in code that’s going into version control.