SQLite¶
The core SqliteDatabase handles pragmas, user-defined functions,
WAL mode, full-text search, and JSON. Because the full-text search and JSON
fields are specific to SQLite, they are provided by playhouse.sqlite_ext.
Implementations¶
SqliteDatabase
Core SQLite implementation. Provides:
- Pragma support (including WAL mode)
- User-defined functions
- ATTACH / DETACH databases
- Full-text search
- JSON
Full-text search and JSON implementations are available in playhouse.sqlite_ext.

CySqliteDatabase (playhouse.cysqlite_ext)
Extends SqliteDatabase, uses the cysqlite driver. Provides all of the above functionality, plus:
- Table-valued functions
- Commit / Rollback / Update / Progress / Trace hooks
- BLOB I/O
- Online backups
- Can be built with encryption

APSWDatabase (playhouse.apsw_ext)
Extends SqliteDatabase, uses the apsw driver. APSW is a thin C-level driver that exposes the full range of SQLite functionality.

SqlCipherDatabase (playhouse.sqlcipher_ext)
Extends SqliteDatabase, uses the sqlcipher3 driver. SQLCipher provides transparent full-database encryption using 256-bit AES, ensuring data on-disk is secure.

SqliteQueueDatabase (playhouse.sqliteq)
Extends SqliteDatabase. Provides a SQLite database implementation with a long-lived background writer thread. All write operations are managed by a single write connection, preventing timeouts and database locking issues. This implementation is useful when using SQLite in multi-threaded environments with frequent writes.
PRAGMA statements¶
SQLite allows run-time configuration through PRAGMA statements (SQLite documentation).
These statements are typically run when a new database connection is created.
To specify default PRAGMA statements for connections:
db = SqliteDatabase('my_app.db', pragmas={
    'journal_mode': 'wal',
    'cache_size': 10000,  # 10000 pages, or ~40MB.
    'foreign_keys': 1,    # Enforce foreign-key constraints.
})
PRAGMAs may also be configured dynamically using either the pragma()
method or the special properties exposed on the SqliteDatabase object:
# Set cache size to 64MB for *current connection*.
db.pragma('cache_size', -64000)
# Same as above.
db.cache_size = -64000
# Read the value of several pragmas:
print('cache_size:', db.cache_size)
print('foreign_keys:', db.foreign_keys)
print('journal_mode:', db.journal_mode)
print('page_size:', db.page_size)
# Set foreign_keys pragma on current connection *AND* on all
# connections opened subsequently.
db.pragma('foreign_keys', 1, permanent=True)
Attention
Pragmas set using the pragma() method are not
re-applied when a new connection opens. To configure a pragma to be
run whenever a new connection is opened, specify permanent=True.
db.pragma('foreign_keys', 1, permanent=True)
See also
SQLite PRAGMA documentation: https://sqlite.org/pragma.html
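To illustrate what the pragma configuration above ultimately does at the driver level, here is a minimal sketch using only the standard library sqlite3 module (not peewee); the pragma names are from the SQLite documentation:

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Set pragmas on the current connection.
conn.execute('PRAGMA foreign_keys = 1')
conn.execute('PRAGMA cache_size = -64000')  # 64MB cache.

# Read the values back.
(foreign_keys,) = conn.execute('PRAGMA foreign_keys').fetchone()
(cache_size,) = conn.execute('PRAGMA cache_size').fetchone()
print(foreign_keys, cache_size)  # 1 -64000
```

Because pragmas are per-connection, they must be re-issued on every new connection, which is what the pragmas={...} constructor argument automates.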
User-Defined Functions¶
SQLite can be extended with user-defined Python code. The
SqliteDatabase class supports a variety of user-defined extensions:
- Functions
  User-defined functions accept any number of parameters and return a single value. Registered with the SqliteDatabase.func() decorator.
- Aggregates
  Aggregate values across multiple rows and return a single value.
- Window functions
  Aggregates which support operating on sliding windows of data.
- Collations
  Control how values are ordered and sorted.
- Table functions
  User-defined tables (requires cysqlite).
- Shared libraries
  Load an extension from a shared library.
Function example¶
db = SqliteDatabase('analytics.db')

from urllib.parse import urlparse

@db.func('hostname')
def hostname(url):
    if url is not None:
        return urlparse(url).netloc

# Call this function in our code. The following finds the most common
# hostnames of referrers by count:
query = (PageView
         .select(fn.hostname(PageView.referrer), fn.COUNT(PageView.id))
         .group_by(fn.hostname(PageView.referrer))
         .order_by(fn.COUNT(PageView.id).desc()))
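Peewee's func() decorator is built on SQLite's user-defined function API. For reference, here is the same hostname function registered directly with the standard library sqlite3 module (the URL is illustrative):

```python
import sqlite3
from urllib.parse import urlparse

def hostname(url):
    if url is not None:
        return urlparse(url).netloc

conn = sqlite3.connect(':memory:')
conn.create_function('hostname', 1, hostname)  # Name, arity, callable.

(host,) = conn.execute(
    "SELECT hostname('https://example.com/blog/post-1')").fetchone()
print(host)  # example.com
```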
Aggregate example¶
User-defined aggregates must define two methods:

- step(*values) - called once for each row being aggregated.
- finalize() - called once to produce the final aggregate value.
from hashlib import md5

@db.aggregate('md5')
class MD5Checksum(object):
    def __init__(self):
        self.checksum = md5()

    def step(self, value):
        self.checksum.update(value.encode('utf-8'))

    def finalize(self):
        return self.checksum.hexdigest()

# The following computes an aggregate MD5 checksum for files broken
# up into chunks and stored in the database.
query = (FileChunk
         .select(FileChunk.filename, fn.MD5(FileChunk.data))
         .group_by(FileChunk.filename)
         .order_by(FileChunk.filename, FileChunk.sequence))
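The same MD5Checksum class works unchanged with the standard library's create_aggregate(). A self-contained sketch, using a hypothetical filechunk table with one chunk per file so the result is deterministic:

```python
import sqlite3
from hashlib import md5

class MD5Checksum(object):
    def __init__(self):
        self.checksum = md5()

    def step(self, value):
        self.checksum.update(value.encode('utf-8'))

    def finalize(self):
        return self.checksum.hexdigest()

conn = sqlite3.connect(':memory:')
conn.create_aggregate('md5', 1, MD5Checksum)  # Name, arity, class.

conn.execute('CREATE TABLE filechunk (filename TEXT, data TEXT)')
conn.executemany('INSERT INTO filechunk VALUES (?, ?)',
                 [('a.txt', 'hello'), ('b.txt', 'world')])

rows = conn.execute('SELECT filename, md5(data) FROM filechunk '
                    'GROUP BY filename ORDER BY filename').fetchall()
for filename, digest in rows:
    print(filename, digest)
```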
Window function example¶
User-defined window functions are simply aggregates with two additional methods:

- step(*values) - called for each row being aggregated.
- inverse(*values) - "invert" the effect of a call to step(*values).
- value() - return the current value of the aggregate.
- finalize() - return the final aggregate value.
# Window functions are normal aggregates with two additional methods:
#  inverse(value) - Perform the inverse of step(value).
#  value() - Report value at current step.
@db.aggregate('mysum')
class MySum(object):
    def __init__(self):
        self._value = 0

    def step(self, value):
        self._value += (value or 0)

    def inverse(self, value):
        self._value -= (value or 0)  # Do opposite of "step()".

    def value(self):
        return self._value

    def finalize(self):
        return self._value

# e.g., aggregate sum of employee salaries over their department.
query = (Employee
         .select(
             Employee.department,
             Employee.salary,
             fn.mysum(Employee.salary).over(
                 partition_by=[Employee.department]))
         .order_by(Employee.id))
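On Python 3.11+ the standard library exposes the same window-function protocol via create_window_function(). A sketch of the MySum class above, guarded for older interpreters (requires an SQLite library built with window-function support, 3.25+); the employee data is illustrative:

```python
import sqlite3

class MySum:
    def __init__(self):
        self._value = 0

    def step(self, value):
        self._value += (value or 0)

    def inverse(self, value):
        self._value -= (value or 0)  # Do opposite of "step()".

    def value(self):
        return self._value

    def finalize(self):
        return self._value

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employee (department TEXT, salary INTEGER)')
conn.executemany('INSERT INTO employee VALUES (?, ?)',
                 [('eng', 100), ('eng', 200), ('sales', 50)])

if hasattr(conn, 'create_window_function'):  # Python 3.11+.
    conn.create_window_function('mysum', 1, MySum)
    rows = conn.execute(
        'SELECT department, mysum(salary) OVER (PARTITION BY department) '
        'FROM employee ORDER BY department').fetchall()
    print(rows)  # [('eng', 300), ('eng', 300), ('sales', 50)]
```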
Collation example¶
Collations accept two values and provide a value indicating how they should be
ordered (e.g. cmp(lhs, rhs)).
@db.collation('ireverse')
def collate_reverse(s1, s2):
    # Case-insensitive reverse.
    s1, s2 = s1.lower(), s2.lower()
    return (s1 < s2) - (s1 > s2)  # Equivalent to -cmp(s1, s2).

# To use this collation to sort books in reverse order...
Book.select().order_by(collate_reverse.collation(Book.title))

# Or...
Book.select().order_by(Book.title.asc(collation='ireverse'))
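The collation protocol maps directly onto the standard library's create_collation(); a self-contained sketch with an illustrative book table:

```python
import sqlite3

def collate_ireverse(s1, s2):
    # Case-insensitive reverse ordering.
    s1, s2 = s1.lower(), s2.lower()
    return (s1 < s2) - (s1 > s2)

conn = sqlite3.connect(':memory:')
conn.create_collation('ireverse', collate_ireverse)

conn.execute('CREATE TABLE book (title TEXT)')
conn.executemany('INSERT INTO book VALUES (?)',
                 [('alpha',), ('Gamma',), ('beta',)])

titles = [title for (title,) in conn.execute(
    'SELECT title FROM book ORDER BY title COLLATE ireverse')]
print(titles)  # ['Gamma', 'beta', 'alpha']
```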
Table function example¶
Example user-defined table-value function (see cysqlite TableFunction docs
for full details on TableFunction).
from cysqlite import TableFunction
from playhouse.cysqlite_ext import CySqliteDatabase

db = CySqliteDatabase('my_app.db')

@db.table_function('series')
class Series(TableFunction):
    columns = ['value']
    params = ['start', 'stop', 'step']

    def initialize(self, start=0, stop=None, step=1):
        """
        Table-functions declare an initialize() method, which is
        called with whatever arguments the user has called the
        function with.
        """
        self.start = self.current = start
        self.stop = stop or float('Inf')
        self.step = step

    def iterate(self, idx):
        """
        Iterate is called repeatedly by the SQLite database engine
        until the required number of rows has been read **or** the
        function raises a `StopIteration` signalling no more rows
        are available.
        """
        if self.current > self.stop:
            raise StopIteration
        ret, self.current = self.current, self.current + self.step
        return (ret,)

# Usage:
cursor = db.execute_sql('SELECT * FROM series(?, ?, ?)', (0, 5, 2))
for value, in cursor:
    print(value)

# Prints:
# 0
# 2
# 4
Locking Mode for Transactions¶
SQLite transactions can be opened in three different modes:
Deferred (default) - only acquires lock when a read or write is performed. The first read creates a shared lock and the first write creates a reserved lock. Because the acquisition of the lock is deferred until actually needed, it is possible that another thread or process could create a separate transaction and write to the database.
Immediate - a reserved lock is acquired immediately. In this mode, no other connection may write to the database or open an immediate or exclusive transaction. Other processes can continue to read from the database, however.
Exclusive - opens an exclusive lock which prevents all (except for read uncommitted) connections from accessing the database until the transaction is complete.
Example specifying the locking mode:
db = SqliteDatabase('app.db')

with db.atomic('EXCLUSIVE'):
    read()
    write()

@db.atomic('IMMEDIATE')
def some_other_function():
    # This function is wrapped in an "IMMEDIATE" transaction.
    do_something_else()
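At the SQL level these modes correspond to BEGIN DEFERRED, BEGIN IMMEDIATE, and BEGIN EXCLUSIVE. A sketch of what that looks like with the standard library driver in autocommit mode (which, as noted below, is also how peewee drives sqlite3), using an illustrative account table:

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode, so we
# issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE account (balance INTEGER)')

conn.execute('BEGIN IMMEDIATE')  # Acquire the reserved lock up front.
print(conn.in_transaction)  # True
conn.execute('INSERT INTO account VALUES (100)')
conn.execute('COMMIT')
print(conn.in_transaction)  # False
```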
For more information, see the SQLite locking documentation. To learn more about transactions in Peewee, see the Transactions documentation.
Danger
Do not alter the isolation_level property of the sqlite3.Connection
object. Peewee requires the sqlite3 driver be in autocommit-mode, which
is handled automatically by SqliteDatabase.
CySqlite¶
CySqliteDatabase uses the cysqlite
driver, a high-performance alternative to the standard library sqlite3
module. cysqlite provides additional features and hooks not available in
the standard library sqlite3 driver.
Installation:
pip install cysqlite
Usage:
from playhouse.cysqlite_ext import CySqliteDatabase
db = CySqliteDatabase('my_app.db', pragmas={
    'cache_size': -64000,
    'journal_mode': 'wal',
    'foreign_keys': 1,
})
- class CySqliteDatabase(database, **kwargs)¶
- Parameters:
pragmas (list) – A list of 2-tuples containing pragma key and value to set every time a connection is opened.
timeout – Set the busy-timeout on the SQLite driver (in seconds).
rank_functions (bool) – Make search result ranking functions available. Recommended only when using FTS4.
regexp_function (bool) – Make the REGEXP function available.
See also
CySqliteDatabase extends SqliteDatabase and inherits all methods for declaring user-defined functions, aggregates, window functions, collations, pragmas, etc.

Example:

db = CySqliteDatabase('app.db', pragmas={'journal_mode': 'wal'})
- table_function(name)¶
Class-decorator for registering a cysqlite.TableFunction. Table functions are user-defined functions that, rather than returning a single, scalar value, can return any number of rows of tabular data.
See the cysqlite docs for details on the TableFunction API.

from cysqlite import TableFunction

@db.table_function('series')
class Series(TableFunction):
    columns = ['value']
    params = ['start', 'stop', 'step']

    def initialize(self, start=0, stop=None, step=1):
        """
        Table-functions declare an initialize() method, which is
        called with whatever arguments the user has called the
        function with.
        """
        self.start = self.current = start
        self.stop = stop or float('Inf')
        self.step = step

    def iterate(self, idx):
        """
        Iterate is called repeatedly by the SQLite database engine
        until the required number of rows has been read **or** the
        function raises a `StopIteration` signalling no more rows
        are available.
        """
        if self.current > self.stop:
            raise StopIteration
        ret, self.current = self.current, self.current + self.step
        return (ret,)

cursor = db.execute_sql('SELECT * FROM series(?, ?, ?)', (0, 5, 2))
for (value,) in cursor:
    print(value)

# Prints:
# 0
# 2
# 4
- register_table_function(klass, name)¶
- Parameters:
klass (TableFunction) – class implementing TableFunction API.
name (str) – name for user-defined table function.
Register a cysqlite.TableFunction class with the connection. Table functions are user-defined functions that, rather than returning a single, scalar value, can return any number of rows of tabular data.
See also
CySqliteDatabase.table_function() for an example implementation, and the cysqlite docs for details on the TableFunction API.
- unregister_table_function(name)¶
- Parameters:
name – Name of the user-defined table function.
- Returns:
True or False, depending on whether the function was removed.
Unregister the user-defined table function.
- on_commit(fn)¶
- Parameters:
fn – callable or None to clear the current hook.
Register a callback to be executed whenever a transaction is committed on the current connection. The callback accepts no parameters and the return value is ignored.
However, if the callback raises a ValueError, the transaction will be aborted and rolled back.

Example:

db = CySqliteDatabase(':memory:')

@db.on_commit
def on_commit():
    logger.info('COMMITing changes')
- on_rollback(fn)¶
- Parameters:
fn – callable or None to clear the current hook.
Register a callback to be executed whenever a transaction is rolled back on the current connection. The callback accepts no parameters and the return value is ignored.

Example:

@db.on_rollback
def on_rollback():
    logger.info('Rolling back changes')
- on_update(fn)¶
- Parameters:
fn – callable or None to clear the current hook.
Register a callback to be executed whenever the database is written to (via an UPDATE, INSERT or DELETE query). The callback should accept the following parameters:
- query - the type of query, either INSERT, UPDATE or DELETE.
- database name - the default database is named main.
- table name - name of the table being modified.
- rowid - the rowid of the row being modified.
The callback's return value is ignored.

Example:

db = CySqliteDatabase(':memory:')

@db.on_update
def on_update(query_type, db, table, rowid):
    # e.g. INSERT row 3 into table users.
    logger.info('%s row %s into table %s', query_type, rowid, table)
- authorizer(fn)¶
- Parameters:
fn – callable or None to clear the current authorizer.
Register an authorizer callback. Authorizer callbacks must accept 5 parameters, which vary depending on the operation being checked:
- op: operation code, e.g. cysqlite.SQLITE_INSERT.
- p1: operation-specific value, e.g. the table name for SQLITE_INSERT.
- p2: operation-specific value.
- p3: database name, e.g. "main".
- p4: inner-most trigger or view responsible for the access attempt if applicable, else None.
See the SQLite authorizer documentation for a description of the authorizer codes and the values of parameters p1 and p2.
The authorizer callback must return one of:
- cysqlite.SQLITE_OK: allow the operation.
- cysqlite.SQLITE_IGNORE: allow statement compilation but prevent the operation from occurring.
- cysqlite.SQLITE_DENY: prevent statement compilation.
More details can be found in the cysqlite docs.
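The standard library sqlite3 module exposes the same SQLite authorizer API via set_authorizer(); here is a self-contained sketch that denies reads of a hypothetical secret table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE secret (value TEXT)')

def authorizer(op, p1, p2, dbname, trigger):
    # Deny any read of a column in the "secret" table.
    if op == sqlite3.SQLITE_READ and p1 == 'secret':
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK

conn.set_authorizer(authorizer)

denied = False
try:
    conn.execute('SELECT value FROM secret')
except sqlite3.DatabaseError:
    denied = True  # Statement compilation was prevented.
print(denied)  # True
```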
- trace(fn, mask=2, expand_sql=True)¶
- Parameters:
fn – callable or None to clear the current trace hook.
mask (int) – mask of what types of events to trace. The default value corresponds to SQLITE_TRACE_PROFILE.
expand_sql (bool) – pass the callback the sqlite3_expanded_sql() from the sqlite3_stmt (expands bound parameters).
Register a trace hook (sqlite3_trace_v2). The trace callback must accept 4 parameters, which vary depending on the operation being traced:
- event: type of event, e.g. SQLITE_TRACE_PROFILE.
- sid: memory address of the statement (only SQLITE_TRACE_CLOSE), else -1.
- sql: SQL string. If expand_sql, bound parameters will be expanded (for SQLITE_TRACE_CLOSE, sql=None).
- ns: estimated number of nanoseconds the statement took to run (only SQLITE_TRACE_PROFILE), else -1.
Any return value from the callback is ignored.
More details can be found in the cysqlite docs.
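The standard library analogue is the simpler set_trace_callback(), which receives each SQL statement string as it is executed:

```python
import sqlite3

statements = []
conn = sqlite3.connect(':memory:')
conn.set_trace_callback(statements.append)

conn.execute('CREATE TABLE t (x INTEGER)')
conn.execute('INSERT INTO t VALUES (1)')

# statements now contains the executed SQL (it may also include the
# implicit BEGIN emitted by the driver before the INSERT).
print(statements)
```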
- slow_query_log(threshold_ms=50, logger=None, level=logging.WARNING, expand_sql=True)¶
- Parameters:
threshold_ms – estimated millisecond threshold above which queries are logged as slow.
logger – logging namespace, defaults to 'peewee.cysqlite_ext'.
level (int) – level for the slow query log.
expand_sql (bool) – expand bound parameters in the SQL query.
Register a sqlite3_trace_v2 callback that will log slow queries to the given logger. Overrides a previously-registered trace() callback. Automatically re-registered when a new connection is opened.
- progress(fn, n=1)¶
- Parameters:
fn – callable or None to clear the current progress handler.
n (int) – approximate number of VM instructions to execute between calls to the progress handler.
Register a progress handler (sqlite3_progress_handler). The callback takes no arguments and returns 0 to allow progress to continue, or any non-zero value to interrupt progress.
More details can be found in the cysqlite docs.
- autocommit¶
Property which returns a boolean indicating if autocommit is enabled. By default, this value will be True except when inside a transaction (or atomic() block).

Example:

>>> db = CySqliteDatabase(':memory:')
>>> db.autocommit
True
>>> with db.atomic():
...     print(db.autocommit)
...
False
>>> db.autocommit
True
- backup(destination, pages=None, name=None, progress=None)¶
- Parameters:
destination (CySqliteDatabase) – Database object to serve as destination for the backup.
pages (int) – Number of pages per iteration. Default value of -1 indicates all pages should be backed-up in a single step.
name (str) – Name of source database (may differ if you used ATTACH DATABASE to load multiple databases). Defaults to “main”.
progress – Progress callback, called with three parameters: the number of pages remaining, the total page count, and whether the backup is complete.
Example:
master = CySqliteDatabase('master.db')
replica = CySqliteDatabase('replica.db')

# Backup the contents of master to replica.
master.backup(replica)
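The standard library exposes the same SQLite online-backup API via Connection.backup(); a self-contained sketch using in-memory databases:

```python
import sqlite3

master = sqlite3.connect(':memory:')
master.execute('CREATE TABLE kv (key TEXT, value INTEGER)')
master.execute("INSERT INTO kv VALUES ('a', 1)")
master.commit()

replica = sqlite3.connect(':memory:')
master.backup(replica)  # Copy all pages in a single pass.

print(replica.execute('SELECT * FROM kv').fetchall())  # [('a', 1)]
```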
- backup_to_file(filename, pages, name, progress)¶
- Parameters:
filename – Filename to store the database backup.
pages (int) – Number of pages per iteration. Default value of -1 indicates all pages should be backed-up in a single step.
name (str) – Name of source database (may differ if you used ATTACH DATABASE to load multiple databases). Defaults to “main”.
progress – Progress callback, called with three parameters: the number of pages remaining, the total page count, and whether the backup is complete.
Backup the current database to a file. The backed-up data is not a database dump, but an actual SQLite database file.
Example:
db = CySqliteDatabase('app.db')

def nightly_backup():
    filename = 'backup-%s.db' % (datetime.date.today())
    db.backup_to_file(filename)
- blob_open(table, column, rowid, read_only=False)¶
- Parameters:
table (str) – Name of table containing data.
column (str) – Name of column containing data.
rowid (int) – ID of row to retrieve.
read_only (bool) – Open the blob for reading only.
dbname (str) – Database name (e.g. if multiple databases attached).
- Returns:
cysqlite.Blob instance which provides efficient access to the underlying binary data.
- Return type:
cysqlite.Blob
See the cysqlite documentation for more details.

Example:

class Image(Model):
    filename = TextField()
    data = BlobField()

buf_size = 1024 * 1024 * 8  # Allocate 8MB for storing file.
rowid = Image.insert({
    Image.filename: 'thefile.jpg',
    Image.data: fn.zeroblob(buf_size),
}).execute()

# Open the blob, returning a file-like object.
blob = db.blob_open('image', 'data', rowid)

# Write some data to the blob.
blob.write(image_data)
img_size = blob.tell()

# Read the data back out of the blob.
blob.seek(0)
image_data = blob.read(img_size)
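Python 3.11+ offers the equivalent incremental BLOB I/O in the standard library via Connection.blobopen(); a sketch, guarded for older interpreters:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE image (data BLOB)')

# Reserve 16 bytes of zeroed blob storage.
cur = conn.execute('INSERT INTO image VALUES (zeroblob(16))')

if hasattr(conn, 'blobopen'):  # Python 3.11+.
    with conn.blobopen('image', 'data', cur.lastrowid) as blob:
        blob.write(b'hello')  # Incremental write into the blob.
        blob.seek(0)
        contents = blob.read(5)
    print(contents)  # b'hello'
```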
- class PooledCySqliteDatabase(database, **kwargs)¶
Connection-pooling variant of CySqliteDatabase.
APSW¶
APSW is a thin C wrapper over SQLite’s C API that exposes nearly every SQLite feature including virtual tables, virtual filesystems, and BLOB I/O.
Installation:
pip install apsw
Usage:
from playhouse.apsw_ext import APSWDatabase
db = APSWDatabase('my_app.db')
class BaseModel(Model):
class Meta:
database = db
Use the Field subclasses from playhouse.apsw_ext rather than those
from peewee to ensure correct type adaptation. For example, use
playhouse.apsw_ext.DateTimeField instead of peewee.DateTimeField.
- class APSWDatabase(database, **connect_kwargs)¶
Subclass of SqliteDatabase using the APSW driver.
- Parameters:
database (string) – filename of the SQLite database.
connect_kwargs – keyword arguments passed to apsw when opening a connection.
- register_module(mod_name, mod_inst)¶
Register a virtual table module globally. See the APSW virtual table documentation.
- Parameters:
mod_name (string) – name to use for module
mod_inst (object) – an object implementing the Virtual Table interface
- unregister_module(mod_name)¶
Unregister a previously registered module.
SQLCipher¶
SQLCipher is an encrypted wrapper
around SQLite. Peewee exposes it through SqlCipherDatabase, which
is API-identical to SqliteDatabase except for its constructor.
Installation:
pip install sqlcipher3
Usage:
from playhouse.sqlcipher_ext import SqlCipherDatabase
db = SqlCipherDatabase(
'app.db',
passphrase=os.environ['PASSPHRASE'],
pragmas={'cache_size': -64000})
Example usage with deferred initialization and passphrase prompt:
db = SqlCipherDatabase(None)

class BaseModel(Model):
    class Meta:
        database = db

class Secret(BaseModel):
    value = TextField()

# Prompt the user and initialize the database with their passphrase.
while True:
    db.init('my_app.db', passphrase=input('Passphrase: '))
    try:
        db.get_tables()  # Will raise if passphrase is wrong.
        break
    except DatabaseError as exc:
        print('Wrong passphrase.')
        db.init(None)
Pragma configuration (e.g. increasing PBKDF2 iterations):
db = SqlCipherDatabase('my_app.db',
                       passphrase='s3cr3t',
                       pragmas={'kdf_iter': 1_000_000})
SQLCipher can be configured using a number of extension PRAGMAs. The list of PRAGMAs and their descriptions can be found in the SQLCipher documentation.
- class SqlCipherDatabase(database, passphrase, **kwargs)¶
- Parameters:
database (str) – Path to the encrypted database file.
passphrase (str) – Encryption passphrase (minimum of 8 characters; enforce stronger requirements in your application).
If the database file does not exist, it is created and encrypted with a key derived from passphrase. If it does exist, passphrase must match the one used when the file was created.
If the passphrase is incorrect, an error will be raised when first attempting to access the database (typically DatabaseError: file is not a database).

- rekey(passphrase)¶
Change the encryption passphrase for the open database.
SqliteQueueDatabase¶
SqliteQueueDatabase serializes all write queries through a
single long-lived connection on a dedicated background thread. This allows
multiple application threads to write to a SQLite database concurrently
without conflict or timeout errors.
SqliteQueueDatabase can be used as a drop-in replacement for the regular
SqliteDatabase if you want simple read and write access to a
SQLite database from multiple threads, and do not need transactions.
from playhouse.sqliteq import SqliteQueueDatabase
db = SqliteQueueDatabase(
    'my_app.db',
    use_gevent=False,     # Use stdlib threading (default).
    autostart=True,       # Start the writer thread immediately.
    queue_max_size=64,    # Max pending writes before blocking.
    results_timeout=5.0,  # Seconds to wait for a write to complete.
    pragmas={'journal_mode': 'wal'})
If you set autostart=False, start the writer thread explicitly:
db.start()
Stop the writer thread on application shutdown (waits for pending writes):
import atexit
@atexit.register
def _stop():
    db.stop()
Read queries work as normal. Open and close the connection per-request as you would with any other database. Only writes are funneled through the queue.
Transactions are not supported. Because writes from different threads
are interleaved, there is no way to guarantee that the statements in a
transaction from one thread execute atomically without statements from
another thread appearing between them. The atomic() and
transaction() methods raise a ValueError if called.
If you need to temporarily bypass the queue and write directly (for
example, during a batch import), use pause()
and unpause().
- class SqliteQueueDatabase(database, use_gevent=False, autostart=True, queue_max_size=None, results_timeout=None, **kwargs)¶
- Parameters:
database (str) – database filename.
use_gevent (bool) – use gevent instead of threading.
autostart (bool) – automatically start the writer background thread.
queue_max_size (int) – maximum size of the pending writes queue.
results_timeout (float) – timeout for waiting for query results from the write thread (seconds).
- start()¶
Start the background writer thread.
- stop()¶
Signal the writer thread to stop. Blocks until all pending writes are flushed.
- is_stopped()¶
Return True if the writer thread is not running.
- pause()¶
Block until the writer thread finishes its current work, then disconnect it. The calling thread takes over direct database access. Must be followed by a call to unpause().
- unpause()¶
Resume the writer thread and reconnect the queue.
SQLite-Specific Fields¶
These field classes live in playhouse.sqlite_ext and can be used with SqliteDatabase and its subclasses:
- class RowIDField¶
Primary-key field mapped to SQLite's implicit rowid column.
For more information, see the SQLite documentation on rowid tables.

class Note(Model):
    rowid = RowIDField()  # Implied primary_key=True.
    content = TextField()
    timestamp = TimestampField()

RowIDField can be mapped to a different field name, but its underlying column name will always be rowid.

class Note(Model):
    id = RowIDField()
    ...
- class AutoIncrementField¶
Integer primary key that uses SQLite's AUTOINCREMENT keyword, guaranteeing the primary key is always strictly increasing even after deletions. Has a small performance cost versus the default PrimaryKeyField or RowIDField.
See the SQLite AUTOINCREMENT documentation for details.
- class ISODateTimeField¶
Subclass of DateTimeField that preserves UTC offset information for timezone-aware datetimes when storing to SQLite's text-based datetime representation.
- class TDecimalField(max_digits=10, decimal_places=5, auto_round=False, rounding=None, *args, **kwargs)¶
Subclass of DecimalField that stores decimal values in a TEXT column to avoid the potential loss of precision that may occur when storing in a REAL (double-precision floating point) column. SQLite does not have a true numeric type, so this field ensures no precision is lost when using Decimals.
SQLite JSON¶
JSONField enables storing and querying JSON data
in SQLite using the SQLite json functions.
- class JSONField(json_dumps=None, json_loads=None, **kwargs)¶
- Parameters:
json_dumps – Custom JSON serializer. Defaults to json.dumps.
json_loads – Custom JSON deserializer. Defaults to json.loads.
Stores and retrieves JSON data transparently and provides efficient implementations for in-place modification and querying. Data is automatically serialized on write, deserialized on read.
Example model:
from peewee import *
from playhouse.sqlite_ext import JSONField

db = SqliteDatabase(':memory:')

class Config(db.Model):
    data = JSONField()

Config.create_table()

# Create two rows.
Config.create(data={'timeout': 30, 'retry': {'max': 5}})
Config.create(data={'timeout': 10, 'retry': {'max': 10}})
To access or modify specific object keys or array indexes in a JSON structure, you can treat the JSONField as if it were a dictionary/list:

# Select or order by a JSON value:
query = (Config
         .select(Config, Config.data['timeout'].alias('timeout'))
         .order_by(Config.data['timeout'].desc()))

# Aggregate on nested value:
avg = (Config
       .select(fn.SUM(Config.data['timeout']) / fn.COUNT(Config.id))
       .scalar())

# Filter by nested value:
Config.select().where(Config.data['retry']['max'] < 8)
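Under the hood, these expressions compile to calls to SQLite's JSON functions, such as json_extract(). The raw-SQL equivalent with the standard library, assuming the bundled SQLite has the JSON functions available (true of virtually all modern builds):

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE config (data TEXT)')
conn.execute('INSERT INTO config VALUES (?)',
             (json.dumps({'timeout': 30, 'retry': {'max': 5}}),))

# Config.data['timeout'] compiles to a json_extract() call like this:
(timeout,) = conn.execute(
    "SELECT json_extract(data, '$.timeout') FROM config").fetchone()
(retry_max,) = conn.execute(
    "SELECT json_extract(data, '$.retry.max') FROM config").fetchone()
print(timeout, retry_max)  # 30 5
```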
Data can be atomically updated, written and removed in-place:
# In-place update (preserves other keys):
(Config
 .update(data=Config.data.update({'timeout': 60}))
 .where(Config.data['timeout'] >= 30)
 .execute())

# Set a specific path:
(Config
 .update(data=Config.data['timeout'].set(120))
 .where(Config.data['retry']['max'] == 5)
 .execute())

# Update a specific path with an object. Existing field ("max") will be
# preserved in this example.
(Config
 .update(data=Config.data['retry'].update({'backoff': 1}))
 .execute())

# To overwrite a specific path with an object, use set():
(Config
 .update(data=Config.data['retry'].set({'allowed': 10}))
 .execute())

# Remove a key atomically:
(Config
 .update(data=Config.data.update({'retry': None}))
 .where(Config.id == 1)
 .execute())

# Another way to remove atomically:
(Config
 .update(data=Config.data['retry'].remove())
 .where(Config.id == 2)
 .execute())
Helpers for other JSON scenarios:
# Query JSON types:
query = (Config
         .select(Config.data.json_type(),
                 Config.data['timeout'].json_type())
         .tuples())
# [('object', 'integer'), ('object', 'integer')]

# Query length of an array:
cfg1 = Config.create(data={'statuses': [1, 99, 1, 1]})
cfg2 = Config.create(data={'statuses': [1, 1]})

query = (Config
         .select(
             Config.data['statuses'],
             Config.data['statuses'].length())
         .tuples())
# [([1, 99, 1, 1], 4), ([1, 1], 2)]
Let's add a nested value and then see how to iterate through its contents recursively using the tree() method:

Config.create(data={'x1': {'y1': 'z1', 'y2': 'z2'}, 'x2': [1, 2]})

tree = Config.data.tree().alias('tree')
query = (Config
         .select(Config.id, tree.c.fullkey, tree.c.value)
         .from_(Config, tree))

for row in query.tuples():
    print(row)

# Prints:
[(1, '$', {'x1': {'y1': 'z1', 'y2': 'z2'}, 'x2': [1, 2]}),
 (1, '$.x2', [1, 2]),
 (1, '$.x2[0]', 1),
 (1, '$.x2[1]', 2),
 (1, '$.x1', {'y1': 'z1', 'y2': 'z2'}),
 (1, '$.x1.y1', 'z1'),
 (1, '$.x1.y2', 'z2')]
The tree() and children() methods are powerful. For more information on how to utilize them, see the json1 extension documentation.

- __getitem__(item)¶
- Parameters:
item – key or array index to access in the JSON data.
- Returns:
a special object exposing access to the JSON data.
- Return type:
JSONPath

Access a specific key or array index in the JSON data. Returns a JSONPath object, which exposes convenient methods for reading or modifying a particular part of a JSON object.

Example:

# If metadata contains {"tags": ["list", "of", "tags"]}, we can
# extract the first tag in this way:
Post.select(Post, Post.metadata['tags'][0].alias('first_tag'))

For more examples see the JSONPath API documentation.
- extract(*paths)¶
- Parameters:
paths – One or more JSON paths to extract.
Extract one or more JSON path values. Returns a list when multiple paths are given.
- extract_json(path)¶
- Parameters:
path (str) – JSON path
Extract the value at the specified path as a JSON data-type. This corresponds to the -> operator added in SQLite 3.38.
- extract_text(path)¶
- Parameters:
path (str) – JSON path
Extract the value at the specified path as a SQL data-type. This corresponds to the ->> operator added in SQLite 3.38.
- set(value, as_json=None)¶
- Parameters:
value – a scalar value, list, or dictionary.
as_json (bool) – force the value to be treated as JSON, in which case it will be serialized as JSON in Python beforehand. By default, lists and dictionaries are treated as JSON to be serialized, while strings and integers are passed as-is.
Set the value stored in a JSONField.
Uses the json_set() function from the json1 extension.
- replace(value, as_json=None)¶
- Parameters:
value – a scalar value, list, or dictionary.
as_json (bool) – force the value to be treated as JSON, in which case it will be serialized as JSON in Python beforehand. By default, lists and dictionaries are treated as JSON to be serialized, while strings and integers are passed as-is.
Replace the existing value stored in a JSONField. Will not create the value if it does not already exist.
Uses the json_replace() function from the json1 extension.
- insert(value, as_json=None)¶
- Parameters:
value – a scalar value, list, or dictionary.
as_json (bool) – force the value to be treated as JSON, in which case it will be serialized as JSON in Python beforehand. By default, lists and dictionaries are treated as JSON to be serialized, while strings and integers are passed as-is.
Insert the value into a JSONField. Will not overwrite an existing value.
Uses the json_insert() function from the json1 extension.
- append(value, as_json=None)¶
- Parameters:
value – a scalar value, list, or dictionary.
as_json (bool) – force the value to be treated as JSON, in which case it will be serialized as JSON in Python beforehand. By default, lists and dictionaries are treated as JSON to be serialized, while strings and integers are passed as-is.
Append to the array stored in a JSONField.
Uses the json_set() function from the json1 extension.
- update(data)¶
- Parameters:
data – a scalar value, list or dictionary to merge with the data currently stored in a JSONField. To remove a particular key, set that key to None in the updated data.

Merge new data into the JSON value using the RFC-7396 MergePatch algorithm to apply a patch (the data parameter) against the column data. MergePatch can add, modify, or delete elements of a JSON object, which means update() is a generalized replacement for both set() and remove(). MergePatch treats JSON array objects as atomic, so update() cannot append to an array, nor modify individual elements of an array.
For more information as well as examples, see the SQLite json_patch() function documentation.
- remove()¶
Remove the data stored in the JSONField. Uses the json_remove function from the json1 extension.
- json_type()¶
Return a string identifying the type of value stored in the column.
The type returned will be one of:
object
array
integer
real
true
false
text
null <-- the string "null" means an actual NULL value
NULL <-- an actual NULL value means the path was not found
Uses the json_type function from the json1 extension.
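The distinction between the string "null" and an actual NULL is easy to see with the underlying json_type() function. A small sketch using the stdlib sqlite3 module (assuming the json1 extension is available):

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# json_type() on the document root, on a path, on JSON null, and on
# a path that does not exist (which yields SQL NULL, i.e. None).
row = conn.execute(
    """SELECT json_type('{"a": [1, 2]}'),
              json_type('{"a": [1, 2]}', '$.a'),
              json_type('null'),
              json_type('{"a": 1}', '$.missing')"""
).fetchone()
print(row)  # ('object', 'array', 'null', None)
```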
- length()¶
Return the length of the array stored in the column.
Uses the json_array_length function from the json1 extension.
- children()¶
The children function corresponds to json_each, a table-valued function that walks the JSON value provided and returns the immediate children of the top-level array or object. If a path is specified, then that path is treated as the top-most element.
The rows returned by calls to children() have the following attributes:
key: the key of the current element relative to its parent.
value: the value of the current element.
type: one of the data-types (see json_type()).
atom: the scalar value for primitive types, NULL for arrays and objects.
id: a unique ID referencing the current node in the tree.
parent: the ID of the containing node.
fullkey: the full path describing the current element.
path: the path to the container of the current row.
Internally this method uses the json_each function from the json1 extension.
Example usage (compare to the tree() method):

class KeyData(Model):
    key = TextField()
    data = JSONField()

KeyData.create(key='a', data={'k1': 'v1', 'x1': {'y1': 'z1'}})
KeyData.create(key='b', data={'x1': {'y1': 'z1', 'y2': 'z2'}})

# We will query the KeyData model for the key and all the
# top-level keys and values in its data field.
kd = KeyData.data.children().alias('children')
query = (KeyData
         .select(KeyData.key, kd.c.key, kd.c.value, kd.c.fullkey)
         .from_(KeyData, kd)
         .order_by(kd.c.key)
         .tuples())

print(query[:])

# PRINTS:
# [('a', 'k1', 'v1', '$.k1'),
#  ('a', 'x1', '{"y1":"z1"}', '$.x1'),
#  ('b', 'x1', '{"y1":"z1","y2":"z2"}', '$.x1')]
- tree()¶
The tree function corresponds to json_tree, a table-valued function that recursively walks the JSON value provided and returns information about the keys at each level. If a path is specified, then that path is treated as the top-most element.
The rows returned by calls to tree() have the same attributes as rows returned by calls to children():
key: the key of the current element relative to its parent.
value: the value of the current element.
type: one of the data-types (see json_type()).
atom: the scalar value for primitive types, NULL for arrays and objects.
id: a unique ID referencing the current node in the tree.
parent: the ID of the containing node.
fullkey: the full path describing the current element.
path: the path to the container of the current row.
Internally this method uses the json_tree function from the json1 extension.
Example usage:
class KeyData(Model): key = TextField() data = JSONField() KeyData.create(key='a', data={'k1': 'v1', 'x1': {'y1': 'z1'}}) KeyData.create(key='b', data={'x1': {'y1': 'z1', 'y2': 'z2'}}) # We will query the KeyData model for the key and all the # keys and values in it's data field, recursively. kd = KeyData.data.tree().alias('tree') query = (KeyData .select(kd.c.key, kd.c.value, kd.c.fullkey) .from_(KeyData, kd) .order_by(kd.c.key) .tuples()) print(query[:]) # PRINTS: [('a', None, '{"k1":"v1","x1":{"y1":"z1"}}', '$'), ('b', None, '{"x1":{"y1":"z1","y2":"z2"}}', '$'), ('a', 'k1', 'v1', '$.k1'), ('a', 'x1', '{"y1":"z1"}', '$.x1'), ('b', 'x1', '{"y1":"z1","y2":"z2"}', '$.x1'), ('a', 'y1', 'z1', '$.x1.y1'), ('b', 'y1', 'z1', '$.x1.y1'), ('b', 'y2', 'z2', '$.x1.y2')]
- class JSONPath(field, path=None)¶
- Parameters:
field (JSONField) – the field object we intend to access.
path (tuple) – Components comprising the JSON path.
A convenient, Pythonic way of representing JSON paths for use with JSONField. Implements the same methods as JSONField, but is designed for operating on nested items, e.g.:

Config.create(data={'timeout': 30, 'retries': {'max': 5}})

# Both Config.data['timeout'] and Config.data['retries']['max']
# are instances of JSONPath:
query = (Config
         .select(Config.data['timeout'])
         .where(Config.data['retries']['max'] < 10))
- class JSONBField(json_dumps=None, json_loads=None, **kwargs)¶
Extends JSONField and stores data in the binary jsonb format (SQLite 3.45.0+). When reading raw column values, the data is in its encoded binary form; use the json() method to decode:

# Raw read returns binary:
kv = KV.get(KV.key == 'a')
kv.value  # b"l'k1'v1"

# Use .json() to get a Python object:
kv = KV.select(KV.value.json()).get()
kv.value  # {'k1': 'v1'}
- json()¶
Indicate the JSONB field-data should be deserialized and returned as JSON (as opposed to the SQLite binary format).
Full-Text Search¶
Peewee supports FTS3, FTS4 (legacy, widely available) and FTS5 full-text search extensions.
The general pattern is:
Define an FTSModel or FTS5Model subclass with one or more SearchField columns.
When a row is created or updated in the source table, insert or update the corresponding row in the search index.
Query the index using match() and rank results with bm25() (or rank() for FTS5).
Consult the SQLite documentation for FTS query syntax diagrams.
- class SearchField(unindexed=False, column_name=None)¶
Field type for full-text search virtual tables. Raises an exception if constraints (null=False, unique=True, etc.) are specified, since FTS tables do not support them.
Pass unindexed=True to store metadata alongside the search index without indexing it:

class DocumentIndex(FTSModel):
    title = SearchField()
    content = SearchField()
    tags = SearchField()
    timestamp = SearchField(unindexed=True)
- match(term)¶
- Parameters:
term (str) – full-text search query/terms.
- Returns:
an Expression corresponding to the MATCH operator.
SQLite's full-text search supports searching either the full table, including all indexed columns, or searching individual columns. The match() method can be used to restrict the search to a single column:

# Search *only* the title field and return results ordered by
# relevance, using bm25.
query = (DocumentIndex
         .select(DocumentIndex, DocumentIndex.bm25().alias('score'))
         .where(DocumentIndex.title.match('python'))
         .order_by(DocumentIndex.bm25()))
To search all indexed columns, use the FTSModel.match() method:

# Searches *both* the title and body and returns results ordered by
# relevance, using bm25.
query = (DocumentIndex
         .select(DocumentIndex, DocumentIndex.bm25().alias('score'))
         .where(DocumentIndex.match('python'))
         .order_by(DocumentIndex.bm25()))
- highlight(left, right)¶
- Parameters:
left (str) – opening tag for highlight, e.g. '<b>'
right (str) – closing tag for highlight, e.g. '</b>'
When performing a search using the MATCH operator, FTS5 can return the text of a given column with the matches highlighted.

# Search for items matching string 'python' and return the title
# highlighted with square brackets.
query = (SearchIndex
         .search('python')
         .select(SearchIndex.title.highlight('[', ']').alias('hi')))

for result in query:
    print(result.hi)

# For example, might print:
# Learn [python] the hard way
- snippet(left, right, over_length='...', max_tokens=16)¶
- Parameters:
left (str) – opening tag for highlight, e.g. '<b>'
right (str) – closing tag for highlight, e.g. '</b>'
over_length (str) – text to prepend or append when the snippet exceeds the maximum number of tokens.
max_tokens (int) – maximum number of tokens returned; must be 1 - 64.
When performing a search using the MATCH operator, FTS5 can return a snippet containing the highlighted match for a given column.

# Search for items matching string 'python' and return the title
# snippet, with the match highlighted with square brackets.
query = (SearchIndex
         .search('python')
         .select(SearchIndex.title.snippet('[', ']').alias('snip')))

for result in query:
    print(result.snip)
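The rendered output comes from the underlying FTS5 snippet() SQL function, whose trailing arguments correspond to this method's parameters. A minimal sketch using the stdlib sqlite3 module (assuming FTS5 is compiled into your SQLite build):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE VIRTUAL TABLE idx USING fts5(title)')
conn.execute("INSERT INTO idx (title) VALUES ('Learn python the hard way')")

# snippet(table, column_index, left, right, over_length, max_tokens)
(snip,) = conn.execute(
    "SELECT snippet(idx, 0, '[', ']', '...', 16) "
    "FROM idx WHERE idx MATCH 'python'"
).fetchone()
print(snip)  # Learn [python] the hard way
```

Because the title fits within the 16-token budget, no over_length marker is added.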
FTS4 / FTSModel¶
FTSModel enables Peewee applications to store data in an efficient full-text search index using SQLite FTS4.
FTSModel caveats:
All queries except MATCH and rowid lookups require a full table scan.
Constraints, foreign-keys, and indexes are not supported.
All columns are treated as TEXT.
No built-in ranking. Peewee provides several implementations which can be automatically registered by passing rank_functions=True to SqliteDatabase(...).
The FTSModel rowid primary key may be declared using RowIDField. Lookups on the rowid are very efficient.
Given these constraints, all fields besides rowid should be instances of SearchField to ensure correctness.
Tip
Because of the lack of secondary indexes, it usually makes sense to treat
the FTSModel.rowid primary key as a foreign-key to a row in a normal
SQLite table.
Example:
from peewee import *
from playhouse.sqlite_ext import FTSModel, SearchField
db = SqliteDatabase('app.db', rank_functions=True)
class Document(Model):
    # Canonical source of data, stored in a normal table.
    author = ForeignKeyField(User, backref='documents')
    title = TextField(null=False, unique=True)
    content = TextField(null=False)
    timestamp = DateTimeField()

    class Meta:
        database = db

class DocumentIndex(FTSModel):
    # Full-text search index.
    rowid = RowIDField()
    title = SearchField()
    content = SearchField()
    author = SearchField(unindexed=True)

    class Meta:
        database = db
        # Use the porter stemming algorithm to tokenize content, and
        # optimize prefix searches of 3 or 4 characters.
        options = {'tokenize': 'porter unicode61', 'prefix': [3, 4]}
Store data by inserting it into the FTS table:
# Store a document in the index:
DocumentIndex.create(
    rowid=document.id,  # Set rowid to match Document's id.
    title=document.title,
    content=document.content,
    author=document.author.get_full_name())

# Equivalent:
(DocumentIndex
 .insert({
     'rowid': document.id,
     'title': document.title,
     'content': document.content,
     'author': document.author.get_full_name()})
 .execute())
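The index rows above are inserted by hand, but the index can also be kept in sync with the source table automatically using SQL triggers. A sketch using the stdlib sqlite3 module and raw SQL (in a peewee application the same statements could be issued via database.execute_sql()); the table and trigger names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
CREATE VIRTUAL TABLE document_index USING fts4(title, content);

-- Mirror inserts and deletes from the source table into the index.
CREATE TRIGGER document_ai AFTER INSERT ON document BEGIN
  INSERT INTO document_index (rowid, title, content)
  VALUES (new.id, new.title, new.content);
END;
CREATE TRIGGER document_ad AFTER DELETE ON document BEGIN
  DELETE FROM document_index WHERE rowid = old.id;
END;
""")

conn.execute(
    "INSERT INTO document (title, content) VALUES ('intro', 'python and sqlite')")
row = conn.execute(
    "SELECT rowid, title FROM document_index WHERE document_index MATCH 'sqlite'"
).fetchone()
print(row)  # (1, 'intro')
```

An AFTER UPDATE trigger would complete the picture for applications that modify source rows in place.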
FTSModel provides several shortcuts for full-text search queries:
# Simple search using basic ranking algorithm.
results = DocumentIndex.search('python sqlite')

# BM25 search with score and per-column weighting:
results = DocumentIndex.search_bm25(
    'python sqlite',
    weights={'title': 2.0, 'content': 1.0},
    with_score=True,
    score_alias='relevance')
for r in results:
    print(r.title, r.relevance)
A common search pattern relies on the rowid of the indexed
data matching the document’s canonical id. Using this technique we can
apply additional filters and retrieve the matching Document objects
efficiently:
# Search and ensure we only retrieve articles from the last 30 days.
cutoff = datetime.datetime.now() - datetime.timedelta(days=30)
query = (Document
         .select()
         .join(
             DocumentIndex,
             on=(Document.id == DocumentIndex.rowid))
         .where(
             (Document.timestamp >= cutoff) &
             DocumentIndex.match('python sqlite'))
         .order_by(DocumentIndex.bm25()))
Warning
All SQL queries on FTSModel classes will be full-table scans
except full-text searches and rowid lookups.
- class FTSModel¶
Base Model class suitable for working with SQLite FTS3 / FTS4.
Supports the following options:
content: a Model containing the external content, or an empty string for a "contentless" index.
prefix: integer(s), e.g. '2' or '2,3,4'.
tokenize: simple, porter, or unicode61, e.g. 'porter'.
Example:
class DocumentIndex(FTSModel):
    title = SearchField()
    body = SearchField()

    class Meta:
        database = db
        options = {
            'tokenize': 'porter unicode61',
            'prefix': '3',
        }
- classmethod match(term)¶
- Parameters:
term – Search term or expression. FTS syntax documentation.
Generate a SQL expression representing a search for the given term or expression in the table. SQLite uses the MATCH operator to indicate a full-text search.
Example:

# Search the index for "search phrase" and return results ranked
# by relevancy using the BM25 algorithm.
query = (DocumentIndex
         .select()
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.bm25()))

for result in query:
    print('Result: %s' % result.title)
- classmethod search(term, weights=None, with_score=False, score_alias='score', explicit_ordering=False)¶
- Parameters:
term – Search term or expression. FTS syntax documentation.
weights – A list of weights for the columns, ordered with respect to the column’s position in the table. Or, a dictionary keyed by the field or field name and mapped to a value.
with_score – Whether the score should be returned as part of the SELECT statement.
score_alias (str) – Alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
explicit_ordering (bool) – Order using the full SQL function to calculate rank, as opposed to simply referencing the score alias in the ORDER BY clause.
Shorthand way of searching for a term and sorting results by the quality of the match.
This method uses a simplified algorithm for determining the relevance rank of results. For more sophisticated result ranking, use the search_bm25() method.

# Simple search.
docs = DocumentIndex.search('search term')
for result in docs:
    print(result.title)

# More complete example.
docs = DocumentIndex.search(
    'search term',
    weights={'title': 2.0, 'content': 1.0},
    with_score=True,
    score_alias='search_score')
for result in docs:
    print(result.title, result.search_score)
- classmethod search_bm25(term, weights=None, with_score=False, score_alias='score', explicit_ordering=False)¶
- Parameters:
term – Search term or expression. FTS syntax documentation.
weights – A list of weights for the columns, ordered with respect to the column’s position in the table. Or, a dictionary keyed by the field or field name and mapped to a value.
with_score – Whether the score should be returned as part of the SELECT statement.
score_alias (str) – Alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
explicit_ordering (bool) – Order using the full SQL function to calculate rank, as opposed to simply referencing the score alias in the ORDER BY clause.
Shorthand way of searching for a term and sorting results by the quality of the match using the BM25 algorithm.
Attention
The BM25 ranking algorithm is only available for FTS4. If you are using FTS3, use the search() method instead.
- classmethod search_bm25f(term, weights=None, with_score=False, score_alias='score', explicit_ordering=False)¶
Same as FTSModel.search_bm25(), but using the BM25f variant of the BM25 ranking algorithm.
- classmethod search_lucene(term, weights=None, with_score=False, score_alias='score', explicit_ordering=False)¶
Same as FTSModel.search_bm25(), but using the result ranking algorithm from the Lucene search engine.
- classmethod rank(col1_weight, col2_weight...coln_weight)¶
- Parameters:
col_weight (float) – (Optional) weight to give to the ith column of the model. By default, all columns have a weight of 1.0.
Generate an expression that will calculate and return the quality of the search match. This rank can be used to sort the search results.
The rank function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.
The algorithm used by rank() is simple and relatively quick. For more sophisticated result ranking, use the search_bm25() method. Example:

query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.rank().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.rank()))

for search_result in query:
    print(search_result.title, search_result.score)
- classmethod bm25(col1_weight, col2_weight...coln_weight)¶
- Parameters:
col_weight (float) – (Optional) weight to give to the ith column of the model. By default, all columns have a weight of 1.0.
Generate an expression that will calculate and return the quality of the search match using the BM25 algorithm. This value can be used to sort the search results.
Like rank(), the bm25 function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.
The BM25 result ranking algorithm requires FTS4. If you are using FTS3, use rank() instead.

query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.bm25().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.bm25()))

for search_result in query:
    print(search_result.title, search_result.score)

The above code example is equivalent to calling the search_bm25() method:

query = DocumentIndex.search_bm25('search phrase', with_score=True)

for search_result in query:
    print(search_result.title, search_result.score)
- classmethod bm25f(col1_weight, col2_weight...coln_weight)¶
Identical to bm25(), except that it uses the BM25f variant of the BM25 ranking algorithm.
- classmethod lucene(col1_weight, col2_weight...coln_weight)¶
Identical to bm25(), except that it uses the Lucene search result ranking algorithm.
- classmethod rebuild()¶
Rebuild the search index. Only valid when the content option was specified (content tables).
- classmethod optimize()¶
Optimize the index.
FTS5 / FTS5Model¶
FTS5Model enables Peewee applications to store data in an efficient full-text search index using SQLite FTS5. FTS5 also comes with native BM25 result ranking.
FTS5Model caveats:
All queries except MATCH and rowid lookups require a full table scan.
Constraints, foreign-keys, and indexes are not supported. All columns must be instances of SearchField.
The FTS5Model rowid primary key may be declared using RowIDField. Lookups on the rowid are very efficient.
Tip
Because of the lack of secondary indexes, it usually makes sense to treat
the FTS5Model.rowid primary key as a foreign-key to a row in a normal
SQLite table.
Example:
from peewee import *
from playhouse.sqlite_ext import FTS5Model, SearchField
db = SqliteDatabase('app.db')
class Document(Model):
    # Canonical source of data, stored in a normal table.
    author = ForeignKeyField(User, backref='documents')
    title = TextField(null=False, unique=True)
    content = TextField(null=False)
    timestamp = DateTimeField()

    class Meta:
        database = db

class DocumentIndex(FTS5Model):
    # Full-text search index.
    rowid = RowIDField()
    title = SearchField()
    content = SearchField()
    author = SearchField(unindexed=True)

    class Meta:
        database = db
        # Use the porter stemming algorithm and unicode tokenizers,
        # and optimize prefix matches of 3 or 4 characters.
        options = {'tokenize': 'porter unicode61', 'prefix': [3, 4]}

# Check that FTS5 is available:
if not DocumentIndex.fts5_installed():
    raise RuntimeError('FTS5 is not available in this SQLite build.')
Store data by inserting it into the FTS5 table:
# Store a document in the index:
DocumentIndex.create(
    rowid=document.id,  # Set rowid to match Document's id.
    title=document.title,
    content=document.content,
    author=document.author.get_full_name())

# Equivalent:
(DocumentIndex
 .insert({
     'rowid': document.id,
     'title': document.title,
     'content': document.content,
     'author': document.author.get_full_name()})
 .execute())
FTS5Model provides several shortcuts for full-text search queries:
# Simple search (BM25, ordered by relevance):
results = DocumentIndex.search('python sqlite')

# With score and per-column weighting:
results = DocumentIndex.search(
    'python sqlite',
    weights={'title': 2.0, 'content': 1.0},
    with_score=True,
    score_alias='relevance')
for r in results:
    print(r.title, r.relevance)

# Highlight matches in the title:
for r in (DocumentIndex.search('python')
          .select(DocumentIndex.title.highlight('[', ']').alias('hi'))):
    print(r.hi)  # e.g. "Learn [python] the hard way"
Tip
A common search pattern relies on the rowid of the indexed
data matching the document’s canonical id. Using this technique we can
apply additional filters and retrieve the matching Document objects
efficiently:
# Search and ensure we only retrieve articles from the last 30 days.
cutoff = datetime.datetime.now() - datetime.timedelta(days=30)
query = (Document
         .select()
         .join(
             DocumentIndex,
             on=(Document.id == DocumentIndex.rowid))
         .where(
             (Document.timestamp >= cutoff) &
             DocumentIndex.match('python sqlite'))
         .order_by(DocumentIndex.rank()))
If the primary source of the content you are indexing exists in a separate table, you can save some disk space by instructing SQLite to not store an additional copy of the search index content. See External Content for implementation details. The FTS5 documentation has more information.
- class FTS5Model¶
Inherits all FTSModel methods.
Supports the following options:
content: a Model containing the external content, or an empty string for a "contentless" index.
content_rowid: the Field containing the external content table's primary key.
prefix: integer(s), e.g. '2' or [2, 3].
tokenize: simple, porter, or unicode61, e.g. 'porter unicode61'.
Example:
class DocumentIndex(FTS5Model):
    title = SearchField()
    body = SearchField()

    class Meta:
        database = db
        options = {
            'tokenize': 'porter unicode61',
            'prefix': '3',
        }
- classmethod fts5_installed()¶
Return True if FTS5 is available.
- classmethod match(term)¶
- Parameters:
term – Search term or expression. FTS5 syntax documentation.
Generate a SQL expression representing a search for the given term or expression in the table. SQLite uses the MATCH operator to indicate a full-text search.
Example:

# Search the index for "search phrase" and return results ranked
# by relevancy using the BM25 algorithm.
query = (DocumentIndex
         .select()
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.rank()))

for result in query:
    print('Result: %s' % result.title)
- classmethod search(term, weights=None, with_score=False, score_alias='score', explicit_ordering=False)¶
- Parameters:
term – Search term or expression. FTS5 syntax documentation.
weights – A list of weights for the columns, ordered with respect to the column’s position in the table. Or, a dictionary keyed by the field or field name and mapped to a value.
with_score – Whether the score should be returned as part of the SELECT statement.
score_alias (str) – Alias to use for the calculated rank score. This is the attribute you will use to access the score if with_score=True.
explicit_ordering (bool) – Order using the full SQL function to calculate rank, as opposed to simply referencing the score alias in the ORDER BY clause.
Shorthand way of searching for a term and sorting results by the quality of the match. The FTS5 extension provides a built-in implementation of the BM25 algorithm, which is used to rank the results by relevance.

# Simple search.
docs = DocumentIndex.search('search term')
for result in docs:
    print(result.title)

# More complete example.
docs = DocumentIndex.search(
    'search term',
    weights={'title': 2.0, 'content': 1.0},
    with_score=True,
    score_alias='search_score')
for result in docs:
    print(result.title, result.search_score)
- classmethod search_bm25(term, weights=None, with_score=False, score_alias='score')¶
With FTS5, search_bm25() is identical to the search() method.
- classmethod rank(col1_weight, col2_weight...coln_weight)¶
- Parameters:
col_weight (float) – (Optional) weight to give to the ith column of the model. By default, all columns have a weight of 1.0.
Generate an expression that will calculate and return the quality of the search match using the BM25 algorithm. This value can be used to sort the search results.
The rank() function accepts optional parameters that allow you to specify weights for the various columns. If no weights are specified, all columns are considered of equal importance.

query = (DocumentIndex
         .select(
             DocumentIndex,
             DocumentIndex.rank().alias('score'))
         .where(DocumentIndex.match('search phrase'))
         .order_by(DocumentIndex.rank()))

for search_result in query:
    print(search_result.title, search_result.score)

The above code example is equivalent to calling the search() method:

query = DocumentIndex.search('search phrase', with_score=True)

for search_result in query:
    print(search_result.title, search_result.score)
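Under the hood, FTS5 exposes BM25 relevance through a hidden rank column, where smaller (more negative) values indicate better matches, so ORDER BY rank lists the best match first. A minimal sketch using the stdlib sqlite3 module (assuming FTS5 is compiled into your SQLite build):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE VIRTUAL TABLE idx USING fts5(title, content)')
conn.executemany('INSERT INTO idx (title, content) VALUES (?, ?)', [
    ('python intro', 'python python python'),
    ('misc notes', 'mentions python once'),
])

# ORDER BY rank sorts ascending on the bm25 score, so the document
# with more occurrences of the search term comes back first.
rows = conn.execute(
    "SELECT title FROM idx WHERE idx MATCH 'python' ORDER BY rank"
).fetchall()
print(rows[0])  # ('python intro',)
```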
- classmethod bm25(col1_weight, col2_weight...coln_weight)¶
Because FTS5 provides built-in support for BM25, this method is identical to the rank() method.
- classmethod VocabModel(table_type='row' | 'col' | 'instance', table_name=None)¶
- Parameters:
table_type (str) – Either ‘row’, ‘col’ or ‘instance’.
table_name – Name for the vocab table. If not specified, will be “fts5tablename_v”.
Generate a model class suitable for accessing the vocab table corresponding to FTS5 search index.
- classmethod rebuild()¶
Rebuild the search index. Only valid when the content option was specified (content tables).
- classmethod optimize()¶
Optimize the index.
User-Defined Function Collection¶
The playhouse.sqlite_udf module contains a number of functions, aggregates, and
table-valued functions, grouped into named collections.
from playhouse.sqlite_udf import register_all, register_groups
from playhouse.sqlite_udf import DATE, STRING
db = SqliteDatabase('my_app.db')
register_all(db) # Register every function.
register_groups(db, DATE, STRING) # Register selected groups.
# Register individual functions:
from playhouse.sqlite_udf import gzip, gunzip
db.register_function(gzip, 'gzip')
db.register_function(gunzip, 'gunzip')
Once registered, call functions via Peewee’s fn namespace or raw SQL:
# Find most common URL hostnames.
query = (Link
.select(fn.hostname(Link.url).alias('host'), fn.COUNT(Link.id))
.group_by(fn.hostname(Link.url))
.order_by(fn.COUNT(Link.id).desc())
.tuples())
Available functions¶
CONTROL_FLOW
- if_then_else(cond, truthy, falsey=None)¶
Simple ternary-type operator where, depending on the truthiness of the cond parameter, either the truthy or falsey value will be returned.
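As an illustration of how scalar UDFs like this one work, a look-alike can be registered with the stdlib sqlite3 module (the function body below is a re-implementation based on the description above, not the shipped code):

```python
import sqlite3

def if_then_else(cond, truthy, falsey=None):
    # Return `truthy` when `cond` is truthy, otherwise `falsey`.
    return truthy if cond else falsey

conn = sqlite3.connect(':memory:')
conn.create_function('if_then_else', 3, if_then_else)

(value,) = conn.execute(
    "SELECT if_then_else(1 > 2, 'yes', 'no')"
).fetchone()
print(value)  # no
```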
DATE
- strip_tz(date_str)¶
- Parameters:
date_str – A datetime, encoded as a string.
- Returns:
The datetime with any timezone info stripped off.
The time is not adjusted in any way; the timezone is simply removed.
- humandelta(nseconds, glue=', ')¶
- Parameters:
nseconds (int) – Number of seconds, total, in timedelta.
glue (str) – Fragment to join values.
- Returns:
Easy-to-read description of timedelta.
Example: 86471 -> "1 day, 1 minute, 11 seconds"
- mintdiff(datetime_value)¶
- Parameters:
datetime_value – A date-time.
- Returns:
Minimum difference between any two values in list.
Aggregate: minimum difference between any two datetimes.
- avgtdiff(datetime_value)¶
- Parameters:
datetime_value – A date-time.
- Returns:
Average difference between values in list.
Aggregate: average difference between consecutive values.
- duration(datetime_value)¶
- Parameters:
datetime_value – A date-time.
- Returns:
Duration from smallest to largest value in list, in seconds.
Aggregate: duration from the smallest to the largest value, in seconds.
- date_series(start, stop, step_seconds=86400)¶
- Parameters:
start (datetime) – Start datetime
stop (datetime) – Stop datetime
step_seconds (int) – Number of seconds comprising a step.
Table-value function: returns rows consisting of the date/time values encountered iterating from start to stop, step_seconds at a time.
Additionally, if start does not have a time component and step_seconds is greater-than-or-equal-to one day (86400 seconds), the values returned will be dates. Conversely, if start does not have a date component, values will be returned as times. Otherwise values are returned as datetimes.
Example:
SELECT * FROM date_series('2017-01-28', '2017-02-02');

value
-----
2017-01-28
2017-01-29
2017-01-30
2017-01-31
2017-02-01
2017-02-02
FILE
- file_ext(filename)¶
- Parameters:
filename (str) – Filename to extract extension from.
- Returns:
Returns the file extension, including the leading “.”.
- file_read(filename)¶
- Parameters:
filename (str) – Filename to read.
- Returns:
Contents of the file.
HELPER
- gzip(data, compression=9)¶
- Parameters:
data (bytes) – Data to compress.
compression (int) – Compression level (9 is max).
- Returns:
Compressed binary data.
- gunzip(data)¶
- Parameters:
data (bytes) – Compressed data.
- Returns:
Uncompressed binary data.
- hostname(url)¶
- Parameters:
url (str) – URL to extract hostname from.
- Returns:
hostname portion of URL
- toggle(key)¶
- Parameters:
key – Key to toggle.
Toggle a key between True/False state. Example:

>>> toggle('my-key')
True
>>> toggle('my-key')
False
>>> toggle('my-key')
True
- setting(key, value=None)¶
- Parameters:
key – Key to set/retrieve.
value – Value to set.
- Returns:
Value associated with key.
Store/retrieve a setting in memory; the value persists for the lifetime of the application. To get the current value, call with just the key. To set a new value, call with the key and the new value.
MATH
- randomrange(start, stop=None, step=None)¶
- Parameters:
start (int) – Start of range (inclusive).
stop (int) – End of range (not inclusive).
step (int) – Interval at which to return a value.
Return a random integer in the range [start, stop).
- gauss_distribution(mean, sigma)¶
- Parameters:
mean (float) – Mean value
sigma (float) – Standard deviation
Return a random floating-point value drawn from the normal distribution defined by mean and sigma.
- sqrt(n)¶
Calculate the square root of n.
- tonumber(s)¶
- Parameters:
s (str) – String to convert to number.
- Returns:
Integer, floating-point or NULL on failure.
- mode(val)¶
- Parameters:
val – Numbers in list.
- Returns:
The mode, or most-common, number observed.
Aggregate: calculates mode of values.
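A look-alike aggregate can be sketched with sqlite3.Connection.create_aggregate() from the stdlib (an illustrative re-implementation; the shipped version's tie-breaking behavior may differ):

```python
import sqlite3
from collections import Counter

class Mode:
    """Aggregate returning the most common non-NULL value observed."""
    def __init__(self):
        self.counts = Counter()

    def step(self, value):
        if value is not None:
            self.counts[value] += 1

    def finalize(self):
        return self.counts.most_common(1)[0][0] if self.counts else None

conn = sqlite3.connect(':memory:')
conn.create_aggregate('mode', 1, Mode)
conn.execute('CREATE TABLE t (n INTEGER)')
conn.executemany('INSERT INTO t (n) VALUES (?)', [(1,), (2,), (2,), (3,)])

(most_common,) = conn.execute('SELECT mode(n) FROM t').fetchone()
print(most_common)  # 2
```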
- minrange(val)¶
- Parameters:
val – Value
- Returns:
Min difference between two values.
Aggregate: minimum distance between two numbers in the sequence.
- avgrange(val)¶
- Parameters:
val – Value
- Returns:
Average difference between values.
Aggregate: average distance between consecutive numbers in the sequence.
- range(val)¶
- Parameters:
val – Value
- Returns:
The range from the smallest to largest value in sequence.
Aggregate: range of values observed.
- median(val)¶
- Parameters:
val – Value
- Returns:
The median, or middle, value in a sequence.
Aggregate: median value of a sequence.
Note
Only available if you compiled the
_sqlite_udf extension.
STRING
- substr_count(haystack, needle)¶
Returns the number of times needle appears in haystack.
- strip_chars(haystack, chars)¶
Strips any characters in chars from the beginning and end of haystack.
- damerau_levenshtein_dist(s1, s2)¶
Computes the edit distance from s1 to s2 using the damerau variant of the levenshtein algorithm.
Note
Only available if you compiled the
_sqlite_udf extension.
- levenshtein_dist(s1, s2)¶
Computes the edit distance from s1 to s2 using the levenshtein algorithm.
Note
Only available if you compiled the
_sqlite_udf extension.
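If the compiled extension is unavailable, a pure-Python equivalent can be registered by hand. A sketch using the stdlib sqlite3 module (an illustrative re-implementation of the classic Wagner-Fischer dynamic-programming algorithm, not the shipped C code):

```python
import sqlite3

def levenshtein(s1, s2):
    # Classic dynamic-programming edit distance: previous/current
    # are rows of the (len(s1)+1) x (len(s2)+1) distance matrix.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        current = [i]
        for j, c2 in enumerate(s2, 1):
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # insertion
                previous[j - 1] + (c1 != c2)))   # substitution
        previous = current
    return previous[-1]

conn = sqlite3.connect(':memory:')
conn.create_function('levenshtein_dist', 2, levenshtein)

(distance,) = conn.execute(
    "SELECT levenshtein_dist('kitten', 'sitting')"
).fetchone()
print(distance)  # 3
```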
- str_dist(s1, s2)¶
Computes the edit distance from s1 to s2 using the standard library SequenceMatcher’s algorithm.
Note
Only available if you compiled the
_sqlite_udf extension.
- regex_search(regex, search_string)¶
- Parameters:
regex (str) – Regular expression
search_string (str) – String to search for instances of regex.
Table-value function: searches a string for substrings that match the provided regex, returning a row for each match found.
Example:

SELECT * FROM regex_search('\w+', 'extract words, ignore! symbols');

value
-----
extract
words
ignore
symbols