Field Types#
The following field types are supported:
Class |
DB Type |
Pythonic Type |
Comments |
|---|---|---|---|
StringField |
String |
str |
Encoded as UTF-8 when written to ClickHouse |
FixedStringField |
FixedString |
str |
Encoded as UTF-8 when written to ClickHouse |
DateField |
Date |
datetime.date |
Range 1970-01-01 to 2105-12-31 |
DateTimeField |
DateTime |
datetime.datetime |
Minimal value is 1970-01-01 00:00:00; Timezone aware |
DateTime64Field |
DateTime64 |
datetime.datetime |
Minimal value is 1970-01-01 00:00:00; Timezone aware |
Int8Field |
Int8 |
int |
Range -128 to 127 |
Int16Field |
Int16 |
int |
Range -32768 to 32767 |
Int32Field |
Int32 |
int |
Range -2147483648 to 2147483647 |
Int64Field |
Int64 |
int |
Range -9223372036854775808 to 9223372036854775807 |
UInt8Field |
UInt8 |
int |
Range 0 to 255 |
UInt16Field |
UInt16 |
int |
Range 0 to 65535 |
UInt32Field |
UInt32 |
int |
Range 0 to 4294967295 |
UInt64Field |
UInt64 |
int |
Range 0 to 18446744073709551615 |
Float32Field |
Float32 |
float |
|
Float64Field |
Float64 |
float |
|
DecimalField |
Decimal |
Decimal |
Pythonic values are rounded to fit the scale of the database field |
Decimal32Field |
Decimal32 |
Decimal |
Ditto |
Decimal64Field |
Decimal64 |
Decimal |
Ditto |
Decimal128Field |
Decimal128 |
Decimal |
Ditto |
UUIDField |
UUID |
uuid.UUID |
|
IPv4Field |
IPv4 |
ipaddress.IPv4Address |
|
IPv6Field |
IPv6 |
ipaddress.IPv6Address |
|
Enum8Field |
Enum8 |
Enum |
See below |
Enum16Field |
Enum16 |
Enum |
See below |
ArrayField |
Array |
list |
See below |
TupleField |
Tuple |
tuple |
See below |
PointField |
Point |
contrib.geo.fields.Point |
Experimental feature |
RingField |
Ring |
contrib.geo.fields.Ring |
Experimental feature |
NullableField |
Nullable |
See below |
See below |
DateTimeField and Time Zones#
DateTimeField and DateTime64Field can accept a timezone parameter (either the timezone name or a pytz timezone instance). This timezone will be used as the column timezone in ClickHouse. If not provided, the fields will use the timezone defined in the database configuration.
A DateTimeField and DateTime64Field can be assigned values from one of the following types:
datetime
date
integer - number of seconds since the Unix epoch
float (DateTime64Field only) - number of seconds and microseconds since the Unix epoch
string in
YYYY-MM-DD HH:MM:SSformat or ISO 8601-compatible format
The assigned value always gets converted to a timezone-aware datetime in UTC. The only exception is when the assigned value is a timezone-aware datetime, in which case it will not be changed.
DateTime values that are read from the database are kept in the database-defined timezone - either the one defined for the field, or the global timezone defined in the database configuration.
It is strongly recommended to set the server timezone to UTC and to store all datetime values in that timezone, in order to prevent confusion and subtle bugs. Conversion to a different timezone should only be performed when the value needs to be displayed.
Working with enum fields#
Enum8Field and Enum16Field provide support for working with ClickHouse enum columns. They accept strings or integers as values, and convert them to the matching Pythonic Enum member.
Example of a model with an enum field:
Gender = Enum('Gender', 'male female unspecified')
class Person(Model):
first_name = StringField()
last_name = StringField()
birthday = DateField()
gender = Enum32Field(Gender)
engine = MergeTree('birthday', ('first_name', 'last_name', 'birthday'))
suzy = Person(first_name='Suzy', last_name='Jones', gender=Gender.female)
Working with array fields#
You can create array fields containing any data type, for example:
class SensorData(Model):
date = DateField()
temperatures = ArrayField(Float32Field())
humidity_levels = ArrayField(UInt8Field())
engine = MergeTree('date', ('date',))
data = SensorData(date=date.today(), temperatures=[25.5, 31.2, 28.7], humidity_levels=[41, 39, 66])
Note that multidimensional arrays are not supported yet by the ORM.
Working with tuple fields#
You can create tuple fields containing multiple data type, for example:
from datetime import date
from clickhouse_orm.models import Model
from clickhouse_orm.engines import MergeTree
from clickhouse_orm.fields import DateField, Float32Field, UInt8Field, TupleField
class SensorData(Model):
date = DateField()
info = TupleField([('t', Float32Field()), ('h', UInt8Field())])
engine = MergeTree('date', ('date',))
data = SensorData(date=date.today(), info=(25.5, 41))
print(data.info) # TupleField(t=25.5, h=41) <class 'clickhouse_orm.fields.TupleField'>
print(data.info.t) # 25.5
print(data.info.h) # 41
TupleField uses namedtuple to store data, so you can access values as properties.
Working with nullable fields#
ClickHouse provides a NULL value support.
Wrapping another field in a NullableField makes it possible to assign None to that field. For example:
class EventData(Model):
date = DateField()
comment = NullableField(StringField(), extra_null_values={''})
score = NullableField(UInt8Field())
serie = NullableField(ArrayField(UInt8Field()))
engine = MergeTree('date', ('date',))
score_event = EventData(date=date.today(), comment=None, score=5, serie=None)
comment_event = EventData(date=date.today(), comment='Excellent!', score=None, serie=None)
another_event = EventData(date=date.today(), comment='', score=None, serie=None)
action_event = EventData(date=date.today(), comment='', score=None, serie=[1, 2, 3])
The extra_null_values parameter is an iterable of additional values that should be converted
to None.
NOTE: ArrayField of NullableField is not supported. Also EnumField cannot be nullable.
NOTE: Using Nullable almost always negatively affects performance, keep this in mind when designing your databases.
Working with LowCardinality fields#
Starting with version 19.0 ClickHouse offers a new type of field to improve the performance of queries and compaction of columns for low entropy data.
More specifically LowCardinality data type builds dictionaries automatically. It can use multiple different dictionaries if necessarily. If the number of distinct values is pretty large, the dictionaries become local, several different dictionaries will be used for different ranges of data. For example, if you have too many distinct values in total, but only less than about a million values each day - then the queries by day will be processed efficiently, and queries for larger ranges will be processed rather efficiently.
LowCardinality works independently of (generic) fields compression. LowCardinality fields are subsequently compressed as usual. The compression ratios of LowCardinality fields for text data may be significantly better than without LowCardinality.
LowCardinality will give performance boost, in the form of processing speed, if the number of distinct values is less than a few millions. This is because data is processed in dictionary encoded form.
You can find further information here.
Usage example:
class LowCardinalityModel(Model):
date = DateField()
string = LowCardinalityField(StringField())
nullable = LowCardinalityField(NullableField(StringField()))
array = ArrayField(LowCardinalityField(DateField()))
...
Note: LowCardinality field with an inner array field is not supported. Use an ArrayField with a LowCardinality inner field as seen in the example.
Working with geo fields#
PointField and RingField are an experimental feature and their API may change significantly in the future.
import uuid
from clickhouse_orm import F
from clickhouse_orm.contrib.geo import PointField, Point
from clickhouse_orm.models import Model
from clickhouse_orm.engines import MergeTree
from clickhouse_orm.fields import UUIDField
class Residence(Model):
uuid = UUIDField()
geo_wgs84 = PointField()
engine = MergeTree('uuid')
r = Residence(uuid=uuid.uuid4(), geo_wgs84=Point(120, 30))
print(r.geo_wgs84)
# <Point x=120.0 y=30.0>
print(r.geo_wgs84.x)
# 120.0
print(r.geo_wgs84.y)
# 30.0
f = F.geohashEncode(
F.tupleElement(Residence.geo_wgs84, 1), F.tupleElement(Residence.geo_wgs84, 2), 7
)
print(f.to_sql())
# geohashEncode(tupleElement(`geo_wgs84`, 1), tupleElement(`geo_wgs84`, 2), 7)
Creating custom field types#
Sometimes it is convenient to use data types that are supported in Python, but have no corresponding column type in ClickHouse. In these cases it is possible to define a custom field class that knows how to convert the Pythonic object to a suitable representation in the database, and vice versa.
For example, we can create a BooleanField which will hold True and False values, but write them to the database as 0 and 1 (in a UInt8 column). For this purpose we’ll subclass the Field class, and implement two methods:
to_pythonwhich converts any supported value to abool. The method should know how to handle strings (which typically come from the database), booleans, and possibly other valid options. In case the value is not supported, it should raise aValueError.to_db_stringwhich converts aboolinto a string for writing to the database.
Here’s the full implementation:
from clickhouse_orm import Field
class BooleanField(Field):
# The ClickHouse column type to use
db_type = 'UInt8'
# The default value
class_default = False
def to_python(self, value, timezone_in_use):
# Convert valid values to bool
if value in (1, '1', True):
return True
elif value in (0, '0', False):
return False
else:
raise ValueError('Invalid value for BooleanField: %r' % value)
def to_db_string(self, value, quote=True):
# The value was already converted by to_python, so it's a bool
return '1' if value else '0'