Preparing fake data for tests is its own kind of overhead.
To test a user system, you need to build a User object: fill in id, email, name, created_at. If you’re testing orders, you need an Order, which contains a list of OrderItem. Hand-crafting all of that often takes longer than writing the actual test logic.
polyfactory solves this: give it a class with type hints, and it generates conforming fake data automatically.
Install
```
pip install polyfactory
```
With Pydantic:
```
pip install polyfactory pydantic
```
Basic Usage: dataclass
```python
from dataclasses import dataclass

from polyfactory.factories import DataclassFactory


@dataclass
class User:
    id: int
    name: str
    email: str
    is_active: bool


class UserFactory(DataclassFactory):
    __model__ = User


user = UserFactory.build()
# e.g. User(id=42, name='vDjhqXt', email='KpLmnQr', is_active=True)
# (email is typed as plain str, so it is just a random string)
```
Every build() call produces different values: id is a random int, email is a random string, is_active is random True/False.
Pydantic v2
```python
from pydantic import BaseModel

from polyfactory.factories.pydantic_factory import ModelFactory


class Order(BaseModel):
    id: int
    amount: float
    status: str
    items: list[str]


class OrderFactory(ModelFactory):
    __model__ = Order


order = OrderFactory.build()
# e.g. Order(id=7, amount=3.14, status='aBcD', items=['x', 'y'])
```
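Pydantic validators still run, because build() constructs the model through Pydantic, and polyfactory takes declared field constraints (such as Field(ge=0)) into account when generating values. A custom validator can still reject a random value; in that case, override the field explicitly.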
Overriding Specific Fields
Most of the time you only care about a few fields, so just pass them in:
```python
# only set status, everything else is auto-filled
order = OrderFactory.build(status="paid")


# or set defaults in the factory class
class PaidOrderFactory(OrderFactory):
    status = "paid"
    amount = 100.0
```
This is the pattern I use most: base factory handles the bulk of the data, override the fields that matter for the specific test. No need to hardcode the whole object every time.
Batch Generation
```python
users = UserFactory.batch(10)
# gives you 10 different User objects
```
Useful when testing a list or pagination logic.
Combining With pytest Fixtures
```python
# conftest.py
import pytest

# Hypothetical module path: import UserFactory from wherever you define it.
from myapp.factories import UserFactory


@pytest.fixture
def user():
    return UserFactory.build()


@pytest.fixture
def active_user():
    return UserFactory.build(is_active=True)


@pytest.fixture
def users():
    return UserFactory.batch(5)
```
Fixtures return factory-generated objects. The test doesn’t need to know what fields User has; only the fields actually relevant to that test need to be specified.
```python
# deactivate() and get_user_list() stand in for the application code under test.
def test_deactivate_user(active_user):
    deactivate(active_user)
    assert not active_user.is_active


def test_list_users(users):
    result = get_user_list(users)
    assert len(result) == 5
```
TypedDict and attrs
```python
from typing import TypedDict

from polyfactory.factories import TypedDictFactory


class Config(TypedDict):
    host: str
    port: int
    debug: bool


class ConfigFactory(TypedDictFactory):
    __model__ = Config


config = ConfigFactory.build()
# e.g. {'host': 'abc', 'port': 8080, 'debug': False}
```
Nested Objects
polyfactory handles nested types recursively:
```python
from dataclasses import dataclass

from polyfactory.factories import DataclassFactory


@dataclass
class Address:
    city: str
    country: str


@dataclass
class User:
    name: str
    address: Address  # nested, auto-generated


class UserFactory(DataclassFactory):
    __model__ = User


user = UserFactory.build()
# user.address is also auto-generated
```
No need to build AddressFactory separately and pass it in.
polyfactory vs Faker
Faker also generates fake data, but you tell it what you want:
```python
from faker import Faker

fake = Faker()
email = fake.email()
name = fake.name()
```
polyfactory is “give me a class, I’ll fill it”:
```python
user = UserFactory.build()
```
They’re not competing; they serve different needs:
- Faker: You care what each field looks like (realistic emails, real city names, etc.)
- polyfactory: You just need the types to be correct; the specific values don’t matter
Test logic usually doesn’t care if an email looks real; it just needs to be a string. For that case, polyfactory is simpler.
Summary
polyfactory’s core idea is one thing: types are the specification, factories generate data according to types.
Combined with pytest fixtures, getting test data down to one or two lines is straightforward, so you can focus on what the test is actually checking.