How to Generate Sample Data in PostgreSQL (The Easy Way)

Every developer reaches a point where they need fake data — lots of it. To test performance. To try out queries. To build demos. To break things in a safe environment.

The good news: PostgreSQL is really good at generating sample data. You don’t need Python, Excel, or a giant CSV — you can do everything right inside SQL.

1. Generate Rows Using generate_series

The simplest way to create sample data is with PostgreSQL’s built-in generate_series function. It produces rows of numbers (or timestamps), which you can turn into anything you want.

Create 1,000 fake users:


CREATE TABLE demo_user (
  user_id     INT PRIMARY KEY,
  first_name  TEXT,
  last_name   TEXT,
  created_at  TIMESTAMP
);

INSERT INTO demo_user (user_id, first_name, last_name, created_at)
SELECT
  id,
  'User' || id,
  'Test' || id,
  NOW() - (id || ' days')::interval
FROM generate_series(1, 1000) AS id;

In seconds, you have a table full of predictable, structured sample data.

2. Add Random Values with random()

PostgreSQL includes a built-in random number generator. Combine it with generate_series and you can create sample datasets that feel real.

Create 5,000 random payments:


CREATE TABLE demo_payment (
  payment_id  SERIAL PRIMARY KEY,
  user_id     INT,
  amount      NUMERIC(5,2),
  paid_at     TIMESTAMP
);

INSERT INTO demo_payment (user_id, amount, paid_at)
SELECT
  (random() * 1000)::int + 1,            -- random user_id 1–1001
  round((random() * 50)::numeric, 2),    -- $0.00 – $50.00
  NOW() - (random() * 30 || ' days')::interval
FROM generate_series(1, 5000);

Randomized data is perfect for testing indexes, filtering, aggregates, and performance.

3. Generate Text, Emails, and More

With a little creativity, you can generate realistic strings — emails, phone numbers, cities, anything.


CREATE TABLE demo_customer (
  customer_id SERIAL PRIMARY KEY,
  full_name   TEXT,
  email       TEXT,
  city        TEXT
);

INSERT INTO demo_customer (full_name, email, city)
SELECT
  'Customer ' || id,
  'customer' || id || '@example.com',
  (ARRAY['Boston','Chicago','Dallas','Miami','Seattle'])[floor(random()*5)+1]
FROM generate_series(1, 500) AS id;

Arrays + random() give you instant “pick one value from this list” behavior.

4. Generate Timestamp Ranges

Need a table full of hourly, daily, or minute-by-minute timestamps? generate_series can do that too.

Generate 24 hours of timestamps:


SELECT generate_series(
  NOW() - interval '24 hours',
  NOW(),
  interval '1 hour'
) AS hour_bucket;

This is incredibly useful for analytics, reporting, and time-based testing.

5. Bulk-Load Data into Existing Tables (Pagila Example)

If you want to expand the Pagila database with synthetic rentals, here’s a simple pattern.


INSERT INTO rental (rental_date, inventory_id, customer_id, staff_id)
SELECT
  NOW() - (random() * 10 || ' days')::interval,
  (SELECT inventory_id FROM inventory ORDER BY random() LIMIT 1),
  (SELECT customer_id  FROM customer  ORDER BY random() LIMIT 1),
  1
FROM generate_series(1, 2000);

This instantly adds 2,000 realistic rentals spread across random inventory and customers.

6. Bonus: Generate Data with Correlated Fields

Sometimes “fully random” isn’t realistic enough. You might want higher spend for some users, or customers who rent more often.

Example: 20% of users spend more money.


INSERT INTO demo_payment (user_id, amount, paid_at)
SELECT
  id,
  CASE
    WHEN random() < 0.2 THEN round((random() * 200)::numeric, 2) -- bigger spenders
    ELSE round((random() * 50)::numeric, 2)
  END,
  NOW() - (random() * 10 || ' days')::interval
FROM generate_series(1, 1000) AS id;

Suddenly your data has patterns — perfect for analytics demos, machine learning practice, or dashboard testing!

Conclusion

PostgreSQL makes generating sample data fast and fun. Whether you need predictable datasets, random values, realistic user data, or time-series ranges, you can do it all with a few lines of SQL.

Use these patterns whenever you need to prototype, test, or teach — no CSV imports required. And once you master generate_series + random(), you can model almost anything!