What kinds of data fields can it generate?
Common generators cover names, email addresses, phone numbers, postal addresses, dates, UUIDs, lorem-ipsum text, integers, floats, booleans, enums, URLs, IP addresses, company names, job titles, and credit card numbers. You configure each field by picking a type and optional constraints such as min, max, locale, or a regex pattern. The output is a JSON array of objects matching your schema.
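As a rough illustration of the type-plus-constraints idea, here is a minimal Python sketch using only the standard library. The schema format, the field names, and the functions generate_value and generate_records are made up for this example and do not correspond to any particular tool.

```python
import json
import random
import uuid

# Hypothetical schema format: field name -> type plus optional constraints.
SCHEMA = {
    "id": {"type": "uuid"},
    "age": {"type": "integer", "min": 18, "max": 90},
    "score": {"type": "float", "min": 0.0, "max": 1.0},
    "active": {"type": "boolean"},
    "plan": {"type": "enum", "values": ["free", "pro", "team"]},
}


def generate_value(spec, rng):
    """Produce one value for a field spec, honoring its constraints."""
    kind = spec["type"]
    if kind == "uuid":
        # Build a version-4 UUID from the RNG so it follows the same source
        # of randomness as every other field.
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))
    if kind == "integer":
        return rng.randint(spec.get("min", 0), spec.get("max", 100))
    if kind == "float":
        return rng.uniform(spec.get("min", 0.0), spec.get("max", 1.0))
    if kind == "boolean":
        return rng.random() < 0.5
    if kind == "enum":
        return rng.choice(spec["values"])
    raise ValueError(f"unsupported field type: {kind}")


def generate_records(schema, count, rng=None):
    """Return a list of dicts, one per record, shaped like the schema."""
    rng = rng or random.Random()
    return [
        {name: generate_value(spec, rng) for name, spec in schema.items()}
        for _ in range(count)
    ]


records = generate_records(SCHEMA, 3)
print(json.dumps(records, indent=2))  # a JSON array of objects
```

Real generators support many more types (names, addresses, regex patterns), but the shape is usually similar: one small function per type, driven by the schema.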
How do I make the output reproducible?
Set a fixed seed. Most generators expose a seed input that initializes the underlying random number generator, so the same seed with the same schema and generator version produces the same data on every run. This is essential for golden-file tests and for sharing mock datasets across a team. Without a seed, every run gives different values.
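The effect of a seed can be sketched in a few lines of Python. The function generate_users and its fields are hypothetical; the point is only that a private random.Random instance created from a fixed seed replays the same sequence.

```python
import random

FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"]


def generate_users(count, seed=None):
    """Generate users from a private RNG so the seed fully controls the output."""
    # A dedicated Random instance avoids interference from other code that
    # uses the module-level random functions.
    rng = random.Random(seed)
    return [
        {"id": i + 1, "name": rng.choice(FIRST_NAMES), "age": rng.randint(18, 90)}
        for i in range(count)
    ]


run_a = generate_users(5, seed=42)
run_b = generate_users(5, seed=42)
print(run_a == run_b)  # same seed, same data
```

Passing seed=None falls back to a system-provided seed, which is why unseeded runs differ.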
Can I create relationships between fields, like a user and their orders?
Most simple generators emit flat records and do not maintain referential integrity automatically. For relational mocks you have a few options: nest an array of child objects inside each parent for one-to-many relationships, generate parents and children separately and join them by id in code, or use a library like Faker with a custom script that ties the tables together with realistic distributions.
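The first two options can be sketched in plain Python. Everything here (the function names, the user_id and total_cents fields, the uniform order count) is an assumption made for illustration; a real script would likely use a less uniform distribution.

```python
import random


def generate_users_and_orders(user_count, max_orders_per_user, seed=None):
    """Generate two flat tables where every order points at an existing user."""
    rng = random.Random(seed)
    users = [{"id": uid, "name": f"user{uid}"} for uid in range(1, user_count + 1)]
    orders = []
    next_order_id = 1
    for user in users:
        # Each user gets between 0 and max_orders_per_user orders.
        for _ in range(rng.randint(0, max_orders_per_user)):
            orders.append({
                "id": next_order_id,
                "user_id": user["id"],  # foreign key into the users table
                "total_cents": rng.randint(500, 50_000),
            })
            next_order_id += 1
    return users, orders


def nest_orders(users, orders):
    """Join by id: attach each user's orders as a nested array (one-to-many)."""
    by_user = {user["id"]: [] for user in users}
    for order in orders:
        by_user[order["user_id"]].append(order)
    return [{**user, "orders": by_user[user["id"]]} for user in users]


users, orders = generate_users_and_orders(3, 4, seed=7)
nested = nest_orders(users, orders)
print(len(orders), "orders across", len(users), "users")
```

Generating children from the parent list, rather than picking user ids independently, is what keeps the foreign keys valid.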
Why does the generated email not match the generated name?
Simple generators pull each field from an independent random pool, so John Doe might end up with the email vampire82@example.com. To get coherent records, choose a generator that derives the email from the name, or post-process the output in your own code. You lose a little variety, but the records look far more realistic.
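Deriving the email from the name takes only a small helper. This is a sketch under assumed conventions: the first.last plus numeric suffix pattern, the name pools, and the functions email_from_name and generate_person are all invented for the example.

```python
import random
import re

FIRST = ["John", "Maria", "Wei", "Aisha"]
LAST = ["Doe", "Garcia", "Chen", "O'Neil"]


def email_from_name(first, last, suffix, domain="example.com"):
    """Build an email whose local part is derived from the person's name."""
    local = f"{first}.{last}".lower()
    # Drop anything that is not a letter, digit, or dot (e.g. apostrophes).
    local = re.sub(r"[^a-z0-9.]", "", local)
    return f"{local}{suffix}@{domain}"


def generate_person(rng):
    """Pick a name first, then derive the email from it."""
    first = rng.choice(FIRST)
    last = rng.choice(LAST)
    return {
        "name": f"{first} {last}",
        "email": email_from_name(first, last, rng.randint(1, 99)),
    }


print(email_from_name("John", "Doe", 7))  # john.doe7@example.com
print(generate_person(random.Random(3)))
```

The same function works as a post-processing step: run it over records from any generator and overwrite the email field.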
How big a dataset can I generate?
In-browser generators typically handle up to about 10,000 rows comfortably. Beyond that the JSON string itself becomes unwieldy and the tab may freeze. For larger sets, use a CLI tool like jsf or a Node script with Faker that streams output to a file. Load-test data often needs millions of rows, which a streaming generator produces far more efficiently than one that builds the whole dataset in memory.
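The streaming idea is language-independent. The sketch below shows it in Python rather than Node, and it writes JSON Lines (one object per line) instead of a single JSON array, since that format can be written and read without holding the whole dataset in memory. The names iter_rows and write_jsonl and the row fields are hypothetical.

```python
import json
import os
import random
import tempfile


def iter_rows(count, seed=None):
    """Yield rows one at a time so memory use does not grow with count."""
    rng = random.Random(seed)
    for i in range(count):
        yield {"id": i + 1, "value": rng.randint(0, 1_000_000)}


def write_jsonl(path, rows):
    """Write each row as one JSON object per line; return the row count."""
    written = 0
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row))
            handle.write("\n")
            written += 1
    return written


if __name__ == "__main__":
    out_path = os.path.join(tempfile.gettempdir(), "mock_rows.jsonl")
    total = write_jsonl(out_path, iter_rows(100_000, seed=1))
    print(total, "rows written to", out_path)
```

Because iter_rows is a generator, raising the count to millions changes the run time and file size but not the peak memory.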
Is generated mock data safe to commit to a repo?
Generally yes, because the data is fictitious. The caveat is values that look realistic enough to collide with real customer accounts in your product's namespace, such as usernames or order ids; avoid committing those. Prefer obviously fake markers (test+ prefixes, fixed domain names like example.com) to prevent confusion if the dataset ever leaks into production logs.
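One way to enforce this before committing is a small check that flags any email outside the domains reserved for documentation and testing. This is a sketch, not a complete policy: the function names and the "email" field are assumptions, and the check deliberately treats subdomains of example.com as suspicious.

```python
# Domains and TLDs reserved for documentation and testing (RFC 2606).
RESERVED_DOMAINS = {"example.com", "example.org", "example.net"}
RESERVED_TLDS = (".test", ".invalid", ".example")


def is_obviously_fake_email(email):
    """Return True if the address uses a reserved, never-deliverable domain."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return False  # not a well-formed address at all
    domain = domain.lower()
    return domain in RESERVED_DOMAINS or domain.endswith(RESERVED_TLDS)


def find_suspicious_emails(records, field="email"):
    """List emails that could be mistaken for, or delivered to, real accounts."""
    return [r[field] for r in records if not is_obviously_fake_email(r[field])]


fixtures = [
    {"email": "test+1@example.com"},
    {"email": "jane.smith@gmail.com"},
    {"email": "qa@staging.test"},
]
print(find_suspicious_emails(fixtures))  # ['jane.smith@gmail.com']
```

A check like this can run in a pre-commit hook or a unit test over the fixture files.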