Automated Dataset Engineering

Synthetic Resume Data Factory

n8n · Llama 3.1 · Data Science · Automation

The Core Concept

The Automated Resume Data Factory addresses a critical bottleneck in machine learning: the scarcity of high-quality, structured personal data for training HR-tech models. Privacy regulations such as the EU's GDPR and the Philippines' Data Privacy Act (enforced by the National Privacy Commission, NPC) make using real resumes risky during development; this system sidesteps that risk by generating synthetic yet realistic professional profiles at scale.

By orchestrating an LLM for persona generation and a fast image model for headshots, I built a system that turns a simple prompt into a fully formatted, “hirable” professional identity, exported as both machine-readable JSON and a human-readable PDF.


The Engineering Challenge

The Data Scarcity Problem

Developers building ATS (Applicant Tracking Systems) or career recommendation engines often struggle to find diverse datasets. Manually creating thousands of unique resumes with consistent work histories, skills, and contact info is impractical.

The Solution: Procedural Persona Generation

I engineered a multi-stage workflow that ensures every generated resume is internally consistent. If the AI decides a persona is a “Senior DevOps Engineer,” the system automatically aligns their:

  1. Work History: Logic-gated career progression (Junior > Mid > Senior).
  2. Skill Clusters: Tech stacks that actually make sense together (e.g., Kubernetes + Terraform).
  3. Visual Identity: A professional headshot that matches the persona’s age and vibe.
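The alignment rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual n8n logic: the seniority ladder mirrors the Junior > Mid > Senior progression described above, while the SKILL_CLUSTERS table, the 1 to 4 year tenure range, and the build_persona helper are all hypothetical.

```python
import random
from dataclasses import dataclass, field

# Mirrors the Junior > Mid > Senior progression from the write-up.
SENIORITY_LADDER = ["Junior", "Mid", "Senior"]

# Hypothetical skill clusters: tools that plausibly appear together.
SKILL_CLUSTERS = {
    "DevOps Engineer": ["Kubernetes", "Terraform", "Docker", "Prometheus", "GitHub Actions"],
    "Data Engineer": ["Python", "Airflow", "Spark", "dbt", "PostgreSQL"],
}


@dataclass
class Persona:
    role: str
    level: str
    history: list = field(default_factory=list)  # (title, years), oldest first
    skills: list = field(default_factory=list)


def build_persona(target_title: str, rng: random.Random) -> Persona:
    """Derive a consistent history and skill set from a title like 'Senior DevOps Engineer'."""
    level, role = target_title.split(" ", 1)
    if level not in SENIORITY_LADDER or role not in SKILL_CLUSTERS:
        raise ValueError(f"unsupported title: {target_title!r}")
    # Walk the ladder up to the target level so rank never skips or reverses.
    rungs = SENIORITY_LADDER[: SENIORITY_LADDER.index(level) + 1]
    history = [(f"{rung} {role}", rng.randint(1, 4)) for rung in rungs]
    # Draw skills only from the role's cluster so the stack stays coherent.
    skills = rng.sample(SKILL_CLUSTERS[role], k=3)
    return Persona(role=role, level=level, history=history, skills=skills)
```

Because the history is derived from the target title rather than generated independently, a “Senior” persona always carries a Junior and a Mid role before it, and its skills never mix unrelated stacks.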

Technical Architecture

The workflow is managed via n8n, connecting local inference engines with cloud storage for a seamless “set and forget” experience.

Layer            | Technology        | Role
Orchestration    | n8n               | Manages the logic flow and conditional branching.
Text Generation  | Llama 3.1         | Generates professional summaries and work experiences.
Image Synthesis  | Z Image Turbo     | Produces high-speed, hyper-realistic professional portraits.
Document Engine  | Reactive Resume   | Renders structured data into polished, modern PDF layouts.
Storage          | Google Drive API  | Automatically categorizes and stores PDF/JSON exports.
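To make the data flow in the table concrete, here is a hypothetical plain-Python equivalent of the n8n workflow. It is only a sketch: each stage is passed in as a function standing in for the Llama 3.1, Z Image Turbo, Reactive Resume, and Google Drive steps, and the persona field names and file-naming scheme are assumptions.

```python
from typing import Callable, Union


def run_pipeline(
    seed_prompt: str,
    count: int,
    generate_text: Callable[[str], dict],               # stand-in for the Llama 3.1 step
    generate_image: Callable[[dict], bytes],            # stand-in for the Z Image Turbo step
    render_pdf: Callable[[dict], bytes],                # stand-in for the Reactive Resume step
    store: Callable[[str, Union[dict, bytes]], None],   # stand-in for the Google Drive step
) -> int:
    """Generate `count` personas and return how many were fully stored."""
    stored = 0
    for i in range(count):
        persona = generate_text(f"{seed_prompt} (variant {i})")
        # Conditional branch: drop incomplete personas instead of rendering them.
        if not persona.get("name") or not persona.get("work"):
            continue
        persona["headshot"] = generate_image(persona)
        # The JSON export keeps only the text fields; the image goes into the PDF.
        store(f"resume_{i:04d}.json", {k: v for k, v in persona.items() if k != "headshot"})
        store(f"resume_{i:04d}.pdf", render_pdf(persona))
        stored += 1
    return stored
```

In n8n, the same branch would typically be an If node sitting between the text-generation and rendering steps.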

Key Features

1. High-Fidelity Synthetic Personas

Unlike simple “lorem ipsum” generators, this pipeline uses Llama 3.1 to craft nuanced professional narratives. It simulates realistic career gaps, diverse educational backgrounds, and industry-specific jargon, making the data realistic enough to stand in for real-world resumes in testing.
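Llama 3.1 writes the narrative, but whether a gap in a generated history is deliberate can also be checked mechanically. The sketch below assumes each job is stored as a pair of (year, month) start and end dates; the find_gaps helper and the 3-month threshold are illustrative assumptions, not part of the described workflow.

```python
def months_between(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Whole months from (year, month) a to (year, month) b."""
    return (b[0] - a[0]) * 12 + (b[1] - a[1])


def find_gaps(jobs, min_gap_months=3):
    """Return (previous_end, next_start, months) for each gap of at least min_gap_months."""
    ordered = sorted(jobs)  # oldest start date first
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = months_between(prev_end, next_start)
        if gap >= min_gap_months:
            gaps.append((prev_end, next_start, gap))
    return gaps
```

A workflow could use the result either to confirm that an intended sabbatical is mentioned in the summary or to regenerate a history whose dates accidentally leave a hole.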

2. Turbo-Charged Visuals

By integrating Z Image Turbo, a speed-optimized image model, the system generates a professional headshot for every persona. This ensures each resume in the dataset has a unique visual identity without slowing down the rest of the automation chain.
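Matching each headshot to the persona's age and vibe comes down to turning persona fields into an image prompt. A minimal sketch, assuming a persona dict with hypothetical age, role, and industry keys; the prompt wording is an illustration, not the project's actual prompt.

```python
def headshot_prompt(persona: dict) -> str:
    """Build a text-to-image prompt from hypothetical persona fields."""
    age = persona["age"]
    # Bucket the exact age into a descriptive phrase for the prompt.
    if age < 30:
        apparent_age = "in their twenties"
    elif age < 45:
        apparent_age = "in their thirties or early forties"
    else:
        apparent_age = "in their mid forties or older"
    # Dress code follows the (assumed) industry field.
    formal = persona.get("industry") in {"finance", "law", "consulting"}
    attire = "business formal attire" if formal else "smart casual attire"
    return (
        f"Professional corporate headshot, {persona['role']}, person {apparent_age}, "
        f"{attire}, neutral studio background, soft lighting, photorealistic"
    )
```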

3. Structured Data Export (JSON + PDF)

The system doesn’t just “print” a resume. It simultaneously generates:

  • JSON Files: Structured ground truth (names, titles, skills) for training NER (Named Entity Recognition) models.
  • PDF Files: Ideal for testing PDF parsers and OCR (Optical Character Recognition) tools.
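Because the generator already knows every field it wrote, the JSON side can be turned into NER training data with exact character offsets and no manual labeling. A minimal sketch, assuming a flat record with hypothetical name, title, company, and skills keys and made-up entity labels; the output follows the common (start, end, label) span convention.

```python
def to_ner_example(record: dict) -> dict:
    """Join known field values into one text and record each value's character span."""
    fields = [("PERSON", record["name"]), ("TITLE", record["title"]), ("ORG", record["company"])]
    fields += [("SKILL", skill) for skill in record["skills"]]
    text = ""
    entities = []
    for label, value in fields:
        if text:
            text += " | "  # separator between fields
        start = len(text)
        text += value
        entities.append((start, len(text), label))  # end offset is exclusive
    return {"text": text, "entities": entities}
```

A fuller version would locate the same values inside the text extracted from the rendered PDF, which would also yield labeled data for the parser and OCR tests.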

The Impact

This automation transforms a week-long data entry task into a 5-minute background process:

  • Massive Scalability: Generate 10 or 1,000 resumes with the same amount of manual effort.
  • Privacy Compliance: Fully synthetic data means no real candidate's PII (Personally Identifiable Information) enters the dataset.
  • Format Versatility: Ready-to-use data for both data scientists and front-end developers.
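One way to back the privacy point in code is to generate contact details only inside namespaces reserved for fictional or documentation use, so a synthetic email or phone number does not collide with a real person's. This is an assumption about how such a guard could work, not a description of the project: example.com, example.net, and example.org are reserved by RFC 2606, and 555-0100 through 555-0199 are the North American numbers set aside for fictional use. The synthetic_contact and is_reserved_contact helpers and the area-code list are hypothetical.

```python
import random
import re

RESERVED_DOMAINS = ("example.com", "example.net", "example.org")  # RFC 2606
AREA_CODES = ("202", "312", "415")  # hypothetical; the 555-01XX block is the reserved part


def synthetic_contact(first: str, last: str, rng: random.Random) -> dict:
    """Build an email and phone number inside reserved, fictional-use ranges."""
    # Lowercase and drop anything that is not a letter, digit, or dot.
    local_part = re.sub(r"[^a-z0-9.]", "", f"{first}.{last}".lower())
    email = f"{local_part}@{rng.choice(RESERVED_DOMAINS)}"
    phone = f"+1-{rng.choice(AREA_CODES)}-555-01{rng.randint(0, 99):02d}"
    return {"email": email, "phone": phone}


def is_reserved_contact(contact: dict) -> bool:
    """Reject any generated contact that falls outside the reserved ranges."""
    domain_ok = contact["email"].rsplit("@", 1)[-1] in RESERVED_DOMAINS
    phone_ok = re.fullmatch(r"\+1-\d{3}-555-01\d{2}", contact["phone"]) is not None
    return domain_ok and phone_ok
```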

Lessons Learned & Future Roadmap

Developing this project highlighted the power of API-first design. By connecting Reactive Resume’s templating engine with n8n’s logic, I realized that any structured data can be turned into a professional-grade document with zero manual intervention.
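The API-first idea is easiest to see at the hand-off between the generated persona and the templating engine. Reactive Resume can import resumes written in the open JSON Resume format; whether this project uses that route or Reactive Resume's own schema is not stated, so treat the sketch below as an assumption. It maps a hypothetical persona dict onto a subset of JSON Resume's basics, work, and skills fields.

```python
import json


def to_json_resume(persona: dict) -> str:
    """Map a (hypothetical) persona dict onto a subset of the JSON Resume schema."""
    doc = {
        "basics": {
            "name": persona["name"],
            "label": persona["title"],  # headline job title
            "email": persona["email"],
            "summary": persona["summary"],
        },
        "work": [
            {
                "name": job["company"],     # JSON Resume uses "name" for the employer
                "position": job["title"],
                "startDate": job["start"],  # ISO 8601 date string, e.g. "2019-03"
                "endDate": job["end"],
            }
            for job in persona["history"]
        ],
        "skills": [{"name": skill} for skill in persona["skills"]],
    }
    return json.dumps(doc, indent=2)
```

Writing the same string to the JSON export would keep the PDF and the machine-readable file derived from a single source of truth.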

What’s next?

  • Short-Term: Adding support for multi-language resume generation (Tagalog/English/Spanish).
  • Long-Term: Integrating an automated “LinkedIn Profile” generator to create holistic online footprints for synthetic personas.

How to Run This Project

Ready to generate your own dataset?

  1. Prerequisites:
    • An active n8n instance (Docker recommended).
    • API access to Llama 3.1 and Z Image Turbo (served through AnythingLLM and ComfyUI, respectively, in the steps below).
    • A Reactive Resume server instance.
  2. Clone the Repo:
    git clone https://github.com/jerohalili/resume-ai-generator-automation.git
  3. Run ComfyUI: Link the custom model workflow's API to n8n.
  4. Run AnythingLLM: Link the custom model's LLM API to n8n.
  5. Prepare n8n: Import the n8n workflow, then connect it to Telegram.
  6. Configure n8n: Customize the workflow's prompts and parameters.
  7. Use Telegram: Publish the n8n workflow, then interact with it through Telegram.
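Step 3 hinges on n8n calling the image server over HTTP. As a rough illustration, not the project's actual node configuration, the sketch below builds the request that queues a workflow through ComfyUI's POST /prompt endpoint, which is the kind of call an n8n HTTP Request node could make. It assumes the workflow was exported in ComfyUI's API format, that ComfyUI listens on its default port 8188, and that node "6" is the positive-prompt CLIPTextEncode node; that node id is hypothetical and differs per workflow.

```python
import copy
import json
import urllib.request


def build_comfyui_request(workflow: dict, prompt_node_id: str, prompt_text: str,
                          host: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Return a POST request that queues `workflow` with a new positive prompt."""
    wf = copy.deepcopy(workflow)  # keep the exported template reusable
    wf[prompt_node_id]["inputs"]["text"] = prompt_text
    body = json.dumps({"prompt": wf}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Usage (requires a running ComfyUI server):
# req = build_comfyui_request(template, "6", "Professional corporate headshot, ...")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # the reply includes a prompt_id for the queued job
```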


© 2026 Jero Halili · Built with Astro & Tailwind