We’re hiring an experienced Computer Vision + NLP Engineer to rebuild our entire Korean → English comic translation tool from scratch.
The current system works, but we need a clean, modular, much faster, and more accurate version built on top of Convex instead of Supabase for real-time updates.
You will reproduce every step of the existing workflow while improving its accuracy, performance, and architecture.
This is a rebuild from scratch.
---
# Current Validated Workflow (What You Will Rebuild and Improve)
Our existing tool processes full chapters with this pipeline:
1. Upload – Chapter images are uploaded.
2. Text Detection – CRAFT generates bounding boxes around text.
3. Text Extraction (OCR) – Gemini 2.5 Pro extracts Korean text inside each bounding box.
4. Panel Detection – OpenCV identifies comic panels in each image.
5. Panel Filtering – Gemini 2.5 Pro removes inaccurate/outlier panels.
6. Alignment – Remaining text boxes are matched to their correct panels.
7. Translation – Gemini 2.5 Pro produces English translations using panel and chapter context.
This workflow is already validated. The rebuild must keep the same steps and produce the same kind of output, but be faster, cleaner, more accurate, and modular.
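To make step 6 concrete, here is a minimal sketch of one way alignment could work: assign each text box to the panel that covers most of its area. This is an illustration only, not our current implementation. The `Box` type, the `assign_to_panel` function, and the `min_coverage` threshold of 0.5 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, 0.0 if they do not overlap."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, width) * max(0.0, height)


def assign_to_panel(
    text_box: Box, panels: list[Box], min_coverage: float = 0.5
) -> Optional[int]:
    """Return the index of the panel that covers most of the text box.

    Returns None when no panel covers at least `min_coverage` of the text
    box, for example text placed in the gutter between panels.
    """
    if text_box.area == 0 or not panels:
        return None
    coverages = [overlap_area(text_box, p) / text_box.area for p in panels]
    best = max(range(len(panels)), key=lambda i: coverages[i])
    return best if coverages[best] >= min_coverage else None


# Two stacked panels with a 10 px gutter between them.
example_panels = [Box(0, 0, 100, 100), Box(0, 110, 100, 200)]
```

With `example_panels`, a text box fully inside the first panel maps to index 0, and a box outside both panels maps to `None`, so unmatched text can be handled separately rather than silently attached to the wrong panel.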
---
# Your Job in This Project
Rebuild this entire system from zero with a modern, maintainable architecture that gives us:
Better accuracy
• More precise bounding boxes
• Higher OCR accuracy (including stylized Korean fonts)
• Better panel detection and filtering
• More consistent, human-like translations
Much faster overall performance
• Dramatically reduced processing time per chapter
• Efficient batching and async operations
• Minimal latency from upload to final results
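As a rough illustration of "efficient batching and async operations", the sketch below runs a per-item worker (for example, one OCR call per text box) concurrently while capping the number of calls in flight. The names `map_concurrently`, `fake_ocr`, and the default limit of 8 are assumptions for this example. A real worker would call the OCR or LLM service instead of sleeping.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_concurrently(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    max_in_flight: int = 8,
) -> list[R]:
    """Apply `worker` to every item concurrently, preserving input order.

    At most `max_in_flight` workers run at the same time, which keeps
    request bursts within whatever rate limit the upstream API imposes.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(run_one(item) for item in items)))


async def fake_ocr(box_id: int) -> str:
    """Stand-in for a network-bound OCR call (hypothetical)."""
    await asyncio.sleep(0)
    return f"text-{box_id}"
```

Calling `asyncio.run(map_concurrently([1, 2, 3], fake_ocr, max_in_flight=2))` returns the results in input order, so downstream alignment can rely on positions matching the original boxes.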
A modular, replaceable architecture
Every step must be isolated behind a clear interface so we can easily swap components:
• Replace CRAFT with PaddleOCR, Donut, or a YOLOv8 detector
• Replace Gemini with GPT or another LLM
• Replace panel detector without touching text logic
• Swap OCR engines freely (Paddle, Donut, TrOCR, GPT fallback)
Modular means no rewrites when upgrading models.
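A minimal sketch of what "isolated behind a clear interface" could look like in Python, using `typing.Protocol` so any class with the right method can be plugged in. All names here (`TextRegion`, `TextDetector`, `OcrEngine`, `Translator`, `Pipeline`, and the stub classes) are hypothetical and only illustrate the shape we have in mind. Panel detection is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TextRegion:
    """Bounding box of one detected text region, in pixels (hypothetical)."""
    x1: int
    y1: int
    x2: int
    y2: int


class TextDetector(Protocol):
    def detect(self, image: bytes) -> list[TextRegion]: ...


class OcrEngine(Protocol):
    def read(self, image: bytes, region: TextRegion) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, context: str) -> str: ...


@dataclass
class Pipeline:
    """Composes the stages. Swapping a model means passing a different object."""
    detector: TextDetector
    ocr: OcrEngine
    translator: Translator

    def run(self, image: bytes, context: str = "") -> list[str]:
        regions = self.detector.detect(image)
        korean = [self.ocr.read(image, region) for region in regions]
        return [self.translator.translate(text, context) for text in korean]


# Stub implementations standing in for CRAFT, Gemini OCR, and Gemini translation.
class StubDetector:
    def detect(self, image: bytes) -> list[TextRegion]:
        return [TextRegion(0, 0, 10, 10)]


class StubOcr:
    def read(self, image: bytes, region: TextRegion) -> str:
        return "안녕"


class StubTranslator:
    def translate(self, text: str, context: str) -> str:
        return f"[en] {text}"
```

Under this design, replacing the detector or the translation model would only require a new class that satisfies the same protocol. `Pipeline` and the other stages stay untouched.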
Convex-based backend
• Real-time updates streamed to the frontend
• Job orchestration in Convex
• Stable state management
• Partial outputs instead of waiting for entire chapter completion
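To illustrate "partial outputs", here is a small sketch of a Python worker that publishes each page's result as soon as it is ready instead of waiting for the whole chapter. The `publish` callback is a placeholder: in the real system it would write to Convex so the frontend updates in real time, but the exact Convex functions are for you to design. `PageResult`, `process_chapter`, and `translate_page` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PageResult:
    """Result for one page, small enough to store as a single record."""
    job_id: str
    page_index: int
    total_pages: int
    translations: list[str]


def process_chapter(
    job_id: str,
    pages: Sequence[bytes],
    translate_page: Callable[[bytes], list[str]],
    publish: Callable[[PageResult], None],
) -> None:
    """Process pages one at a time and publish each result immediately.

    `publish` is a placeholder for whatever pushes the record to the
    backend. Here it is just a callable so the sketch stays runnable.
    """
    total = len(pages)
    for index, page in enumerate(pages):
        translations = translate_page(page)
        publish(PageResult(job_id, index, total, translations))
```

Because every record carries `page_index` and `total_pages`, the frontend can show a progress indicator and render finished pages while later pages are still processing.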
---
# What You Must Deliver For The $2,000 Milestone
1. Fully rebuilt pipeline implementing all steps (upload → detection → OCR → panels → alignment → translation).
2. Modular architecture where detection, OCR, panel logic, and translation can be swapped independently.
3. Convex integration for real-time syncing, job progress, and results.
4. Significant accuracy improvements over the current system.
5. Significant performance improvements (faster processing end-to-end).
6. Clean project structure with documentation for all modules and interfaces.
---
# Tech Stack You Will Use
• Python – OCR, detection, panel processing, AI orchestration
• TypeScript – Convex + frontend integration
• Convex – backend database, jobs, and real-time sync
• OCR Tools – CRAFT for text detection
• LLMs – Gemini, GPT
---
# Required Skills
Must have
• Strong OCR experience
• Experience with LLM-based translation/localization
• Python + TypeScript proficiency
• Ability to design clean, modular system architectures
• Experience rebuilding/refactoring complex pipelines
---
# To Apply
Please include:
• Relevant projects (OCR, CV, LLM translation, or modular system rebuilds)
• Examples where you improved accuracy, performance, or architecture
• A short explanation of how you would:
1. Design a modular detection → OCR → panel → translation pipeline
2. Improve bounding boxes and OCR for stylized Korean fonts
3. Integrate Convex for real-time progress streaming to the frontend