Prompt Engineer
Syntas is looking for a Prompt Engineer to specialize in the art and science of getting the best possible outputs from large language models. You will design prompting strategies, build evaluation frameworks, implement systematic prompt optimization processes, and help our clients extract maximum value from their AI investments through better instructions, examples, and system architectures. This role sits at the intersection of linguistics, psychology, and software engineering. You will develop deep expertise in how different models respond to various prompting techniques, create reusable patterns that can be adapted across use cases, and build the tooling and processes that turn prompt development from guesswork into a repeatable, scientific discipline.
About the Role
As a Prompt Engineer at Syntas, you will be the specialist who makes AI systems actually work well in production. While many engineers can get a demo working with a simple prompt, you understand the gap between demo and production—the edge cases, the failure modes, the consistency requirements—and you know how to close that gap through systematic prompt engineering.
Your work begins with understanding what the AI system needs to accomplish and how success will be measured. You will analyze the types of inputs the system will receive, the variations and edge cases it must handle, and the quality standards outputs must meet. From this understanding, you will design prompting strategies that might include few-shot examples, chain-of-thought reasoning, structured output formats, self-consistency techniques, or multi-step workflows with verification.
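To make that concrete, here is a minimal sketch of one such strategy: few-shot examples plus a chain-of-thought field inside a JSON output contract. The OpenAI Python SDK, model name, and ticket-classification task are illustrative assumptions, not a prescribed stack.

```python
# Minimal sketch: few-shot prompting with chain-of-thought and a
# structured (JSON) output contract. Model and examples are illustrative.
import json
from openai import OpenAI

client = OpenAI()

FEW_SHOT = [
    {"role": "user", "content": "Ticket: 'I was charged twice this month.'"},
    {"role": "assistant", "content": json.dumps({
        "reasoning": "The customer reports a duplicate charge, a billing issue.",
        "category": "billing",
        "urgency": "high",
    })},
]

def classify_ticket(ticket: str) -> dict:
    messages = [
        {"role": "system", "content": (
            "Classify the support ticket. Think step by step in the "
            "'reasoning' field, then output JSON with keys: "
            "reasoning, category, urgency."
        )},
        *FEW_SHOT,  # worked examples anchor the output format
        {"role": "user", "content": f"Ticket: {ticket!r}"},
    ]
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any JSON-mode-capable model works
        messages=messages,
        response_format={"type": "json_object"},  # constrain to valid JSON
    )
    return json.loads(resp.choices[0].message.content)
```

In practice, a worked example tends to anchor the output schema more reliably than instructions alone, which is exactly the kind of trade-off you will evaluate in this role.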
Evaluation is central to your role. You will build frameworks that measure prompt performance across relevant dimensions—accuracy, consistency, latency, cost, safety—and use these measurements to systematically improve. You will implement A/B testing infrastructure, create golden datasets for regression testing, and design LLM-as-judge evaluation chains that can assess quality at scale. Your approach to prompt engineering is empirical: hypotheses are tested, results are measured, and iterations are driven by data.
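As an illustration of that empirical loop, here is a minimal LLM-as-judge sketch that grades candidate answers against a golden reference and gates a release on the mean score. The judge model, rubric, and threshold are illustrative assumptions, not a prescribed setup.

```python
# Minimal LLM-as-judge sketch: grade a model answer against a golden
# reference on a 1-5 scale, then gate a release on the mean score.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI answer against a reference.
Reference answer: {reference}
Candidate answer: {candidate}
Score the candidate from 1 (wrong) to 5 (fully correct and complete).
Reply with the integer score only."""

def judge(reference: str, candidate: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative judge model
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            reference=reference, candidate=candidate)}],
        temperature=0,  # keep grading as deterministic as possible
    )
    return int(resp.choices[0].message.content.strip())

def passes_regression(golden: list[dict], threshold: float = 4.0) -> bool:
    """Regression gate over a golden dataset: fail if mean score drops."""
    scores = [judge(g["reference"], g["candidate"]) for g in golden]
    return sum(scores) / len(scores) >= threshold
```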
You will also work on the tooling and infrastructure that supports prompt engineering at scale. This includes prompt versioning systems, experiment tracking, production monitoring for prompt performance, and documentation frameworks that capture the reasoning behind prompt design decisions. You will create processes that let teams iterate on prompts safely, with appropriate testing and rollback capabilities.
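The shape of that workflow can be as simple as a versioned registry with promote and rollback operations. The sketch below is illustrative only; real deployments would typically lean on tools like Langfuse rather than hand-rolled classes.

```python
# Minimal sketch of versioned prompts with promote/rollback semantics.
# All names are illustrative, not a production design.
from dataclasses import dataclass, field

@dataclass
class PromptRegistry:
    versions: dict[str, list[str]] = field(default_factory=dict)
    live: dict[str, int] = field(default_factory=dict)

    def publish(self, name: str, template: str) -> int:
        """Store a new version; it is not live until promoted."""
        self.versions.setdefault(name, []).append(template)
        return len(self.versions[name]) - 1

    def promote(self, name: str, version: int) -> None:
        """Point production traffic at a tested version."""
        self.live[name] = version

    def rollback(self, name: str) -> None:
        """Revert to the previous version after a regression."""
        self.live[name] = max(self.live[name] - 1, 0)

    def get(self, name: str) -> str:
        """Fetch the template currently serving production."""
        return self.versions[name][self.live[name]]
```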
What You Will Build
- Prompting strategies for complex use cases including multi-step reasoning, tool use, and agentic workflows
- Evaluation frameworks that measure prompt quality across accuracy, consistency, latency, and cost dimensions
- Few-shot example libraries and retrieval systems that dynamically select relevant examples for each query (see the sketch after this list)
- Prompt versioning and experimentation infrastructure using tools like Langfuse for systematic optimization
- LLM-as-judge evaluation chains for automated quality assessment at scale
- Production monitoring systems that track prompt performance and surface degradation or drift
- Documentation and best practices guides that codify effective prompting patterns for reuse
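For the dynamic example-selection item above, a minimal sketch looks like this: embed a curated example library once, then retrieve the nearest neighbors for each incoming query. The embedding model and example pairs are illustrative assumptions.

```python
# Minimal sketch of dynamic few-shot selection: embed an example
# library once, then pick the nearest neighbors for each query.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative embedding model
        input=texts,
    )
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Illustrative (input, ideal_output) pairs curated per use case.
EXAMPLES = [
    ("Refund for duplicate charge", "category: billing"),
    ("App crashes on login", "category: bug"),
]
EXAMPLE_VECS = embed([inp for inp, _ in EXAMPLES])

def select_examples(query: str, k: int = 3) -> list[tuple[str, str]]:
    qvec = embed([query])[0]
    scores = EXAMPLE_VECS @ qvec          # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:k]   # top-k most similar examples
    return [EXAMPLES[i] for i in best]
```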
Key Responsibilities
- Design prompting strategies for diverse client use cases across industries and applications
- Build evaluation frameworks that measure prompt performance against defined quality criteria
- Implement systematic prompt optimization using A/B testing, ablation studies, and iterative refinement
- Create few-shot example libraries and develop strategies for example selection and retrieval
- Develop structured output prompts using JSON mode, function calling, and constrained generation
- Design chain-of-thought and multi-step prompting strategies for complex reasoning tasks
- Implement prompt versioning, experiment tracking, and production deployment workflows
- Build LLM-as-judge evaluation systems for scalable quality assessment
- Analyze prompt failures and edge cases to identify improvement opportunities
- Collaborate with engineering teams to integrate prompting best practices into application development
- Document prompting patterns, evaluation methodologies, and lessons learned for knowledge sharing
- Stay current with prompting research, new model capabilities, and emerging techniques
- Train client teams on prompt engineering best practices and evaluation methods
What We Are Looking For
- 3+ years of experience working with LLMs, with at least 1 year focused specifically on prompt engineering
- Deep understanding of prompting techniques: few-shot learning, chain-of-thought, self-consistency, and structured outputs
- Experience with prompt evaluation methods including human evaluation, automated metrics, and LLM-as-judge
- Strong analytical skills with ability to design experiments and interpret results systematically
- Proficiency in Python for building evaluation frameworks, data processing, and automation
- Experience with LLM observability tools (Langfuse, LangSmith, Weights & Biases)
- Understanding of different models (GPT-4, Claude, Llama, Mistral) and their respective strengths and weaknesses
- Familiarity with advanced techniques: RAG integration, tool use, function calling, and agentic patterns
- Excellent written communication for crafting prompts and documenting methodologies
- Strong attention to detail—prompt engineering often comes down to precise wording choices
- Consultative mindset with ability to understand client requirements and translate to prompting strategies
- Self-directed work style with ability to drive projects independently in a remote environment
Nice to Have
- Background in linguistics, cognitive science, or technical writing
- Experience with model fine-tuning and understanding when to fine-tune vs. prompt
- Knowledge of prompt injection vulnerabilities and defensive prompting techniques
- Experience with constitutional AI and RLHF concepts
- Background in verticals where precision is critical: legal, medical, or financial
- Experience with multimodal prompting (vision-language models)
- Familiarity with prompt optimization tools and automatic prompt generation
- Knowledge of token economics and cost optimization strategies
- Experience building prompt management systems or platforms
- Prior experience teaching or training others on prompt engineering
- Public writing or speaking on prompt engineering topics
- Contributions to open source prompt libraries or frameworks
Benefits & Perks
- Competitive salary: $130,000 - $190,000 depending on experience
- Equity participation with meaningful upside as we grow
- Fully remote work with flexible hours—work from anywhere in the US
- Comprehensive health, dental, and vision insurance (100% premium covered for employee)
- Unlimited PTO with an encouraged minimum of 4 weeks (we mean it)
- $3,000 annual learning and development budget for courses, books, and certifications
- Conference attendance budget including travel—attend or speak at AI conferences
- Top-tier hardware: MacBook Pro, external display, and peripherals of your choice
- All AI tools and subscriptions you need: GPT-4, Claude, GitHub Copilot, and more
- Quarterly team offsites in interesting locations
- 401(k) with company match
- Paid parental leave (12 weeks)
- Home office setup stipend ($1,000)
- Work on genuinely interesting problems across diverse industries
Ready to Apply?
Send us your resume and a brief introduction. Tell us about your experience with AI/ML systems and what excites you about this opportunity.
