# JobForge - Architecture Guide

**Version:** 1.0.0
**Status:** Production-Ready Implementation
**Date:** July 2025
**Target Market:** Canadian Job Market Applications
**Tagline:** "Forge Your Path to Success"

---

## 📋 Executive Summary

### Project Vision

JobForge is a comprehensive, AI-powered job application management system that streamlines the entire application process through intelligent automation, multi-resume optimization, and authentic voice preservation for professional job seekers in the Canadian market.

### Core Objectives

- **Workflow Automation**: 3-phase intelligent application pipeline (Research → Resume → Cover Letter)
- **Multi-Resume Intelligence**: Leverage multiple resume versions as focused expertise lenses
- **Authentic Voice Preservation**: Maintain the candidate's proven, successful writing patterns
- **Canadian Market Focus**: Optimize for Canadian business culture and application standards
- **Local File Management**: Complete control over sensitive career documents
- **Scalable Architecture**: Support for high-volume job application campaigns

### Business Value Proposition

- **40% Time Reduction**: Automated research and document generation
- **Higher Success Rates**: Strategic positioning based on comprehensive analysis
- **Consistent Quality**: Standardized excellence across all applications
- **Document Security**: Local storage with full user control
- **Career Intelligence**: Build a knowledge base from successful applications

---

## 🏗️ High-Level Architecture

### System Overview

```mermaid
graph TB
    subgraph "User Interface Layer"
        A[Streamlit Web UI]
        B[Configuration Panel]
        C[File Management UI]
        D[Workflow Interface]
    end

    subgraph "Application Core"
        E[Application Engine]
        F[Phase Orchestrator]
        G[State Manager]
        H[File Controller]
    end

    subgraph "AI Processing Layer"
        I[Research Agent]
        J[Resume Optimizer]
        K[Cover Letter Generator]
        L[Claude API Client]
    end

    subgraph "Data Management"
        M[Resume Repository]
        N[Reference Database]
        O[Application Store]
        P[Status Tracker]
    end

    subgraph "Storage Layer"
        Q[Local File System]
        R[Project Structure]
        S[Document Templates]
    end

    subgraph "External Services"
        T[Claude AI API]
        U[Web Search APIs]
        V[Company Intelligence]
    end

    A --> E
    B --> H
    C --> H
    D --> F
    E --> I
    E --> J
    E --> K
    F --> G
    I --> L
    J --> L
    K --> L
    L --> T
    M --> Q
    N --> Q
    O --> Q
    P --> Q
    H --> R
    I --> U
    I --> V
```

### Architecture Principles

#### **1. Domain-Driven Design**

- Clear separation between job application domain logic and technical infrastructure
- Rich domain models representing real-world career management concepts
- Business rules encapsulated within domain entities

#### **2. Event-Driven Workflow**

- Each phase triggers the next through well-defined events
- State transitions logged for auditability and recovery (see the sketch below)
- Asynchronous processing with real-time UI updates
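To make the event-driven principle concrete, the sketch below shows one way a phase transition could be captured as a structured, loggable event. The `PhaseTransitionEvent` type and `log_transition` helper are illustrative assumptions, not part of the implemented engine API.

```python
# Minimal sketch (hypothetical names) of an auditable phase-transition event.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json
import logging

logger = logging.getLogger("jobforge.workflow")


@dataclass
class PhaseTransitionEvent:
    """One auditable record of a workflow state change."""
    application_id: str
    from_phase: str
    to_phase: str
    occurred_at: str
    success: bool


def log_transition(application_id: str, from_phase: str, to_phase: str,
                   success: bool = True) -> PhaseTransitionEvent:
    """Record a phase transition so the workflow can be audited or replayed."""
    event = PhaseTransitionEvent(
        application_id=application_id,
        from_phase=from_phase,
        to_phase=to_phase,
        occurred_at=datetime.now(timezone.utc).isoformat(),
        success=success,
    )
    logger.info("phase transition: %s", json.dumps(asdict(event)))
    return event
```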
#### **3. Multi-Source Intelligence**

- Resume portfolio treated as complementary expertise views
- Reference database provides voice pattern templates
- Company research aggregated from multiple sources

#### **4. Security-First Design**

- All sensitive career data stored locally
- No cloud storage of personal information
- API keys managed through secure environment variables

---

## 🔧 Core Components

### **Application Engine**

```python
class JobApplicationEngine:
    """Central orchestrator for the entire application workflow"""

    def __init__(self, config: EngineConfig, file_manager: FileManager):
        self.config = config
        self.file_manager = file_manager
        self.phase_orchestrator = PhaseOrchestrator()
        self.state_manager = StateManager()

    # Core workflow methods
    def create_application(self, job_data: JobData) -> Application: ...
    def execute_research_phase(self, app_id: str) -> ResearchReport: ...
    def optimize_resume(self, app_id: str, research: ResearchReport) -> OptimizedResume: ...
    def generate_cover_letter(self, app_id: str, context: ApplicationContext) -> CoverLetter: ...

    # Management operations
    def list_applications(self, status_filter: Optional[str] = None) -> List[Application]: ...
    def update_application_status(self, app_id: str, status: ApplicationStatus) -> None: ...
    def export_application(self, app_id: str, format: ExportFormat) -> str: ...
```

**Responsibilities:**

- Coordinate all application lifecycle operations
- Manage state transitions between phases
- Integrate with AI processing agents
- Handle file system operations through delegates

### **Phase Orchestrator**

```python
class PhaseOrchestrator:
    """Manages the 3-phase workflow execution and state transitions"""

    class Phases(Enum):
        INPUT = "input"
        RESEARCH = "research"
        RESUME = "resume"
        COVER_LETTER = "cover_letter"
        COMPLETE = "complete"

    def execute_phase(self, phase: Phases, context: PhaseContext) -> PhaseResult: ...
    def can_advance_to(self, target_phase: Phases, current_state: ApplicationState) -> bool: ...
    def get_phase_requirements(self, phase: Phases) -> List[Requirement]: ...

    # Phase-specific execution
    async def execute_research(self, job_data: JobData, resume_portfolio: List[Resume]) -> ResearchReport: ...
    async def execute_resume_optimization(self, research: ResearchReport, portfolio: ResumePortfolio) -> OptimizedResume: ...
    async def execute_cover_letter_generation(self, context: ApplicationContext) -> CoverLetter: ...
```

**Design Features:**

- State machine implementation for workflow control
- Async execution with progress callbacks
- Dependency validation between phases
- Rollback capability for failed phases

### **AI Processing Agents**

#### **Research Agent**

```python
class ResearchAgent:
    """Phase 1: Comprehensive job description analysis and strategic positioning"""

    def __init__(self, claude_client: ClaudeAPIClient, web_search: WebSearchClient):
        self.claude = claude_client
        self.web_search = web_search

    async def analyze_job_description(self, job_desc: str) -> JobAnalysis:
        """Extract and categorize job requirements, company info, and keywords"""

    async def assess_candidate_fit(self, job_analysis: JobAnalysis, resume_portfolio: ResumePortfolio) -> FitAssessment:
        """Multi-resume skills assessment with transferability analysis"""

    async def research_company_intelligence(self, company_name: str) -> CompanyIntelligence:
        """Gather company culture, recent news, and strategic insights"""

    async def generate_strategic_positioning(self, context: ResearchContext) -> StrategicPositioning:
        """Determine optimal candidate positioning and competitive advantages"""
```
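The research-phase methods above are designed to be composed by the orchestrator. Below is a minimal sketch of one way that composition could look, assuming the Phase 1 calls run sequentially; the `run_research_phase` helper and the `ResearchContext` / `ResearchReport` field names are illustrative assumptions rather than the implemented pipeline.

```python
# Sketch only: one plausible way to chain the ResearchAgent calls for Phase 1.
# The ResearchContext / ResearchReport constructor fields are assumed for illustration.
async def run_research_phase(agent: ResearchAgent, job_data: JobData,
                             portfolio: ResumePortfolio) -> ResearchReport:
    job_analysis = await agent.analyze_job_description(job_data.job_description)
    fit = await agent.assess_candidate_fit(job_analysis, portfolio)
    company = await agent.research_company_intelligence(job_data.company_name)
    positioning = await agent.generate_strategic_positioning(
        ResearchContext(job_analysis=job_analysis, fit=fit, company=company)
    )
    return ResearchReport(
        job_analysis=job_analysis,
        fit_assessment=fit,
        company_intelligence=company,
        strategic_positioning=positioning,
    )
```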
#### **Resume Optimizer**

```python
class ResumeOptimizer:
    """Phase 2: Multi-resume synthesis and strategic optimization"""

    def __init__(self, claude_client: ClaudeAPIClient, config: OptimizationConfig):
        self.claude = claude_client
        self.config = config  # 600-word limit, formatting rules, etc.

    async def synthesize_resume_portfolio(self, portfolio: ResumePortfolio, research: ResearchReport) -> SynthesizedContent:
        """Merge insights from multiple resume versions"""

    async def optimize_for_job(self, content: SynthesizedContent, positioning: StrategicPositioning) -> OptimizedResume:
        """Create targeted resume within word limits"""

    def validate_optimization(self, resume: OptimizedResume) -> OptimizationReport:
        """Ensure word count, keyword density, and strategic alignment"""
```

#### **Cover Letter Generator**

```python
class CoverLetterGenerator:
    """Phase 3: Authentic voice preservation and company-specific customization"""

    def __init__(self, claude_client: ClaudeAPIClient, reference_db: ReferenceDatabase):
        self.claude = claude_client
        self.reference_db = reference_db

    async def analyze_voice_patterns(self, selected_references: List[CoverLetterReference]) -> VoiceProfile:
        """Extract authentic writing style, tone, and structural patterns"""

    async def generate_cover_letter(self, context: CoverLetterContext, voice_profile: VoiceProfile) -> CoverLetter:
        """Create authentic cover letter using proven voice patterns"""

    def validate_authenticity(self, cover_letter: CoverLetter, voice_profile: VoiceProfile) -> AuthenticityScore:
        """Ensure generated content matches authentic voice patterns"""
```

### **Data Models**

```python
class Application(BaseModel):
    """Core application entity with full lifecycle management"""

    id: str
    name: str  # company_role_YYYY_MM_DD format
    status: ApplicationStatus
    created_at: datetime
    updated_at: datetime

    # Job information
    job_data: JobData
    company_info: CompanyInfo

    # Phase results
    research_report: Optional[ResearchReport] = None
    optimized_resume: Optional[OptimizedResume] = None
    cover_letter: Optional[CoverLetter] = None

    # Metadata
    priority_level: PriorityLevel
    application_deadline: Optional[date] = None

    # Business logic
    @property
    def completion_percentage(self) -> float: ...

    def can_advance_to_phase(self, phase: PhaseOrchestrator.Phases) -> bool: ...
    def export_to_format(self, format: ExportFormat) -> str: ...


class ResumePortfolio(BaseModel):
    """Collection of focused resume versions representing different expertise areas"""

    resumes: List[Resume]

    def get_technical_focused(self) -> List[Resume]: ...
    def get_management_focused(self) -> List[Resume]: ...
    def get_industry_specific(self, industry: str) -> List[Resume]: ...
    def synthesize_skills(self) -> SkillMatrix: ...


class JobData(BaseModel):
    """Comprehensive job posting information"""

    job_url: Optional[str] = None
    job_description: str
    company_name: str
    role_title: str
    location: str
    priority_level: PriorityLevel
    how_found: str
    application_deadline: Optional[date] = None

    # Additional context
    specific_aspects: Optional[str] = None
    company_insights: Optional[str] = None
    special_considerations: Optional[str] = None
```

---

## 📊 Data Flow Architecture

### Application Creation Flow

```mermaid
sequenceDiagram
    participant UI as Streamlit UI
    participant Engine as Application Engine
    participant FileManager as File Manager
    participant Storage as Local Storage

    UI->>Engine: create_application(job_data)
    Engine->>Engine: validate_job_data()
    Engine->>Engine: generate_application_name()
    Engine->>FileManager: create_application_folder()
    FileManager->>Storage: mkdir(company_role_date)
    FileManager->>Storage: save(user_inputs.json)
    FileManager->>Storage: save(original_job_description.md)
    FileManager->>Storage: save(application_status.json)
    Engine-->>UI: Application(id, status=created)
```
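To make the folder-creation step concrete, here is a minimal sketch of how the application name and initial files could be generated from the job data. Only the `company_role_YYYY_MM_DD` naming convention and the three file names come from the design above; the `create_application_folder` helper, slugification rule, and JSON layout are illustrative assumptions.

```python
# Sketch (assumed helper, not the implemented FileManager API) of creating the
# application folder described in the sequence diagram above.
import json
import re
from datetime import date
from pathlib import Path


def _slug(text: str) -> str:
    """Lowercase and underscore-join a name fragment (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def create_application_folder(base_dir: Path, company: str, role: str,
                              job_description: str, user_inputs: dict) -> Path:
    # company_role_YYYY_MM_DD, e.g. dillon_consulting_data_analyst_2025_07_22
    name = f"{_slug(company)}_{_slug(role)}_{date.today().strftime('%Y_%m_%d')}"
    folder = base_dir / name
    folder.mkdir(parents=True, exist_ok=False)

    (folder / "original_job_description.md").write_text(job_description, encoding="utf-8")
    (folder / "user_inputs.json").write_text(json.dumps(user_inputs, indent=2), encoding="utf-8")
    (folder / "application_status.json").write_text(
        json.dumps({"status": "created", "phases": {}}, indent=2), encoding="utf-8"
    )
    return folder
```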
### 3-Phase Workflow Execution

```mermaid
flowchart TD
    A[Application Created] --> B[Phase 1: Research]
    B --> C{Research Complete?}
    C -->|Yes| D[Phase 2: Resume]
    C -->|No| E[Research Error]
    D --> F{Resume Complete?}
    F -->|Yes| G[Phase 3: Cover Letter]
    F -->|No| H[Resume Error]
    G --> I{Cover Letter Complete?}
    I -->|Yes| J[Application Complete]
    I -->|No| K[Cover Letter Error]
    E --> L[Log Error & Retry]
    H --> L
    K --> L
    L --> M[Manual Intervention]

    subgraph "Phase 1 Details"
        B1[Job Analysis]
        B2[Multi-Resume Assessment]
        B3[Company Research]
        B4[Strategic Positioning]
        B --> B1 --> B2 --> B3 --> B4 --> C
    end

    subgraph "Phase 2 Details"
        D1[Portfolio Synthesis]
        D2[Content Optimization]
        D3[Word Count Management]
        D4[Strategic Alignment]
        D --> D1 --> D2 --> D3 --> D4 --> F
    end

    subgraph "Phase 3 Details"
        G1[Voice Analysis]
        G2[Content Generation]
        G3[Authenticity Validation]
        G4[Company Customization]
        G --> G1 --> G2 --> G3 --> G4 --> I
    end
```

### File Management Architecture

```mermaid
graph TB
    subgraph "Project Root"
        A[job-application-engine/]
    end

    subgraph "User Data"
        B[user_data/resumes/]
        C[user_data/cover_letter_references/selected/]
        D[user_data/cover_letter_references/other/]
    end

    subgraph "Applications"
        E[applications/company_role_date/]
        F[├── original_job_description.md]
        G[├── research_report.md]
        H[├── optimized_resume.md]
        I[├── cover_letter.md]
        J[├── user_inputs.json]
        K[└── application_status.json]
    end

    subgraph "Configuration"
        L[config/]
        M[├── engine_config.yaml]
        N[├── claude_api_config.json]
        O[└── templates/]
    end

    A --> B
    A --> C
    A --> D
    A --> E
    A --> L
    E --> F
    E --> G
    E --> H
    E --> I
    E --> J
    E --> K
```
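The File Controller keeps per-application progress in `application_status.json`. A small sketch of updating that file after a phase completes is shown below; only the file name comes from the layout above, while the `mark_phase_complete` helper and the JSON schema (the `phases` and `updated_at` fields) are assumptions for illustration.

```python
# Sketch: updating applications/<name>/application_status.json after a phase.
import json
from datetime import datetime, timezone
from pathlib import Path


def mark_phase_complete(application_dir: Path, phase: str) -> dict:
    """Record that a workflow phase finished for the given application folder."""
    status_path = application_dir / "application_status.json"
    status = json.loads(status_path.read_text(encoding="utf-8")) if status_path.exists() else {}

    phases = status.setdefault("phases", {})
    phases[phase] = "complete"
    status["updated_at"] = datetime.now(timezone.utc).isoformat()

    status_path.write_text(json.dumps(status, indent=2), encoding="utf-8")
    return status
```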
---

## 🗂️ Project Structure

### Directory Layout

```
job-application-engine/
├── app.py                            # Streamlit main application
├── requirements.txt                  # Python dependencies
├── config/
│   ├── engine_config.yaml            # Engine configuration
│   ├── claude_api_config.json        # API configuration
│   └── templates/                    # Document templates
│       ├── research_template.md
│       ├── resume_template.md
│       └── cover_letter_template.md
├── src/                              # Source code
│   ├── __init__.py
│   ├── engine/                       # Core engine
│   │   ├── __init__.py
│   │   ├── application_engine.py     # Main engine class
│   │   ├── phase_orchestrator.py     # Workflow management
│   │   └── state_manager.py          # State tracking
│   ├── agents/                       # AI processing agents
│   │   ├── __init__.py
│   │   ├── research_agent.py         # Phase 1: Research
│   │   ├── resume_optimizer.py       # Phase 2: Resume
│   │   ├── cover_letter_generator.py # Phase 3: Cover Letter
│   │   └── claude_client.py          # Claude API integration
│   ├── models/                       # Data models
│   │   ├── __init__.py
│   │   ├── application.py            # Application entity
│   │   ├── job_data.py               # Job information
│   │   ├── resume.py                 # Resume models
│   │   └── results.py                # Phase results
│   ├── storage/                      # Storage management
│   │   ├── __init__.py
│   │   ├── file_manager.py           # File operations
│   │   ├── application_store.py      # Application persistence
│   │   └── reference_database.py     # Cover letter references
│   ├── ui/                           # User interface
│   │   ├── __init__.py
│   │   ├── streamlit_app.py          # Streamlit components
│   │   ├── workflow_ui.py            # Workflow interface
│   │   └── file_management_ui.py     # File management
│   └── utils/                        # Utilities
│       ├── __init__.py
│       ├── validators.py             # Input validation
│       ├── formatters.py             # Output formatting
│       └── helpers.py                # Helper functions
├── user_data/                        # User's career documents
│   ├── resumes/
│   │   ├── resume_complete.md
│   │   ├── resume_technical.md
│   │   └── resume_management.md
│   └── cover_letter_references/
│       ├── selected/                 # Tagged as references
│       │   ├── cover_letter_tech.md
│       │   └── cover_letter_consulting.md
│       └── other/                    # Available references
│           └── cover_letter_finance.md
├── applications/                     # Generated applications
│   ├── dillon_consulting_data_analyst_2025_07_22/
│   └── shopify_senior_developer_2025_07_23/
├── tests/                            # Test suite
│   ├── unit/
│   ├── integration/
│   └── fixtures/
├── docs/                             # Documentation
│   ├── architecture.md
│   ├── user_guide.md
│   └── api_reference.md
└── scripts/                          # Utility scripts
    ├── setup_project.py
    └── backup_applications.py
```

### Module Responsibilities

| Module | Purpose | Key Classes | Dependencies |
|--------|---------|-------------|--------------|
| `engine/` | Core workflow orchestration | `ApplicationEngine`, `PhaseOrchestrator` | `agents/`, `models/` |
| `agents/` | AI processing logic | `ResearchAgent`, `ResumeOptimizer`, `CoverLetterGenerator` | `models/`, `utils/` |
| `models/` | Data structures and business logic | `Application`, `JobData`, `Resume`, `ResumePortfolio` | `pydantic` |
| `storage/` | File system operations | `FileManager`, `ApplicationStore`, `ReferenceDatabase` | `pathlib`, `json` |
| `ui/` | User interface components | `StreamlitApp`, `WorkflowUI`, `FileManagementUI` | `streamlit` |
| `utils/` | Cross-cutting concerns | `Validators`, `Formatters`, `Helpers` | Various |

---

## 🔌 Extensibility Architecture

### Plugin System Design

```python
class EnginePlugin(ABC):
    """Base plugin interface for extending engine functionality"""

    def before_phase_execution(self, phase: PhaseOrchestrator.Phases, context: PhaseContext) -> PhaseContext:
        """Modify context before phase execution"""
        return context

    def after_phase_completion(self, phase: PhaseOrchestrator.Phases, result: PhaseResult) -> PhaseResult:
        """Process result after phase completion"""
        return result

    def on_application_created(self, application: Application) -> None:
        """React to new application creation"""
        pass


class MetricsPlugin(EnginePlugin):
    """Collect application performance metrics"""

    def after_phase_completion(self, phase: PhaseOrchestrator.Phases, result: PhaseResult) -> PhaseResult:
        self.record_phase_metrics(phase, result.execution_time, result.success)
        return result


class BackupPlugin(EnginePlugin):
    """Automatic backup of application data"""

    def on_application_created(self, application: Application) -> None:
        self.backup_application(application)
```

### Configuration System

```python
@dataclass
class EngineConfig:
    # Core settings
    claude_api_key: str
    base_output_directory: str = "./applications"
    max_concurrent_phases: int = 1

    # AI processing
    research_model: str = "claude-sonnet-4-20250514"
    resume_word_limit: int = 600
    cover_letter_word_range: tuple = (350, 450)

    # File management
    auto_backup_enabled: bool = True
    backup_retention_days: int = 30

    # UI preferences
    streamlit_theme: str = "light"
    show_advanced_options: bool = False

    # Extensions
    enabled_plugins: List[str] = field(default_factory=list)

    @classmethod
    def from_file(cls, config_path: str) -> 'EngineConfig':
        """Load configuration from YAML file"""

    def validate(self) -> List[ValidationError]:
        """Validate configuration completeness and correctness"""
```
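For reference, here is a minimal sketch of how `from_file` could work with PyYAML. The `load_engine_config` function stands in for the classmethod, and its behaviour (environment-variable fallback for the API key, filtering of unknown keys) is an assumption rather than the implemented loader; only the field names and the YAML format come from the design above.

```python
# Sketch of loading config/engine_config.yaml into EngineConfig; behaviour is assumed.
import os
from dataclasses import fields

import yaml


def load_engine_config(config_path: str) -> "EngineConfig":
    with open(config_path, "r", encoding="utf-8") as fh:
        raw = yaml.safe_load(fh) or {}

    # Keep the API key out of the YAML file; prefer the environment variable.
    raw.setdefault("claude_api_key", os.environ.get("CLAUDE_API_KEY", ""))

    # Ignore keys the dataclass does not define, then build the config object.
    known = {f.name for f in fields(EngineConfig)}
    return EngineConfig(**{k: v for k, v in raw.items() if k in known})
```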
### Multi-Resume Strategy Pattern

```python
class ResumeSelectionStrategy(ABC):
    """Strategy for selecting optimal resume content for specific jobs"""

    def select_primary_resume(self, portfolio: ResumePortfolio, job_analysis: JobAnalysis) -> Resume:
        """Select the most relevant primary resume"""

    def get_supplementary_content(self, portfolio: ResumePortfolio, primary: Resume) -> List[ResumeSection]:
        """Extract additional content from other resume versions"""


class TechnicalRoleStrategy(ResumeSelectionStrategy):
    """Optimize resume selection for technical positions"""


class ManagementRoleStrategy(ResumeSelectionStrategy):
    """Optimize resume selection for management positions"""


class ConsultingRoleStrategy(ResumeSelectionStrategy):
    """Optimize resume selection for consulting positions"""
```

---

## 🚀 Development Phases

### **Phase 1: MVP Foundation (Completed)**

- ✅ Streamlit UI with file management
- ✅ 3-phase workflow execution
- ✅ Claude API integration
- ✅ Local file storage system
- ✅ Multi-resume processing
- ✅ Cover letter reference system
- ✅ Application status tracking

### **Phase 2: Enhanced Intelligence (Next)**

- 🔄 Advanced company research integration
- 🔄 Improved multi-resume synthesis algorithms
- 🔄 Voice pattern analysis enhancement
- 🔄 Strategic positioning optimization
- 🔄 Application performance analytics
- 🔄 Export functionality (PDF, Word, etc.)

### **Phase 3: Automation & Scale (Future)**

- 📋 Batch application processing
- 📋 Template management system
- 📋 Application campaign planning
- 📋 Success rate tracking and optimization
- 📋 Integration with job board APIs
- 📋 Automated application submission

### **Phase 4: Enterprise Features (Future)**

- 📋 Multi-user support with role-based access
- 📋 Team collaboration features
- 📋 Advanced analytics and reporting
- 📋 Custom workflow templates
- 📋 Integration with HR systems
- 📋 White-label deployment options

---

## 🎯 Technical Specifications

### **Technology Stack**

| Component | Technology | Version | Rationale |
|-----------|------------|---------|-----------|
| **UI Framework** | Streamlit | 1.28.1 | Rapid prototyping, built-in components, Python-native |
| **HTTP Client** | requests | 2.31.0 | Reliable, well-documented, synchronous operations |
| **Data Validation** | Pydantic | 2.0+ | Type safety, automatic validation, great developer experience |
| **File Operations** | pathlib | Built-in | Modern, object-oriented path handling |
| **Configuration** | PyYAML | 6.0+ | Human-readable configuration files |
| **CLI (Future)** | Click + Rich | Latest | User-friendly CLI with beautiful output |
| **Testing** | pytest | 7.0+ | Comprehensive testing framework |
| **Documentation** | MkDocs | 1.5+ | Beautiful, searchable documentation |

### **Performance Requirements**

| Metric | Target | Measurement Method |
|--------|--------|-------------------|
| **Application Creation** | <2 seconds | Time from form submission to folder creation |
| **Phase 1 Research** | <30 seconds | Claude API response + processing time |
| **Phase 2 Resume** | <20 seconds | Multi-resume synthesis + optimization |
| **Phase 3 Cover Letter** | <15 seconds | Voice analysis + content generation |
| **File Operations** | <1 second | Local file read/write operations |
| **UI Responsiveness** | <500ms | Streamlit component render time |

### **Quality Standards**

#### **Code Quality Metrics**

- **Type Coverage**: 90%+ type hints on all public APIs
- **Test Coverage**: 85%+ line coverage maintained
- **Documentation**: All public methods and classes documented
- **Code Style**: Black formatter + isort + flake8 compliance
- **Complexity**: Max cyclomatic complexity of 10 per function

#### **Security Requirements**

- No API keys hardcoded in source code
- Environment variable management for secrets
- Input sanitization for all user data
- Safe file path handling to prevent directory traversal (see the sketch below)
- Regular dependency vulnerability scanning
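As a concrete illustration of the path-safety requirement above, here is a minimal sketch. The `safe_application_path` helper and its policy (resolving paths against the applications root and rejecting anything that escapes it) are assumptions for illustration, not the implemented validator.

```python
# Sketch: reject file paths that would escape the applications directory.
from pathlib import Path


def safe_application_path(base_dir: Path, *parts: str) -> Path:
    """Join path parts under base_dir and refuse anything that resolves outside it."""
    base = base_dir.resolve()
    candidate = base.joinpath(*parts).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"Unsafe path outside the applications directory: {candidate}")
    return candidate


# Example: this raises ValueError instead of touching a file outside ./applications
# safe_application_path(Path("./applications"), "..", "secrets.txt")
```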
#### **Reliability Standards**

- Graceful handling of API failures with user-friendly messages
- Automatic retry logic for transient failures (see the sketch below)
- Data integrity validation after file operations
- Rollback capability for failed workflow phases
- Comprehensive error logging with context
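The retry behaviour referenced above could be as simple as the sketch below; the `with_retries` helper, the `TransientAPIError` exception type, and the backoff schedule are illustrative assumptions rather than the implemented logic.

```python
# Sketch of retry-with-backoff for transient Claude API failures.
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Raised for failures worth retrying (timeouts, rate limits, 5xx responses)."""


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 2.0) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientAPIError:
            if attempt == attempts:
                raise  # surface the error for graceful handling or manual intervention
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```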
---

## 📈 Monitoring & Analytics

### **Application Metrics**

```python
class ApplicationMetrics:
    """Track application performance and success rates"""

    def record_application_created(self, app: Application) -> None: ...
    def record_phase_completion(self, app_id: str, phase: PhaseOrchestrator.Phases, duration: float) -> None: ...
    def record_application_submitted(self, app_id: str) -> None: ...
    def record_application_response(self, app_id: str, response_type: ResponseType) -> None: ...

    # Analytics queries
    def get_success_rate(self, date_range: DateRange) -> float: ...
    def get_average_completion_time(self, phase: PhaseOrchestrator.Phases) -> float: ...
    def get_most_effective_strategies(self) -> List[StrategyMetric]: ...
```

### **Performance Monitoring**

```python
class PerformanceMonitor:
    """Monitor system performance and resource usage"""

    def track_api_response_times(self) -> Dict[str, float]: ...
    def monitor_file_system_usage(self) -> StorageMetrics: ...
    def track_memory_usage(self) -> MemoryMetrics: ...
    def generate_performance_report(self) -> PerformanceReport: ...
```

### **User Experience Analytics**

- Workflow completion rates by phase
- Most common user pain points
- Feature usage statistics
- Error frequency and resolution rates
- Time-to-value metrics

---

## 🔒 Security Architecture

### **Data Protection Strategy**

- **Local-First**: All sensitive career data stored locally
- **API Key Management**: Secure environment variable handling
- **Input Validation**: Comprehensive sanitization of all user inputs
- **File System Security**: Restricted file access patterns
- **Audit Trail**: Complete logging of all file operations

### **Privacy Considerations**

- No personal data transmitted to third parties (except the Claude API for processing)
- User control over all data retention and deletion
- Transparent data usage policies
- Optional anonymization for analytics

---

## 🎨 User Experience Design

### **Design Principles**

1. **Simplicity First**: Complex AI power hidden behind simple interfaces
2. **Progress Transparency**: Clear feedback on all processing steps
3. **Error Recovery**: Graceful handling with actionable next steps
4. **Customization**: Flexible configuration without overwhelming options
5. **Mobile Friendly**: Responsive design for various screen sizes

### **User Journey Optimization**

```mermaid
journey
    title Job Application Creation Journey
    section Setup
      Configure folders: 5: User
      Upload resumes: 4: User
      Tag references: 3: User
    section Application
      Paste job description: 5: User
      Review auto-generated name: 4: User
      Start research phase: 5: User
    section AI Processing
      Wait for research: 3: User, AI
      Review research results: 4: User
      Approve resume optimization: 5: User, AI
      Review cover letter: 5: User, AI
    section Completion
      Make final edits: 4: User
      Export documents: 5: User
      Mark as applied: 5: User
```

---

## 📚 Documentation Strategy

### **Documentation Hierarchy**

1. **Architecture Guide** (This Document) - Technical architecture and design decisions
2. **User Guide** - Step-by-step usage instructions with screenshots
3. **API Reference** - Detailed API documentation for extensions
4. **Developer Guide** - Setup, contribution guidelines, and development practices
5. **Troubleshooting Guide** - Common issues and solutions

### **Documentation Standards**

- All public APIs documented with docstrings
- Code examples for all major features
- Screenshots for UI components
- Video tutorials for complex workflows
- Regular documentation updates with each release

---

## 🚀 Deployment & Distribution

### **Distribution Strategy**

- **GitHub Repository**: Open source with comprehensive documentation
- **PyPI Package**: Easy installation via pip
- **Docker Container**: Containerized deployment option
- **Executable Bundle**: Standalone executable for non-technical users

### **Deployment Options**

```bash
# Option 1: Direct Python execution
python -m streamlit run app.py

# Option 2: Docker deployment
docker run -p 8501:8501 job-application-engine

# Option 3: Heroku deployment
git push heroku main

# Option 4: Local installation
pip install job-application-engine
job-app-engine --config myconfig.yaml
```

---

## 🔮 Future Enhancements

### **Advanced AI Features**

- **Multi-Model Support**: Integration with multiple AI providers
- **Specialized Models**: Domain-specific fine-tuned models
- **Continuous Learning**: System learns from successful applications
- **Predictive Analytics**: Success probability estimation

### **Integration Ecosystem**

- **LinkedIn Integration**: Auto-import job postings and company data
- **ATS Integration**: Direct submission to Applicant Tracking Systems
- **CRM Integration**: Track the application pipeline in an existing CRM
- **Calendar Integration**: Application deadline management

### **Enterprise Features**

- **Multi-Tenant Architecture**: Support multiple users/organizations
- **Role-Based Access Control**: Team collaboration with permission levels
- **Workflow Customization**: Industry-specific workflow templates
- **Advanced Analytics**: Success attribution and optimization recommendations

---

*This architecture guide serves as the authoritative reference for the Job Application Engine system design and implementation. For implementation details, see the source code and technical documentation.*

*For questions or contributions, please refer to the project repository and contribution guidelines.*