This comprehensive update transforms Job Forge from a generic MVP concept to a production-ready Python/FastAPI web application prototype with complete documentation, testing infrastructure, and deployment procedures.

## 🏗️ Architecture Changes

- Updated all documentation to reflect the Python/FastAPI + Dash + PostgreSQL stack
- Transformed from MVP concept to deployable web application prototype
- Added comprehensive multi-tenant architecture with Row Level Security (RLS)
- Integrated the Claude API and OpenAI API for AI-powered document generation

## 📚 Documentation Overhaul

- **CLAUDE.md**: Complete rewrite as project orchestrator for 4 specialized agents
- **README.md**: New centralized documentation hub with organized navigation
- **API Specification**: Updated with comprehensive FastAPI endpoint documentation
- **Database Design**: Enhanced schema with RLS policies and performance optimization
- **Architecture Guide**: Transformed to web application focus with deployment strategy

## 🏗️ New Documentation Structure

- **docs/development/**: Python/FastAPI coding standards and development guidelines
- **docs/infrastructure/**: Docker setup and server deployment procedures
- **docs/testing/**: Comprehensive QA procedures with pytest integration
- **docs/ai/**: AI prompt templates and examples (preserved from original)

## 🎯 Team Structure Updates

- **.claude/agents/**: 4 new Python/FastAPI specialized agents
  - simplified_technical_lead.md: Architecture and technical guidance
  - fullstack_developer.md: FastAPI backend + Dash frontend implementation
  - simplified_qa.md: pytest testing and quality assurance
  - simplified_devops.md: Docker deployment and server infrastructure

## 🧪 Testing Infrastructure

- **pytest.ini**: Complete pytest configuration with coverage requirements
- **tests/conftest.py**: Comprehensive test fixtures and database setup
- **tests/unit/**: Example unit tests for auth and application services
- **tests/integration/**: API integration test examples
- Support for async testing, AI service mocking, and database testing

## 🧹 Cleanup

- Removed 9 duplicate/outdated documentation files
- Eliminated conflicting technology references (Node.js/TypeScript)
- Consolidated overlapping content into comprehensive guides
- Cleaned up project structure for professional development workflow

## 🚀 Production Ready Features

- Docker containerization for development and production
- Server deployment procedures for prototype hosting
- Security best practices with JWT authentication and RLS
- Performance optimization with database indexing and caching
- Comprehensive testing strategy with quality gates

This update establishes Job Forge as a professional Python/FastAPI web application prototype ready for development and deployment.

🤖 Generated with Claude Code (https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Job Forge - Database Design & Schema
Version: 1.0.0 Prototype
Database: PostgreSQL 16 with pgvector
Target Audience: Full-Stack Developers
Last Updated: August 2025
🎯 Database Overview
Technology Stack
- Database: PostgreSQL 16
- Extensions: pgvector (for AI embeddings), uuid-ossp (for UUID generation)
- Security: Row Level Security (RLS) for multi-tenant architecture
- Connection: AsyncPG with SQLAlchemy 2.0 async ORM
- Migrations: Alembic for database schema versioning
Design Principles
- Multi-Tenancy: Complete data isolation between users via RLS
- Data Integrity: Foreign key constraints and comprehensive validation
- Performance: Strategic indexes for query optimization
- Security: Defense-in-depth with RLS policies and input validation
- Scalability: Schema designed for horizontal scaling and future features
- Maintainability: Clear naming conventions and well-documented structure
📊 Entity Relationship Diagram
erDiagram
USERS ||--o{ APPLICATIONS : creates
USERS ||--o{ USER_RESUMES : owns
APPLICATIONS ||--o{ DOCUMENTS : contains
DOCUMENTS ||--o| DOCUMENT_EMBEDDINGS : has_embedding
USERS {
uuid id PK
varchar email UK
varchar password_hash
varchar first_name
varchar last_name
boolean is_active
timestamptz created_at
timestamptz updated_at
}
APPLICATIONS {
uuid id PK
uuid user_id FK
varchar company_name
varchar role_title
text job_description
text job_url
varchar location
varchar status
timestamptz created_at
timestamptz updated_at
}
DOCUMENTS {
uuid id PK
uuid application_id FK
varchar document_type
text content
timestamptz created_at
timestamptz updated_at
}
USER_RESUMES {
uuid id PK
uuid user_id FK
varchar name
text content
varchar focus_area
boolean is_primary
timestamptz created_at
timestamptz updated_at
}
DOCUMENT_EMBEDDINGS {
uuid id PK
uuid document_id FK
vector_1536 embedding
timestamptz created_at
}
🗄️ Complete Database Schema
Database Initialization
-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; -- optional: the schema uses the built-in gen_random_uuid() (PostgreSQL 13+)
CREATE EXTENSION IF NOT EXISTS vector;
-- Create custom types
CREATE TYPE application_status_type AS ENUM (
'draft',
'applied',
'interview',
'rejected',
'offer'
);
CREATE TYPE document_type_enum AS ENUM (
'research_report',
'optimized_resume',
'cover_letter'
);
CREATE TYPE focus_area_type AS ENUM (
'software_development',
'data_science',
'management',
'consulting',
'other'
);
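On the application side, these SQL enums are typically mirrored as Python enums so FastAPI request models can reject invalid values before a database round-trip. A sketch (the class names are illustrative, not part of the schema):

```python
from enum import Enum


class ApplicationStatus(str, Enum):
    """Mirrors the application_status_type SQL enum."""
    DRAFT = "draft"
    APPLIED = "applied"
    INTERVIEW = "interview"
    REJECTED = "rejected"
    OFFER = "offer"


class DocumentType(str, Enum):
    """Mirrors the document_type_enum SQL enum."""
    RESEARCH_REPORT = "research_report"
    OPTIMIZED_RESUME = "optimized_resume"
    COVER_LETTER = "cover_letter"


class FocusArea(str, Enum):
    """Mirrors the focus_area_type SQL enum."""
    SOFTWARE_DEVELOPMENT = "software_development"
    DATA_SCIENCE = "data_science"
    MANAGEMENT = "management"
    CONSULTING = "consulting"
    OTHER = "other"
```

Because these subclass `str`, their values serialize directly into SQL bind parameters and JSON responses.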
Core Tables
Users Table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Constraints
CONSTRAINT email_format CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
CONSTRAINT first_name_not_empty CHECK (LENGTH(TRIM(first_name)) > 0),
CONSTRAINT last_name_not_empty CHECK (LENGTH(TRIM(last_name)) > 0)
);
-- Indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_active ON users(is_active);
CREATE INDEX idx_users_created_at ON users(created_at);
-- Row Level Security
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
-- Users can only see their own record
CREATE POLICY users_own_data ON users
FOR ALL
USING (id = current_setting('app.current_user_id')::UUID);
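The `email_format` check above can be mirrored in application code so bad input fails fast, before reaching the database. A sketch using the same pattern (Python's `re` rather than PostgreSQL's `~*`, so the case-insensitivity flag is made explicit):

```python
import re

# Same pattern as the email_format CHECK constraint above
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$",
                      re.IGNORECASE)


def is_valid_email(email: str) -> bool:
    """Pre-validate an email address the same way the database constraint does."""
    return EMAIL_RE.match(email) is not None
```

Keeping the constraint in the database as well means the rule holds even for writes that bypass the API layer.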
Applications Table
CREATE TABLE applications (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
company_name VARCHAR(255) NOT NULL,
role_title VARCHAR(255) NOT NULL,
job_description TEXT NOT NULL,
job_url TEXT,
location VARCHAR(255),
status application_status_type DEFAULT 'draft',
-- Timestamps
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Constraints
CONSTRAINT job_description_min_length CHECK (LENGTH(job_description) >= 50),
CONSTRAINT company_name_not_empty CHECK (LENGTH(TRIM(company_name)) > 0),
CONSTRAINT role_title_not_empty CHECK (LENGTH(TRIM(role_title)) > 0),
CONSTRAINT valid_job_url CHECK (
job_url IS NULL OR
job_url ~* '^https?://[^\s/$.?#].[^\s]*$'
)
);
-- Indexes
CREATE INDEX idx_applications_user_id ON applications(user_id);
CREATE INDEX idx_applications_status ON applications(status);
CREATE INDEX idx_applications_created_at ON applications(created_at);
CREATE INDEX idx_applications_company_name ON applications(company_name);
-- Full text search index for job descriptions
CREATE INDEX idx_applications_job_description_fts
ON applications USING gin(to_tsvector('english', job_description));
-- Row Level Security
ALTER TABLE applications ENABLE ROW LEVEL SECURITY;
CREATE POLICY applications_user_access ON applications
FOR ALL
USING (user_id = current_setting('app.current_user_id')::UUID);
Documents Table
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
application_id UUID NOT NULL REFERENCES applications(id) ON DELETE CASCADE,
document_type document_type_enum NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Constraints
CONSTRAINT content_min_length CHECK (LENGTH(content) >= 10),
CONSTRAINT unique_document_per_application UNIQUE (application_id, document_type)
);
-- Indexes
CREATE INDEX idx_documents_application_id ON documents(application_id);
CREATE INDEX idx_documents_type ON documents(document_type);
CREATE INDEX idx_documents_updated_at ON documents(updated_at);
-- Full text search index for document content
CREATE INDEX idx_documents_content_fts
ON documents USING gin(to_tsvector('english', content));
-- Row Level Security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY documents_user_access ON documents
FOR ALL
USING (
application_id IN (
SELECT id FROM applications
WHERE user_id = current_setting('app.current_user_id')::UUID
)
);
User Resumes Table
CREATE TABLE user_resumes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
focus_area focus_area_type DEFAULT 'other',
is_primary BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Constraints
CONSTRAINT resume_name_not_empty CHECK (LENGTH(TRIM(name)) > 0),
CONSTRAINT resume_content_min_length CHECK (LENGTH(content) >= 100)
);
-- Only one primary resume per user. A plain UNIQUE (user_id, is_primary)
-- would also forbid a user from having more than one non-primary resume,
-- so enforce this with a partial unique index instead:
CREATE UNIQUE INDEX idx_user_resumes_one_primary
ON user_resumes(user_id)
WHERE is_primary;
-- Indexes
CREATE INDEX idx_user_resumes_user_id ON user_resumes(user_id);
CREATE INDEX idx_user_resumes_focus_area ON user_resumes(focus_area);
CREATE INDEX idx_user_resumes_is_primary ON user_resumes(is_primary);
-- Full text search index for resume content
CREATE INDEX idx_user_resumes_content_fts
ON user_resumes USING gin(to_tsvector('english', content));
-- Row Level Security
ALTER TABLE user_resumes ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_resumes_access ON user_resumes
FOR ALL
USING (user_id = current_setting('app.current_user_id')::UUID);
Document Embeddings Table (AI Features)
CREATE TABLE document_embeddings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
embedding vector(1536), -- OpenAI text-embedding-3-small dimension
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Constraints
CONSTRAINT unique_embedding_per_document UNIQUE (document_id)
);
-- Vector similarity index (build after loading data: ivfflat samples
-- existing rows to choose its list centroids)
CREATE INDEX idx_document_embeddings_vector
ON document_embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Regular indexes
CREATE INDEX idx_document_embeddings_document_id ON document_embeddings(document_id);
-- Row Level Security
ALTER TABLE document_embeddings ENABLE ROW LEVEL SECURITY;
CREATE POLICY document_embeddings_access ON document_embeddings
FOR ALL
USING (
document_id IN (
SELECT d.id FROM documents d
JOIN applications a ON d.application_id = a.id
WHERE a.user_id = current_setting('app.current_user_id')::UUID
)
);
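The `vector_cosine_ops` operator class above orders results by cosine distance, defined as 1 − cosine similarity; pgvector computes it server-side via the `<=>` operator. For intuition, the same quantity in pure Python:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as used by vector_cosine_ops: 1 - cos(a, b).

    Ranges from 0 (identical direction) to 2 (opposite direction).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

A typical query then looks like `ORDER BY embedding <=> $1 LIMIT 10`, which the ivfflat index accelerates.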
🔒 Security Policies
Row Level Security Overview
All tables with user data have RLS enabled to ensure complete data isolation:
-- Function to get current user ID from session
CREATE OR REPLACE FUNCTION get_current_user_id()
RETURNS UUID AS $$
BEGIN
RETURN current_setting('app.current_user_id')::UUID;
EXCEPTION
WHEN others THEN
RETURN NULL;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
-- Helper function to check if user owns application
CREATE OR REPLACE FUNCTION user_owns_application(app_id UUID)
RETURNS BOOLEAN AS $$
BEGIN
RETURN EXISTS (
SELECT 1 FROM applications
WHERE id = app_id
AND user_id = get_current_user_id()
);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
Setting User Context
Backend must set user context for each request:
# In FastAPI dependency
async def set_user_context(user: User = Depends(get_current_user)):
    async with get_db_connection() as conn:
        # SET does not accept bind parameters; use set_config() instead.
        # The third argument (is_local = true) scopes the setting to the
        # current transaction, like SET LOCAL.
        await conn.execute(
            "SELECT set_config('app.current_user_id', $1, true)",
            str(user.id)
        )
    return user
🚀 Database Functions
Trigger Functions
-- Update timestamp trigger function
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Apply to all tables with updated_at
CREATE TRIGGER update_users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
CREATE TRIGGER update_applications_updated_at
BEFORE UPDATE ON applications
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
CREATE TRIGGER update_documents_updated_at
BEFORE UPDATE ON documents
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
CREATE TRIGGER update_user_resumes_updated_at
BEFORE UPDATE ON user_resumes
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
Business Logic Functions
-- Generate application name
CREATE OR REPLACE FUNCTION generate_application_name(
p_company_name VARCHAR,
p_role_title VARCHAR
) RETURNS VARCHAR AS $$
DECLARE
clean_company VARCHAR;
clean_role VARCHAR;
date_suffix VARCHAR;
BEGIN
-- Clean and normalize names
clean_company := LOWER(REGEXP_REPLACE(p_company_name, '[^a-zA-Z0-9]', '_', 'g'));
clean_role := LOWER(REGEXP_REPLACE(p_role_title, '[^a-zA-Z0-9]', '_', 'g'));
date_suffix := TO_CHAR(NOW(), 'YYYY_MM_DD');
RETURN clean_company || '_' || clean_role || '_' || date_suffix;
END;
$$ LANGUAGE plpgsql;
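For callers that need the same name outside the database (for example when naming exported files), the function above translates directly to Python. A sketch mirroring the SQL logic:

```python
import re
from datetime import date


def generate_application_name(company_name: str, role_title: str) -> str:
    """Python equivalent of the generate_application_name SQL function."""
    # Clean and normalize names, as REGEXP_REPLACE + LOWER do in SQL
    clean_company = re.sub(r"[^a-zA-Z0-9]", "_", company_name).lower()
    clean_role = re.sub(r"[^a-zA-Z0-9]", "_", role_title).lower()
    date_suffix = date.today().strftime("%Y_%m_%d")
    return f"{clean_company}_{clean_role}_{date_suffix}"
```

If both versions are kept, the SQL function should remain the source of truth to avoid drift.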
-- Application status validation function
CREATE OR REPLACE FUNCTION validate_application_status()
RETURNS TRIGGER AS $$
BEGIN
-- Ensure status transitions are logical
IF NEW.status = OLD.status THEN
RETURN NEW;
END IF;
-- Log status changes for audit purposes
RAISE NOTICE 'Application % status changed from % to %',
NEW.id, OLD.status, NEW.status;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER validate_application_status_trigger
BEFORE UPDATE ON applications
FOR EACH ROW EXECUTE FUNCTION validate_application_status();
📈 Performance Optimization
Query Optimization
-- Most common query patterns with optimized indexes
-- 1. Get user applications (paginated)
-- Index: idx_applications_user_id, idx_applications_created_at
SELECT * FROM applications
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT $2 OFFSET $3;
-- 2. Get application with documents
-- Index: idx_documents_application_id
SELECT a.*, d.document_type, d.content
FROM applications a
LEFT JOIN documents d ON a.id = d.application_id
WHERE a.id = $1 AND a.user_id = $2;
-- 3. Search applications by company/role
-- Index: idx_applications_company_name, full-text search
SELECT * FROM applications
WHERE user_id = $1
AND (
company_name ILIKE $2
OR role_title ILIKE $3
OR to_tsvector('english', job_description) @@ plainto_tsquery('english', $4)
)
ORDER BY created_at DESC;
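When binding the `$2`/`$3` ILIKE parameters above to user input, remember that `%` and `_` are LIKE wildcards; a search term like "50%" would otherwise match unintended rows. A sketch of the escaping (the helper name is illustrative):

```python
def to_ilike_pattern(term: str) -> str:
    """Escape LIKE wildcards in a user-supplied search term and wrap it
    for substring matching (PostgreSQL's default LIKE escape is backslash)."""
    escaped = (
        term.replace("\\", "\\\\")
            .replace("%", r"\%")
            .replace("_", r"\_")
    )
    return f"%{escaped}%"
```

The full-text branch (`plainto_tsquery`) needs no such escaping, since it parses the input as plain words.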
Connection Pooling
# SQLAlchemy async engine configuration
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,        # Connection pool size
    max_overflow=30,     # Additional connections beyond pool_size
    pool_pre_ping=True,  # Validate connections before use
    pool_recycle=3600,   # Recycle connections every hour
    echo=False           # Disable SQL logging in production
)
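Note that each application instance can open up to pool_size + max_overflow = 50 connections at peak; summed across instances, this must stay below the server's max_connections (PostgreSQL defaults to 100). A quick sanity check with the values above (the instance count is an assumption for the prototype):

```python
pool_size = 20
max_overflow = 30
app_instances = 1               # assumed: single-instance prototype
postgres_max_connections = 100  # PostgreSQL default

peak_connections = app_instances * (pool_size + max_overflow)
# Leave headroom for superuser and maintenance sessions
assert peak_connections < postgres_max_connections
```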
🧪 Test Data Setup
Development Seed Data
-- Insert test user (password: "testpass123")
INSERT INTO users (id, email, password_hash, first_name, last_name, is_active) VALUES (
'123e4567-e89b-12d3-a456-426614174000',
'test@example.com',
'$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/LewgdyN8yF5V4M2kq',
'Test',
'User',
true
);
-- Insert test resume
INSERT INTO user_resumes (user_id, name, content, focus_area, is_primary) VALUES (
'123e4567-e89b-12d3-a456-426614174000',
'Software Developer Resume',
'# Test User\n\n## Experience\n\nSoftware Developer at Tech Corp...',
'software_development',
true
);
-- Insert test application
INSERT INTO applications (
user_id, company_name, role_title,
job_description, job_url, status
) VALUES (
'123e4567-e89b-12d3-a456-426614174000',
'Google',
'Senior Developer',
'We are seeking an experienced software developer to join our team building cutting-edge applications. You will work with Python, FastAPI, and modern web technologies.',
'https://careers.google.com/jobs/results/123456789/',
'draft'
);
🔄 Database Migrations (Future)
Migration Strategy for Phase 2
When adding Alembic migrations:
# alembic/env.py configuration for RLS
from alembic import context
from sqlalchemy import text

def run_migrations_online():
    # engine and target_metadata are defined earlier in env.py.
    # Disable RLS for the migration connection so schema changes
    # are not filtered by tenant policies.
    with engine.connect() as connection:
        connection.execute(text("SET row_security = off"))
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            compare_type=True,
            compare_server_default=True
        )
        with context.begin_transaction():
            context.run_migrations()
Planned Schema Changes
- Usage tracking tables for SaaS billing
- Subscription management tables
- Audit log tables for compliance
- Performance metrics tables
- Additional indexes based on production usage
🛠️ Database Maintenance
Regular Maintenance Tasks
-- Vacuum and analyze (run weekly)
VACUUM ANALYZE;
-- Update table statistics
ANALYZE applications;
ANALYZE documents;
ANALYZE user_resumes;
-- Check index usage
SELECT schemaname, tablename, indexname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_tup_read DESC;
-- Monitor vector index performance
SELECT * FROM pg_stat_user_indexes
WHERE indexname LIKE '%vector%';
Backup Strategy
# Daily backup script
pg_dump -h localhost -U jobforge_user -d jobforge_mvp \
--clean --if-exists --verbose \
> backup_$(date +%Y%m%d).sql
# Restore from backup
psql -h localhost -U jobforge_user -d jobforge_mvp < backup_20250701.sql
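Daily dumps accumulate quickly, so pair the script above with a retention policy. A sketch in Python (the backup directory and 14-day window are assumptions):

```python
import time
from pathlib import Path

RETENTION_DAYS = 14                          # assumed retention window
BACKUP_DIR = Path("/var/backups/jobforge")   # assumed backup location


def prune_old_backups(backup_dir: Path = BACKUP_DIR,
                      days: int = RETENTION_DAYS) -> list[str]:
    """Delete backup_*.sql dumps older than `days`; return what was removed."""
    cutoff = time.time() - days * 86400
    removed = []
    if backup_dir.is_dir():
        for dump in backup_dir.glob("backup_*.sql"):
            if dump.stat().st_mtime < cutoff:
                dump.unlink()
                removed.append(dump.name)
    return removed
```

Run it from the same cron job as the nightly `pg_dump`, after the new dump has been verified.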
📊 Monitoring Queries
Performance Monitoring
-- Slow queries (requires the pg_stat_statements extension;
-- column names are mean_exec_time/total_exec_time since PostgreSQL 13)
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100 -- queries slower than 100ms
ORDER BY mean_exec_time DESC;
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
pg_total_relation_size(schemaname||'.'||tablename) as size_bytes
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY size_bytes DESC;
-- Connection counts
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
This database design provides a robust foundation for the Job Forge web application with strong security, performance optimization, and scalability. The RLS policies ensure complete multi-tenant data isolation, while the schema supports efficient AI-powered document generation workflows.