Large Files Refactoring Plan
Date: 2026-03-31
Author: LingFlow Development Team
Version: 1.0
Executive Summary
This document outlines the refactoring strategy for three large files in the LingFlow project that exceed the recommended 500-line limit. The plan focuses on modularization, separation of concerns, and maintaining backward compatibility while improving code maintainability.
Files Analyzed
| File | Lines | Status | Priority |
|---|---|---|---|
| skills/deployment-automation/implementation.py | 1,264 | Critical | P0 |
| lingflow_v4_example.py | 1,041 | High | P1 |
| skills/api-doc-generator/implementation.py | 969 | High | P1 |
1. deployment-automation/implementation.py (1,264 lines)
Current Structure Analysis
The file contains multiple responsibilities:
- Template Definitions (Lines 31-727): ~700 lines
  - Dockerfile templates (5 variants)
  - Kubernetes manifests (7 templates)
  - Blue-green deployment templates
  - CI/CD pipeline templates (GitLab, GitHub)
- Core Functions (Lines 730-1265): ~535 lines
  - execute_skill() - main entry point
  - _generate_dockerfile() - Dockerfile generation
  - _generate_k8s_configs() - Kubernetes config generation
  - _generate_blue_green() - blue-green deployment
  - _generate_rollback_script() - rollback scripts
  - _generate_ci_cd() - CI/CD configuration
  - detect_project_type() - project type detection
Issues Identified
- Single Responsibility Violation: Templates, generation logic, and orchestration mixed
- Hardcoded Templates: Large template strings embedded in code
- Difficult Testing: Cannot test templates independently
- Maintenance Burden: Adding new template types requires modifying main file
- Code Duplication: Similar formatting logic across different generators
Refactoring Strategy
Phase 1: Extract Templates (Low Risk)
Action: Move all templates to separate files in a templates/ subdirectory
New Structure:
```
skills/deployment-automation/
├── implementation.py (~200 lines - orchestration only)
├── templates/
│   ├── __init__.py
│   ├── dockerfiles/
│   │   ├── __init__.py
│   │   ├── python.py
│   │   ├── nodejs.py
│   │   ├── go.py
│   │   ├── java.py
│   │   └── static.py
│   ├── kubernetes/
│   │   ├── __init__.py
│   │   ├── deployment.py
│   │   ├── service.py
│   │   ├── ingress.py
│   │   ├── hpa.py
│   │   ├── configmap.py
│   │   └── secret.py
│   ├── strategies/
│   │   ├── __init__.py
│   │   ├── blue_green.py
│   │   └── rolling.py
│   └── cicd/
│       ├── __init__.py
│       ├── gitlab.py
│       └── github.py
├── generators/
│   ├── __init__.py
│   ├── dockerfile.py
│   ├── kubernetes.py
│   ├── strategies.py
│   └── cicd.py
└── utils/
    ├── __init__.py
    └── detector.py
```
Benefits:
- Reduces the main file by ~700 lines
- Templates can be edited independently
- Easier to add new templates
- Templates can be validated/tested separately
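For illustration, an extracted template module could be a plain Python file exposing a template string plus a small render helper. This is a hypothetical sketch of what `templates/dockerfiles/python.py` might contain; the constant and function names are assumptions, not the project's actual API.

```python
# Hypothetical contents of templates/dockerfiles/python.py; names are
# illustrative, not the project's actual API.
PYTHON_DOCKERFILE = """\
FROM python:{python_version}-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "{entrypoint}"]
"""

def render(python_version: str = "3.11", entrypoint: str = "main.py") -> str:
    """Fill the template's placeholders with concrete values."""
    return PYTHON_DOCKERFILE.format(python_version=python_version,
                                    entrypoint=entrypoint)
```

Keeping each template as a module-level string like this is what makes the independent validation mentioned above possible: a unit test can render the template and lint the result without touching the generators.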
Phase 2: Extract Generators (Medium Risk)
Action: Create generator classes for each output type
New File Structure:
generators/dockerfile.py (~150 lines):

```python
from typing import Dict

class DockerfileGenerator:
    def __init__(self, project_type: str, config: Dict):
        self.project_type = project_type
        self.config = config
        self.template = self._load_template()

    def generate(self) -> str:
        # Render template with config
        pass

    def generate_dockerignore(self) -> str:
        # Generate .dockerignore
        pass
```

generators/kubernetes.py (~200 lines):

```python
from typing import Dict

class KubernetesGenerator:
    def __init__(self, config: Dict):
        self.config = config

    def generate_deployment(self) -> str:
        pass

    def generate_service(self) -> str:
        pass

    def generate_ingress(self) -> str:
        pass

    def generate_all(self) -> Dict[str, str]:
        pass
```

generators/strategies.py (~150 lines):

```python
from typing import Dict

class DeploymentStrategyGenerator:
    def generate_blue_green(self) -> Dict[str, str]:
        pass

    def generate_rollback(self) -> Dict[str, str]:
        pass
```

generators/cicd.py (~150 lines):

```python
class CICDGenerator:
    def generate_gitlab_ci(self) -> str:
        pass

    def generate_github_actions(self) -> str:
        pass
```
Phase 3: Simplify Main Implementation (Low Risk)
New implementation.py (~200 lines):

```python
"""Deployment automation skill - refactored version"""
from typing import Dict

from .generators.dockerfile import DockerfileGenerator
from .generators.kubernetes import KubernetesGenerator
from .generators.strategies import DeploymentStrategyGenerator
from .generators.cicd import CICDGenerator
from .utils.detector import ProjectTypeDetector

def execute_skill(params: Dict) -> Dict:
    """Orchestrate deployment file generation."""
    # 1. Validate input (already covered by Pydantic)
    # 2. Detect project type
    # 3. Call the appropriate generators
    # 4. Return results
```
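As a minimal, self-contained sketch of the dispatch pattern the slimmed-down entry point could follow (the registry shape and key names are assumptions; in practice the generators would come from the generators/ package):

```python
from typing import Callable, Dict

def execute_skill(params: Dict,
                  generators: Dict[str, Callable[[Dict], str]]) -> Dict:
    """Dispatch each requested output to its registered generator."""
    results = {}
    for name in params.get("outputs", []):
        generator = generators.get(name)
        if generator is None:
            return {"success": False, "error": f"unknown output type: {name}"}
        results[name] = generator(params.get("config", {}))
    return {"success": True, "files": results}

# Usage with a stub standing in for DockerfileGenerator:
stubs = {"dockerfile": lambda cfg: f"FROM {cfg.get('base', 'python:3.11-slim')}"}
result = execute_skill({"outputs": ["dockerfile"], "config": {}}, stubs)
```

The registry makes the main file grow by one line per new output type instead of one function per type, which is the point of the extraction.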
Implementation Steps
- Step 1 (1 day): Create directory structure and move templates
- Step 2 (1 day): Create template loader utility
- Step 3 (2 days): Extract Dockerfile generator
- Step 4 (2 days): Extract Kubernetes generators
- Step 5 (1 day): Extract CI/CD generators
- Step 6 (1 day): Extract strategy generators
- Step 7 (1 day): Update main implementation
- Step 8 (1 day): Update tests and verify
Total Estimated Time: 10 days
Testing Strategy
- Existing tests in tests/deployment-automation/ must continue passing
- Add new tests for each generator class
- Test template loading independently
- Add an integration test for the full workflow
Rollback Plan
- Keep the old implementation as implementation.py.bak
- Add a feature flag to switch between the old and new implementations
- Use a Git branch for easy revert
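The feature flag could be as simple as an environment-variable toggle. A sketch, assuming a `DEPLOY_SKILL_IMPL` variable and stub implementations (both names are hypothetical):

```python
import os

def _execute_skill_legacy(params):
    return {"impl": "old"}  # stand-in for the .bak implementation

def _execute_skill_new(params):
    return {"impl": "new"}  # stand-in for the refactored implementation

def execute_skill(params):
    """Route to the legacy or refactored path based on an env toggle."""
    impl = os.environ.get("DEPLOY_SKILL_IMPL", "new")
    return _execute_skill_legacy(params) if impl == "old" else _execute_skill_new(params)
```

Defaulting to the new path while keeping the old one reachable gives an instant rollback that does not require a redeploy.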
2. lingflow_v4_example.py (1,041 lines)
Current Structure Analysis
This is a demo/example file, not production code. It contains:
- Result Type (Lines 33-184): ~150 lines
- Exception Hierarchy (Lines 187-231): ~45 lines
- Configuration System (Lines 234-367): ~135 lines
- Skill Base Classes (Lines 370-460): ~90 lines
- Cache Manager (Lines 463-575): ~115 lines
- Monitor System (Lines 578-638): ~60 lines
- Skill Service (Lines 641-736): ~95 lines
- Example Skills (Lines 739-810): ~70 lines
- Test Functions (Lines 813-1006): ~195 lines
- Main Entry (Lines 1008-1041): ~35 lines
Issues Identified
- Mixed Purpose: Contains library code AND examples/tests
- Reusable Components: The Result type, config, cache, and monitor are broadly useful
- Not in Proper Location: Should live in the lingflow/ package, not the repository root
- No Module Separation: All components in one file
Refactoring Strategy
Option A: Move to lingflow-core Package (Recommended)
Since this contains useful core components, extract them to the main package:
New Structure:
```
lingflow/
├── core/
│   ├── __init__.py
│   ├── result.py        # Result type + factories
│   ├── errors.py        # Exception hierarchy
│   ├── config.py        # Config + builder
│   ├── cache.py         # CacheManager
│   ├── monitoring.py    # Monitor
│   └── skill_base.py    # BaseSkill, SkillContext, SkillResult
├── services/
│   ├── __init__.py
│   └── skill_service.py # SimpleSkillService
└── examples/
    └── v4_demo.py       # Demo/test code only
```
File Size Distribution:
- result.py: ~150 lines
- errors.py: ~45 lines
- config.py: ~135 lines
- cache.py: ~115 lines
- monitoring.py: ~60 lines
- skill_base.py: ~90 lines
- skill_service.py: ~95 lines
- v4_demo.py: ~250 lines (tests + examples)
Option B: Delete and Document (Alternative)
If these components are already implemented elsewhere in the codebase:
- Delete the file
- Add documentation pointing to the actual implementations
- Keep a copy as reference in docs/examples/
Implementation Steps (Option A)
- Step 1 (0.5 days): Create the lingflow/core/ directory
- Step 2 (0.5 days): Extract the Result type
- Step 3 (0.5 days): Extract exception hierarchy
- Step 4 (0.5 days): Extract config system
- Step 5 (0.5 days): Extract cache manager
- Step 6 (0.5 days): Extract monitoring
- Step 7 (0.5 days): Extract skill base classes
- Step 8 (0.5 days): Extract skill service
- Step 9 (1 day): Update all imports across project
- Step 10 (0.5 days): Create demo file
- Step 11 (1 day): Update tests
Total Estimated Time: 7 days
Dependencies to Check
```bash
# Check if anything imports from v4_example
grep -r "lingflow_v4_example" /home/ai/LingFlow --include="*.py"
```
3. api-doc-generator/implementation.py (969 lines)
Current Structure Analysis
The file contains multiple concerns:
- Data Structures (Lines 38-65): ~30 lines
  - RouteInfo, SchemaInfo dataclasses
- Type Mapping (Lines 67-85): ~20 lines
  - PYTHON_TYPE_TO_JSON mapping
- Main Function (Lines 88-208): ~120 lines
  - execute_skill() - entry point
- Code Scanning (Lines 211-303): ~95 lines
  - scan_code() - main scanner
  - extract_route_prefixes() - prefix extraction
  - detect_framework() - framework detection
- Route Extraction (Lines 305-460): ~155 lines
  - extract_routes() - route extraction
  - parse_route_decorator() - decorator parsing
  - _extract_decorator_tags() - tag extraction
- Parameter/Body Extraction (Lines 462-632): ~170 lines
  - extract_parameters()
  - extract_request_body()
  - extract_responses()
- Schema Extraction (Lines 635-716): ~80 lines
  - extract_schemas() - Pydantic/dataclass extraction
- OpenAPI Generation (Lines 719-778): ~60 lines
  - generate_openapi_spec()
- Output/Utility (Lines 781-969): ~190 lines
  - save_document()
  - helper functions (_unparse, infer_type, etc.)
  - YAML conversion
Issues Identified
- Complex AST Parsing: Multiple traversal functions
- Framework-Specific Logic: FastAPI vs Flask mixed
- YAML Implementation: Custom YAML converter (reinventing the wheel)
- Large Functions: Some functions >50 lines
- Deep Nesting: Multiple nested loops in extractors
Refactoring Strategy
Phase 1: Extract Parsers (Low Risk)
New Structure:
```
skills/api-doc-generator/
├── implementation.py        # ~150 lines - orchestration
├── parsers/
│   ├── __init__.py
│   ├── base.py              # Base parser interface
│   ├── fastapi.py           # FastAPI-specific parser (~200 lines)
│   ├── flask.py             # Flask-specific parser (~180 lines)
│   └── ast_utils.py         # AST utility functions (~100 lines)
├── extractors/
│   ├── __init__.py
│   ├── parameters.py        # Parameter extraction (~120 lines)
│   ├── schemas.py           # Schema extraction (~150 lines)
│   └── responses.py         # Response extraction (~80 lines)
├── generators/
│   ├── __init__.py
│   ├── openapi.py           # OpenAPI spec generation (~100 lines)
│   └── output.py            # File output handling (~80 lines)
└── utils/
    ├── __init__.py
    ├── type_mapping.py      # Type conversion utilities (~60 lines)
    └── yaml_helper.py       # YAML/JSON output (~50 lines)
```
Phase 2: Create Parser Classes
parsers/fastapi.py (~200 lines):
```python
from typing import Dict, List, Optional

class FastAPIParser:
    def __init__(self):
        self.framework = 'fastapi'

    def parse_routes(self, tree, content, file_path) -> List[RouteInfo]:
        pass

    def parse_parameters(self, node, path) -> List[Dict]:
        pass

    def parse_request_body(self, node) -> Optional[Dict]:
        pass
```
parsers/flask.py (~180 lines):
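To illustrate the kind of AST traversal both parsers would perform, here is a deliberately simplified, self-contained sketch of decorator-based route detection; real parsers must handle many more decorator shapes (routers, prefixes, keyword arguments):

```python
# Simplified sketch of route detection via the ast module; the matching
# logic here is illustrative, not the skill's actual parser.
import ast

SOURCE = '''
@app.get("/items/{item_id}")
def read_item(item_id: int):
    return {"item_id": item_id}
'''

def find_routes(source: str):
    """Return (METHOD, path, handler_name) for decorators like @app.get(...)."""
    routes = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                if (isinstance(dec, ast.Call)
                        and isinstance(dec.func, ast.Attribute)
                        and dec.args
                        and isinstance(dec.args[0], ast.Constant)):
                    routes.append((dec.func.attr.upper(),
                                   dec.args[0].value,
                                   node.name))
    return routes
```

A framework-specific subclass would mostly differ in which decorator attributes it accepts (`get`/`post` for FastAPI, `route` plus a `methods=` keyword for Flask), which is why a shared `ast_utils.py` makes sense.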
Phase 3: Extract Schema Handling
extractors/schemas.py (~150 lines):
```python
from typing import Dict

class SchemaExtractor:
    def extract_from_pydantic(self, node) -> SchemaInfo:
        pass

    def extract_from_dataclass(self, node) -> SchemaInfo:
        pass

    def handle_inheritance(self, schemas) -> Dict:
        pass
```
Phase 4: Simplify YAML Output
Replace the custom YAML implementation with a proper library:
Before: Custom to_simple_yaml() function (~150 lines)
After:
```python
try:
    import yaml
    YAML_AVAILABLE = True
except ImportError:
    YAML_AVAILABLE = False

def save_document(doc, output_path, format):
    if format == 'yaml':
        if YAML_AVAILABLE:
            with open(output_path, 'w') as f:
                yaml.dump(doc, f)
        else:
            # Fall back to the custom implementation
            ...
```
Implementation Steps
- Step 1 (1 day): Create new directory structure
- Step 2 (1 day): Extract AST utilities
- Step 3 (2 days): Create FastAPI parser
- Step 4 (2 days): Create Flask parser
- Step 5 (2 days): Extract parameter/schema extractors
- Step 6 (1 day): Create OpenAPI generator
- Step 7 (1 day): Simplify YAML handling
- Step 8 (1 day): Update main implementation
- Step 9 (2 days): Update tests
Total Estimated Time: 13 days
Risk Assessment Matrix
| File | Risk Level | Complexity | Breaking Changes | Rollback Difficulty |
|---|---|---|---|---|
| deployment-automation | Medium | High | Low (internal only) | Easy |
| lingflow_v4_example | Low | Medium | Medium (if imported) | Easy |
| api-doc-generator | Medium | High | Low (internal only) | Medium |
Mitigation Strategies
- Incremental Changes: One module at a time
- Backward Compatibility: Keep old interfaces working
- Comprehensive Testing: Test after each extraction
- Code Review: Review each extraction before merging
- Feature Flags: Allow switching between old/new
Over-Development Metrics
Current State vs Guidelines
| Metric | Guideline | deployment-automation | api-doc-generator | lingflow_v4_example |
|---|---|---|---|---|
| File Lines | <500 | 1,264 ❌ | 969 ❌ | 1,041 ❌ |
| Function Complexity | <15 | Mixed ⚠️ | High ❌ | Mixed ⚠️ |
| Responsibilities | Single | Multiple ❌ | Multiple ❌ | Multiple ❌ |
Function Complexity Analysis
deployment-automation:
- _generate_k8s_configs(): ~130 lines
- _generate_blue_green(): ~100 lines
- _generate_ci_cd(): ~30 lines (acceptable)
api-doc-generator:
- extract_schemas(): ~80 lines
- extract_routes(): ~40 lines
- scan_code(): ~55 lines
Testing Requirements
Existing Test Coverage
- tests/deployment-automation/ (✓ exists)
- tests/api-doc-generator/ (needs verification)
- tests/lingflow_v4_example/ (likely none)
New Test Requirements
- Unit Tests: Each new module
- Integration Tests: Full workflow
- Regression Tests: Ensure no breaking changes
- Performance Tests: Verify no degradation
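One way to make the regression requirement concrete is a characterization test: snapshot the current implementation's output once, then assert the refactored code reproduces it exactly. The `generate()` function below is only a stand-in for whichever generator is under test:

```python
# Characterization-test pattern; generate() and the snapshot are
# illustrative stand-ins, not project code.
import json

def generate():
    """Stand-in for any generator whose output must not change."""
    return {"Dockerfile": "FROM python:3.11-slim"}

# Snapshot captured from the pre-refactoring implementation (illustrative):
GOLDEN = '{"Dockerfile": "FROM python:3.11-slim"}'

def test_output_unchanged():
    assert json.dumps(generate(), sort_keys=True) == GOLDEN
```

Capturing the snapshots before any extraction starts turns "no breaking changes" from a review judgment into a failing test.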
Implementation Priority
Sprint 1 (Week 1-2): P0 - deployment-automation
- Extract templates to files
- Create generator classes
- Update main implementation
Sprint 2 (Week 3-4): P1 - api-doc-generator
- Extract parsers
- Create extractor classes
- Simplify YAML handling
Sprint 3 (Week 5): P2 - lingflow_v4_example
- Move to lingflow/core or delete
- Update imports
- Create proper examples
Success Criteria
- All files <500 lines ✅
- All functions <15 cyclomatic complexity ✅
- Single responsibility per module ✅
- Test coverage maintained ✅
- No breaking changes to public API ✅
- Documentation updated ✅
Post-Refactoring Maintenance
Code Review Guidelines
- No file exceeds 500 lines
- No function exceeds 15 complexity
- Each module has single responsibility
- Templates in separate files
- Comprehensive test coverage
CI/CD Integration
Add pre-commit hooks:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: check-file-length
        name: Check file length
        entry: scripts/check_file_length.sh
        language: script
```
Appendix
A. File Size Monitoring Script
```bash
#!/bin/bash
# scripts/check_file_length.sh
find /home/ai/LingFlow -name "*.py" -not -path "*/venv/*" | while read -r file; do
    lines=$(wc -l < "$file")
    if [ "$lines" -gt 500 ]; then
        echo "WARNING: $file is $lines lines"
    fi
done
```
B. Complexity Analysis Tool
```bash
# Install radon for complexity analysis
pip install radon

# Analyze complexity
radon cc skills/deployment-automation/implementation.py -a
```
C. Dependency Graph
```
deployment-automation
├── templates (new)
├── generators (new)
└── utils (new)

api-doc-generator
├── parsers (new)
├── extractors (new)
├── generators (new)
└── utils (new)

lingflow_v4_example
└── lingflow/core (new)
```
Conclusion
This refactoring plan provides a structured approach to reducing file sizes while improving maintainability. The phased approach allows for incremental changes with minimal risk.
Next Steps:
1. Get approval from team lead
2. Create feature branches for each file
3. Begin with deployment-automation (P0)
4. Track progress in project board
Contact: For questions or clarifications, please open an issue or discussion.