
InfoRadar-v5 Pipeline Verification

This document provides the acceptance test commands as specified in the claudecode-spec.

Prerequisites

  1. Environment Setup:

    # Copy and configure .env
    cp .env.example .env
    # Edit .env and add your OPENAI_API_KEY
    

  2. Install Dependencies:

    # Python dependencies
    pip3 install -r requirements.txt
    
    # Node.js dependencies (for add-report.js)
    cd web && npm install && cd ..
    

Acceptance Tests

Test A: Ingest Command

python -m pipeline.main ingest

Expected behavior:

  • Fetches articles from RSS feeds and static URLs
  • Filters by date (max_age_days=3)
  • Deduplicates by URL (seen_urls + SQLite)
  • Scores articles (two-round scoring with smoothing)
  • Generates summaries (~400 Chinese chars)
  • Saves to SQLite database and article markdown files
  • Output: data/articles/{date}/{grade}_{score}_{title}.md

Success criteria:

  • No errors during execution
  • Articles saved to data/radar.db
  • Article files created in data/articles/{date}/
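
The date filter and URL deduplication steps of the ingest command can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function name filter_new_articles and the dictionary keys "url" and "published" are assumptions made for illustration, and only max_age_days=3 and the seen_urls idea come from this document.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_DAYS = 3  # value stated in this document


def filter_new_articles(articles, seen_urls, now=None):
    """Keep articles newer than MAX_AGE_DAYS whose URL has not been seen.

    `articles` is a list of dicts with hypothetical keys "url" and
    "published" (a timezone-aware datetime). `seen_urls` is a set that
    is updated in place as new URLs are accepted.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=MAX_AGE_DAYS)
    fresh = []
    for article in articles:
        if article["published"] < cutoff:
            continue  # older than the date filter allows
        if article["url"] in seen_urls:
            continue  # already ingested, or a duplicate within this batch
        seen_urls.add(article["url"])
        fresh.append(article)
    return fresh
```

In the real pipeline the seen set is also backed by SQLite, which this sketch leaves out.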

Test B: Report Command

python -m pipeline.main report --date 2026-02-10

Expected behavior:

  • Reads articles from database for the specified date
  • Performs embedding deduplication (cosine >= 0.8)
  • Generates summary file: data/articles/summary/2026-02-10_S.md
  • Makes single smart LLM call to generate full report
  • Creates report: data/dailyReport/industry_radar_2026-02-10.md

Success criteria:

  • Report file created at: data/dailyReport/industry_radar_2026-02-10.md
  • Report contains Chinese content with proper structure
  • Summary file created at: data/articles/summary/2026-02-10_S.md
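
The embedding deduplication step (cosine >= 0.8) can be sketched as follows. This is an illustration under assumptions, not the pipeline's implementation: the function names are hypothetical, embeddings are plain lists of floats, and keeping the first article of each near-duplicate group is an assumed policy.

```python
import math

COSINE_THRESHOLD = 0.8  # threshold stated in this document


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedupe_by_embedding(items, threshold=COSINE_THRESHOLD):
    """Return the ids of items that are not near-duplicates of an earlier item.

    `items` is a list of (article_id, embedding) pairs. An item is dropped
    when its similarity to any already-kept item is >= threshold.
    """
    kept = []
    for item_id, vec in items:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((item_id, vec))
    return [item_id for item_id, _ in kept]
```

The pairwise comparison is quadratic in the number of articles, which is usually acceptable for a single day's batch.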

Test C: Add Report Script

node scripts/add-report.js data/dailyReport/industry_radar_2026-02-10.md --verbose --yes

Expected behavior:

  • Copies report from data/dailyReport/ to output/{date}.md
  • Generates static website (runs web/generate.js)
  • Executes hooks/post_gen.sh (failure is warning only)
  • Stages and commits changes to git
  • Attempts to push to remote (failure allowed but prints manual steps)

Success criteria:

  • Report copied to output/2026-02-10.md
  • Website generated in web/dist/
  • Hook executed (or warning shown)
  • Git commit created
  • Clear manual push instructions shown if push fails

Additional Verification

Check Database

sqlite3 data/radar.db "SELECT COUNT(*) FROM articles;"
sqlite3 data/radar.db "SELECT url, title, final_score, grade FROM articles LIMIT 5;"
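
The same check can be scripted from Python when a per-grade breakdown is more useful than a raw count. This is a small sketch: the articles table and grade column are taken from the queries above, while the helper name grade_counts is hypothetical.

```python
import sqlite3


def grade_counts(conn):
    """Return {grade: number_of_articles} from the articles table."""
    rows = conn.execute(
        "SELECT grade, COUNT(*) FROM articles GROUP BY grade"
    ).fetchall()
    return {grade: count for grade, count in rows}
```

For example, calling grade_counts(sqlite3.connect("data/radar.db")) after an ingest run should return a non-empty dictionary.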

Check File Structure

# Check article files
ls -la data/articles/$(date +%Y-%m-%d)/

# Check daily reports
ls -la data/dailyReport/

# Check summary files
ls -la data/articles/summary/
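
For a given date, the expected output files can also be checked programmatically. The paths below are the ones named in Tests B and C. The helper names expected_outputs and missing_outputs are hypothetical, and the sketch assumes it is run from the repository root.

```python
from pathlib import Path


def expected_outputs(date, root="."):
    """Map a short label to each file the tests expect for `date` (YYYY-MM-DD)."""
    root = Path(root)
    return {
        "report": root / "data" / "dailyReport" / f"industry_radar_{date}.md",
        "summary": root / "data" / "articles" / "summary" / f"{date}_S.md",
        "published": root / "output" / f"{date}.md",
    }


def missing_outputs(date, root="."):
    """Return the labels of expected files that do not exist yet."""
    return [
        name
        for name, path in expected_outputs(date, root).items()
        if not path.is_file()
    ]
```

An empty list from missing_outputs("2026-02-10") means the file-level success criteria of Tests B and C are met.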

Verify No Mocks

# Ensure radar.py is deleted
test ! -f pipeline/radar.py && echo "✓ No radar.py mock" || echo "✗ radar.py still exists"

# Check for MOCK_NEWS_ITEMS or random.sample in main path
grep -r "MOCK_NEWS_ITEMS\|random\.sample" pipeline/*.py pipeline/core/*.py && echo "✗ Found mocks" || echo "✓ No mocks found"

Cron Schedule

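The pipeline is designed to run on the following schedule (Asia/Shanghai timezone). Note that the TZ=Asia/Shanghai prefix only sets the timezone inside the script; cron fires according to the cron daemon's own timezone, so the host should also be configured for Asia/Shanghai: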

# Hourly ingest at :00 minutes
0 * * * * cd /path/to/daily-report && TZ=Asia/Shanghai bash scripts/run-pipeline.sh ingest >> logs/cron-ingest.log 2>&1

# Daily report at 23:50 (generates report for TODAY)
50 23 * * * cd /path/to/daily-report && TZ=Asia/Shanghai bash scripts/run-pipeline.sh report >> logs/cron-report.log 2>&1

To install:

# Edit crontab.txt with your actual path
# Then install:
crontab crontab.txt

Troubleshooting

Missing API Key

Error: OPENAI_API_KEY not set
Solution: Configure .env file with valid API key

No Articles Found

[Ingest] No articles found
Solution: Check RSS feeds are accessible, verify date filter (max_age_days)

Report Generation Failed

[Report] No articles found for {date}
Solution: Run ingest first, or check if articles exist for that date in database

Git Push Failed

Git operation failed: ...
Solution: This is expected if remote is not configured. Follow manual push instructions printed by the script.

Implementation Notes

  • No mocks: All code uses real RSS feeds, real LLM calls, real database
  • Output path: Reports are generated in data/dailyReport/, NOT output/
  • Grade thresholds: A >= 80, B >= 60, C >= 51
  • Smoothing logic: Applied when first_score >= 70 but second_score < 50
  • Embedding dedupe: Cosine similarity threshold >= 0.8
  • Summary length: ~400 Chinese characters
  • Time filter: max_age_days = 3
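
The grade thresholds and the smoothing trigger listed above can be written out as a minimal sketch. The function names are hypothetical, returning None for scores below 51 is an assumption, and the smoothing computation itself is not specified in this document, so only the trigger condition is shown.

```python
def grade_for(score):
    """Map a final score to a grade using the thresholds in the notes."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 51:
        return "C"
    return None  # assumption: scores below 51 receive no grade


def needs_smoothing(first_score, second_score):
    """True when the two scoring rounds disagree strongly enough to smooth."""
    return first_score >= 70 and second_score < 50
```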