
Automated content lifecycle from data collection through analysis, ideation, generation, and publication. Dockerised service stack: FastAPI application, Celery task workers, Celery Beat scheduler, Flower monitoring dashboard, and Redis message broker. Ingests data from multiple sources — RSS feeds, scraped pages, and crawled sites — and breaks it down into atomic concepts and facts that populate a continuously growing knowledge graph.
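The ingest step can be sketched in miniature: source items are split into atomic facts, each tagged with the concepts it mentions, and stored in a concept-to-facts index that stands in for the knowledge graph. All names here (`Fact`, `KnowledgeGraph`, `ingest`) are illustrative, not the project's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    text: str                  # one atomic statement
    source: str                # where it came from (RSS item, scraped page, ...)
    concepts: tuple[str, ...]  # concepts the fact mentions

class KnowledgeGraph:
    """Toy in-memory stand-in for the growing knowledge graph."""

    def __init__(self) -> None:
        self.facts: set[Fact] = set()
        self.by_concept: dict[str, set[Fact]] = {}

    def add(self, fact: Fact) -> None:
        if fact in self.facts:  # dedupe repeated facts
            return
        self.facts.add(fact)
        for concept in fact.concepts:
            self.by_concept.setdefault(concept, set()).add(fact)

def ingest(graph: KnowledgeGraph, source: str,
           statements: list[tuple[str, tuple[str, ...]]]) -> int:
    """Add each (text, concepts) statement from one source; return how many were new."""
    before = len(graph.facts)
    for text, concepts in statements:
        graph.add(Fact(text, source, concepts))
    return len(graph.facts) - before
```

In the real stack this logic would run inside a Celery task per source, with Redis brokering the work; the sketch only shows the shape of the fact/concept indexing.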
Ideas emerge from the knowledge graph when concept density reaches thresholds — clusters of related facts that haven't been covered yet surface as candidate topics. Ideas mature over time as more supporting data arrives. When an idea reaches sufficient novelty and factual completeness, the system generates an article grounded in the collected facts with proper attribution. The article is then evaluated for completeness: published if it meets quality thresholds, held as a draft with specific gaps identified if it needs more supporting data.
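The density-threshold ideation step can be sketched as follows, assuming a concept-to-facts index like the one the knowledge graph maintains. An uncovered concept becomes a candidate topic once it accumulates enough distinct supporting facts; the threshold value and the "covered" set are illustrative parameters, not the project's actual configuration.

```python
def candidate_topics(
    by_concept: dict[str, set[str]],  # concept -> ids of supporting facts
    covered: set[str],                # concepts already covered by articles
    min_facts: int = 5,               # density threshold (assumed value)
) -> list[tuple[str, int]]:
    """Return uncovered concepts whose fact count crossed the threshold,
    densest first, so the richest uncovered cluster surfaces as the next idea."""
    candidates = [
        (concept, len(facts))
        for concept, facts in by_concept.items()
        if concept not in covered and len(facts) >= min_facts
    ]
    return sorted(candidates, key=lambda c: -c[1])
```

Ideas "maturing over time" falls out naturally: a concept below the threshold today surfaces in a later cycle once more supporting facts arrive.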
Published articles feed back into the knowledge graph, expanding its concept base and creating new connections between existing nodes. The system's understanding deepens with each production cycle — new articles introduce new entity relationships, topic connections, and factual context that inform subsequent ideation. Each cycle makes the next cycle's output more informed and better connected.
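The feedback step can be sketched by treating the graph's connections as a concept co-occurrence edge map: publishing an article that spans several concepts links every pair it covers, so later ideation sees those concepts as related. The names (`edges`, `publish`) and the integer edge weights are illustrative assumptions.

```python
from itertools import combinations

def publish(edges: dict[frozenset, int], article_concepts: list[str]) -> None:
    """Feed a published article back into the graph: strengthen an edge between
    every pair of concepts the article connects, creating new edges as needed."""
    for a, b in combinations(sorted(set(article_concepts)), 2):
        key = frozenset((a, b))
        edges[key] = edges.get(key, 0) + 1
```

Each publication either adds new edges (new connections between existing nodes) or reinforces existing ones, which is what makes the next cycle's ideation better connected.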
NLP pipeline handles entity extraction (spaCy), topic modelling (BERTopic), keyword extraction (KeyBERT), sentiment analysis (VADER, TextBlob), and semantic embedding for vector search (Weaviate). Content scraping via Playwright and BeautifulSoup for JavaScript-rendered and static pages, respectively. Domain-agnostic architecture — the knowledge graph structure, ideation logic, and generation pipeline work for any subject domain.
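A minimal sketch of how such NLP stages compose into one pipeline, with stub stages standing in for the real libraries (spaCy for entities, VADER/TextBlob for sentiment, and so on). Each stage reads the raw text and adds its own key to a shared analysis dict; the stage bodies, names, and dict layout are illustrative only.

```python
from typing import Callable

Stage = Callable[[str], dict]

def entities(text: str) -> dict:
    # stub standing in for spaCy NER: treat title-cased words as entities
    return {"entities": [w for w in text.split() if w.istitle()]}

def sentiment(text: str) -> dict:
    # stub standing in for VADER/TextBlob scoring
    return {"sentiment": 1 if "good" in text.lower() else 0}

PIPELINE: list[Stage] = [entities, sentiment]

def analyse(text: str) -> dict:
    """Run every stage over the text and merge the results into one record."""
    analysis: dict = {"text": text}
    for stage in PIPELINE:
        analysis.update(stage(text))
    return analysis
```

Because each stage has the same signature, swapping a stub for the real library call (or adding a topic-modelling or embedding stage) changes nothing downstream — one way a pipeline like this stays domain-agnostic.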