Chain of Density (CoD)
Text summarization has long been a challenging task in natural language processing, requiring a delicate balance between brevity and informativeness. Large language models (LLMs) have revolutionized this field by generating coherent summaries with minimal task-specific fine-tuning. However, creating summaries that are both concise and information-rich remains a significant challenge.
Chain of Density (CoD) prompting addresses this challenge through an innovative approach that systematically enriches summaries while maintaining a fixed length. This technique represents a significant advancement in how we can leverage LLMs for more effective summarization.
The key aspects of CoD include:
- Iterative refinement: Instead of producing a summary in one pass, CoD prompting refines it over multiple iterations. In each step, the model identifies and integrates 1-3 missing salient entities, ensuring that previously omitted details are incorporated gradually.
- Controlled compression: To maintain a constant token count, the model compresses or rephrases existing content to make space for new information. This strategy allows the summary to become more detailed without growing longer.
- Enhanced abstraction: As the summary is repeatedly reworked, the language shifts toward a more abstract and synthesized narrative, effectively capturing the source material's essence.
The effectiveness of CoD comes from its ability to:
- Preserve a fixed length, preventing verbosity
- Fuse and reorganize content efficiently
- Balance entity density (ideally around 0.15 entities per token, as measured in the sketch below) to align with human preferences
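To make the density target concrete, here is a minimal sketch of measuring entity density as named entities per token. The paper identifies salient entities with GPT-4, so spaCy's NER is only a rough proxy here, and the `en_core_web_sm` model is an assumed, separately installed dependency.

```python
# Rough proxy for entity density: spaCy named entities per token.
# Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_density(summary: str) -> float:
    """Named entities per token; human-preferred CoD summaries sit near ~0.15."""
    doc = nlp(summary)
    return len(doc.ents) / len(doc) if len(doc) else 0.0

sparse = ("This article discusses a Formula 1 race in which two drivers "
          "collided, leading to penalties and changes to the standings.")
print(f"entity density: {entity_density(sparse):.3f}")
```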
The Mechanics of CoD
Chain of Density (CoD) prompting transforms summarization into an iterative refinement procedure (a minimal code sketch follows the list below). Here's how it works:
- Generate an initial "entity-sparse" summary containing only 1-3 salient entities from the source text. The fixed token count is crucial for consistent evaluation of informativeness.
- Through five fixed iterations, identify unique, missing entities that are:
  - Relevant to the primary narrative
  - Specific yet concise (five words or fewer)
  - Novel to the existing summary
  - Faithful to the original article
  - Unrestricted in location within the article
- "Fuse" these missing entities into the existing summary through:
  - Compression of redundant language
  - Abstraction of content
  - Integration of related details
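Here is the promised sketch of that loop, written as explicit per-step LLM calls. The original paper packs all five rounds into a single prompt (shown in the next section), so this looped variant is only an illustrative alternative, and `complete` is a hypothetical stand-in for whatever chat-completion call you use.

```python
# Illustrative CoD loop with one LLM call per densification step.
# `complete` is a placeholder: any function that maps a prompt string to a reply.
from typing import Callable

def chain_of_density(article: str,
                     complete: Callable[[str], str],
                     steps: int = 5) -> list[str]:
    """Return the sequence of increasingly dense, fixed-length summaries."""
    # Step 0: an entity-sparse, ~80-word starting summary.
    summary = complete(
        f"Article: {article}\n"
        "Write a ~80-word summary covering only 1-3 salient entities. "
        "Use verbose, non-specific language."
    )
    summaries = [summary]
    for _ in range(steps - 1):
        # Each round: surface 1-3 missing entities and fuse them in at the same length.
        summary = complete(
            f"Article: {article}\n"
            f"Previous summary: {summary}\n"
            "Identify 1-3 relevant, specific, novel, faithful entities missing from "
            "the previous summary, then rewrite the summary at the exact same length, "
            "keeping every earlier entity and fusing in the new ones."
        )
        summaries.append(summary)
    return summaries
```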
Implementing CoD in Practice
The following template guides an LLM through the CoD process (a sketch of calling it from Python appears after the template):

Template
Article: [ARTICLE]
You will generate increasingly concise, entity-dense summaries of the above Article. Repeat the following 2 steps 5 times.
Step 1. Identify 1-3 informative Entities (";" delimited) from the Article which are missing from the previously generated summary.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.
A Missing Entity is:
- Relevant: to the main story.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not in the previous summary.
- Faithful: present in the Article.
- Anywhere: located anywhere in the Article.
Guidelines:
- The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., "this article discusses") to reach ~80 words.
- Make every word count: re-write the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like "the article discusses".
- The summaries should become highly dense and concise yet self-contained, e.g., easily understood without the Article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.
Remember, use the exact same number of words for each summary.
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are "Missing_Entities" and "Denser_Summary".
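As a minimal sketch, the template can be sent to a chat model and the JSON reply parsed directly. This assumes the OpenAI Python SDK (v1+) with an `OPENAI_API_KEY` set; the model name is just an example, and real outputs may need more robust JSON handling (e.g., stripping code fences) than shown here.

```python
import json
from openai import OpenAI

# Abbreviated for space: paste the full CoD template from above, keeping the
# {article} placeholder where the article text should go.
COD_TEMPLATE = """Article: {article}
You will generate increasingly concise, entity-dense summaries of the above Article. Repeat the following 2 steps 5 times.
[... full instructions and guidelines from the template above ...]
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are "Missing_Entities" and "Denser_Summary"."""

def run_cod(article: str, model: str = "gpt-4o") -> list[dict]:
    """Run the single-prompt CoD template and return the five-step JSON list."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": COD_TEMPLATE.format(article=article)}],
    )
    return json.loads(response.choices[0].message.content)
```

Each element of the returned list pairs the newly added "Missing_Entities" with the corresponding "Denser_Summary"; index 0 is the sparse first pass and index 4 the densest rewrite.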
Real-World Application: The Chinese Grand Prix Incident
To better understand how CoD works in practice, consider a concrete example from Formula 1 racing: summarizing a news report on a collision at the Chinese Grand Prix. The summary evolves through five steps:
- Initial broad overview of the collision and consequences
- Addition of team and position details
- Integration of specific penalties and driver movements
- Incorporation of technical details (lap numbers, damage specifics)
- Final refinement while maintaining length
Human evaluations show that steps 3-4 typically achieve the optimal balance of informativeness and readability.
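One illustrative way to operationalize that finding (not part of the original method) is to pick the iteration whose density lands closest to the ~0.15 entities-per-token sweet spot, reusing the `entity_density` helper sketched earlier on the parsed output of `run_cod`:

```python
def pick_preferred(steps: list[dict], target: float = 0.15) -> str:
    """Choose the step whose summary's entity density is closest to `target`."""
    summaries = [step["Denser_Summary"] for step in steps]
    return min(summaries, key=lambda s: abs(entity_density(s) - target))
```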
Comparative Analysis: CoD vs Traditional Methods
| Method | Summary Characteristics | Strengths | Limitations |
|---|---|---|---|
| Vanilla Summarization | Single-step, general | Fast and straightforward | Misses crucial details |
| Human-Written | Thoughtfully balanced | High readability, context-rich | Subjective and time-intensive |
| CoD | Iteratively refined, dense | Incremental detail integration | Multiple processing steps |
Current Limitations and Future Potential
While CoD advances summarization technology, it faces several challenges:
- Domain specificity: Currently focused on news summarization
- Evaluation challenges: Subjective density standards
- Computational overhead: Multiple processing steps impact real-time performance
- Quality consistency: Varies with source material and LLM selection
Explore CoD further with the creators' dataset on Hugging Face, featuring 500 annotated and 5,000 unannotated examples.
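If you want to browse those examples programmatically, a quick sketch with the `datasets` library looks like the following; the dataset id is an assumption, so confirm the exact name and available splits on the authors' Hugging Face page before relying on it.

```python
from datasets import load_dataset

# Dataset id is an assumption; verify on the authors' Hugging Face page.
cod_data = load_dataset("griffin/chain_of_density")
print(cod_data)  # inspect available splits and fields before indexing in
```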
Conclusion
Chain of Density Prompting advances automated summarization by combining iterative refinement with controlled information density. Its structured methodology provides a foundation for future developments in text summarization, offering a valuable tool for applications requiring high-quality, balanced summaries.
Valeriia Kuka
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
Footnotes
- Adams, G., Fabbri, A., Ladhak, F., Lehman, E., & Elhadad, N. (2023). From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting. https://arxiv.org/abs/2309.04269