Skip to content

Commit 13a2511

Browse files
lailooWillemJiang
andauthored
fix: move Key Citations to early position in reporter prompt to reduce URL hallucination (#859)
* fix: move Key Citations to early position in reporter prompt to reduce URL hallucination Move the Key Citations section from position 6 (end of report) to position 2 (immediately after title) in the reporter prompt. When citations are placed at the end of a long report, LLMs tend to forget real URLs from source material and fabricate plausible-looking but non-existent URLs. Changes to src/prompts/reporter.md: - Move Key Citations from section 6 to section 2 (right after Title) - Add explicit anti-hallucination instructions: only use URLs from provided source material, never fabricate or guess URLs - Keep a repeated citation list at the end (section 7) for completeness - Renumber all subsequent sections accordingly - Update Notes section to reflect new structure Tested with real DeerFlow backend + DuckDuckGo search: - Before: multiple hallucinated URLs in report citations - After: hallucinated URLs reduced significantly Closes #825 * fix: move citations after observations in reporter_node to reduce URL hallucination Previously, the citation message was appended BEFORE observation messages, meaning it got buried under potentially thousands of chars of research data. By the time the LLM reached the end of the context to generate the report, it had 'forgotten' the real URLs and fabricated plausible-looking ones. Now citations are appended AFTER compressed observations, placing them closest to the LLM's generation point for maximum recall accuracy. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
1 parent c95b271 commit 13a2511

2 files changed

Lines changed: 31 additions & 17 deletions

File tree

src/graph/nodes.py

Lines changed: 13 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -853,7 +853,8 @@ def reporter_node(state: State, config: RunnableConfig):
853853
# Get collected citations for the report
854854
citations = state.get("citations", [])
855855

856-
# If we have collected citations, provide them to the reporter
856+
# Build citation messages for the reporter
857+
citation_list = ""
857858
if citations:
858859
citation_list = "\n\n## Available Source References (use these in References section):\n\n"
859860
for i, citation in enumerate(citations, 1):
@@ -869,13 +870,6 @@ def reporter_node(state: State, config: RunnableConfig):
869870

870871
logger.info(f"Providing {len(citations)} collected citations to reporter")
871872

872-
invoke_messages.append(
873-
HumanMessage(
874-
content=citation_list,
875-
name="system",
876-
)
877-
)
878-
879873
observation_messages = []
880874
for observation in observations:
881875
observation_messages.append(
@@ -892,6 +886,17 @@ def reporter_node(state: State, config: RunnableConfig):
892886
)
893887
invoke_messages += compressed_state.get("messages", [])
894888

889+
# Append citations AFTER observations so they are closest to the LLM's
890+
# generation point. This reduces the chance of the model "forgetting"
891+
# real URLs and fabricating plausible-looking ones instead.
892+
if citation_list:
893+
invoke_messages.append(
894+
HumanMessage(
895+
content=citation_list,
896+
name="system",
897+
)
898+
)
899+
895900
logger.debug(f"Current invoke messages: {invoke_messages}")
896901
response = get_llm_by_type(AGENT_LLM_MAP["reporter"]).invoke(invoke_messages)
897902
response_content = response.content

src/prompts/reporter.md

Lines changed: 18 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -60,23 +60,31 @@ Structure your report in the following format:
6060
- Always use the first level heading for the title.
6161
- A concise title for the report.
6262

63-
2. **Key Points**
63+
2. **Key Citations**
64+
- List all references IMMEDIATELY after the title, before any analysis content.
65+
- This section MUST come early to ensure all URLs are accurate and verifiable.
66+
- Only use URLs that appear in the provided source material or 'Available Source References'.
67+
- Include an empty line between each citation for better readability.
68+
- Format: `- [Source Title](URL)`
69+
- NEVER fabricate or guess URLs. If a URL is not available, omit it.
70+
71+
3. **Key Points**
6472
- A bulleted list of the most important findings (4-6 points).
6573
- Each point should be concise (1-2 sentences).
6674
- Focus on the most significant and actionable information.
6775

68-
3. **Overview**
76+
4. **Overview**
6977
- A brief introduction to the topic (1-2 paragraphs).
7078
- Provide context and significance.
7179

72-
4. **Detailed Analysis**
80+
5. **Detailed Analysis**
7381
- Organize information into logical sections with clear headings.
7482
- Include relevant subsections as needed.
7583
- Present information in a structured, easy-to-follow manner.
7684
- Highlight unexpected or particularly noteworthy details.
7785
- **Including images from the previous steps in the report is very helpful.**
7886

79-
5. **Survey Note** (for more comprehensive reports)
87+
6. **Survey Note** (for more comprehensive reports)
8088
{% if report_style == "academic" %}
8189
- **Literature Review & Theoretical Framework**: Comprehensive analysis of existing research and theoretical foundations
8290
- **Methodology & Data Analysis**: Detailed examination of research methods and analytical approaches
@@ -132,10 +140,10 @@ Structure your report in the following format:
132140
- This section is optional for shorter reports.
133141
{% endif %}
134142

135-
6. **Key Citations**
136-
- List all references at the end in link reference format.
137-
- Include an empty line between each citation for better readability.
138-
- Format: `- [Source Title](URL)`
143+
7. **Key Citations** (repeated at end for completeness)
144+
- Repeat the same citation list from section 2 at the end of the report.
145+
- This ensures references are accessible both at the beginning and end.
146+
- ONLY use URLs from the provided source material. NEVER fabricate URLs.
139147

140148
# Writing Guidelines
141149

@@ -372,9 +380,10 @@ Structure your report in the following format:
372380

373381
- If uncertain about any information, acknowledge the uncertainty.
374382
- Only include verifiable facts from the provided source material.
375-
- Structure your report to include: Key Points, Overview, Detailed Analysis, Survey Note (optional), and References.
383+
- Structure your report to include: Key Citations, Key Points, Overview, Detailed Analysis, Survey Note (optional), and References.
376384
- Use inline citations [n] in the text where appropriate.
377385
- The number n must correspond to the source index in the provided 'Available Source References' list.
386+
- NEVER fabricate or guess URLs. Only use URLs that appear in the provided source material or 'Available Source References'.
378387
- Make the inline citation a link to the reference at the bottom using the format `[[n]](#ref-n)`.
379388
- In the References section at the end, list the sources using the format `[[n]](#citation-target-n) **[Title](URL)**`.
380389
- PRIORITIZE USING MARKDOWN TABLES for data presentation and comparison. Use tables whenever presenting comparative data, statistics, features, or options.

0 commit comments

Comments
 (0)