In a 2023 episode of the CIA podcast The Langley Files, Deputy Director for Analysis Linda Weissgold said the agency already uses AI to help analysts stay on top of an "ever expanding amount of data." However, she emphasized that she would not present intelligence to the president if the only justification was that "the black box just told me so." Her comments drew a boundary between using AI as an assistant and relying on opaque systems for the most sensitive judgments.

That boundary reflects a long-standing requirement in U.S. intelligence analysis. Institutions must be able to explain why they believe what they believe, how strongly they hold those views, and what would change their minds. Public doctrine describes this as a matter of analytic standards, evidentiary discipline, and accountability to decision-makers rather than to tools.

Meeting that standard at scale forces organizations to think concretely about synthesis methodology. They must consider the limitations of generic chatbots and the role deterministic rule systems can play in automated workflows. Decision tables, already proven in large public-benefit programs, offer one way to make complex logic transparent while still benefiting from modern machine-learning components.

Key Findings


  • Institutional synthesis in the U.S. Intelligence Community requires transparent reasoning, clear uncertainty statements, and traceable sourcing, as codified in ODNI’s ICD 203 analytic standards.
  • CIA, DIA, NSA, NGA, INR, FBI, and DHS I&A apply those standards to different missions and data types, which leads organizations to weigh sources and precedent differently even on shared information.
  • CIA has stated publicly that presidential-level assessments must be able to explain why they reach a judgment, highlighting the limits of off-the-shelf, black-box chatbots for institutional-grade synthesis.
  • NIST’s Generative AI Profile describes confabulation, unreliable citations, and weak provenance as core risks, underscoring why generic LLM outputs are hard to audit against IC tradecraft standards.
  • Decision-table engines such as DTRules show that complex public policy can run through deterministic, spreadsheet-defined rules that auditors and policy staff can review and test at scale.
  • Quality-controlled generative pipelines that embed LLMs inside decision-table guardrails, provenance logging, and human review can align automation with IC, DoD, and NIST expectations across intelligence and other domains.

Synthesis as Institutional Tradecraft


Within the U.S. Intelligence Community, synthesis is more than summarizing documents. It is the disciplined integration of incomplete and sometimes contradictory information into judgments that can be scrutinized, revised, and audited. The Office of the Director of National Intelligence’s analytic standards directive, ICD 203, requires that finished analysis be objective, based on all available sources, and explicit about uncertainties and assumptions.

ICD 203 also defines tradecraft standards that matter directly for automation. Analysts are expected to describe source quality, distinguish evidence from judgment, assess alternative hypotheses, and explain how new information changes previous views. Those requirements are not just stylistic; they define what counts as defensible institutional reasoning when evidence is patchy or contested.

CIA’s Directorate of Analysis describes its officers as deciphering and synthesizing incomplete and sometimes contradictory information to provide timely, objective intelligence for senior U.S. officials. According to the Directorate’s public overview on CIA.gov, analysts are expected to anticipate developments, integrate multiple collection streams, and present clear written and visual judgments to policymakers.

Structured analytic techniques are one way the community tries to make this synthesis explicit. The CIA’s Tradecraft Primer outlines techniques such as the Key Assumptions Check, which asks analysts to list and challenge the premises that support their main judgments. The goal is to identify what evidence would overturn those assumptions, exposing hidden premises and weak logic before products reach senior readers.

At the community level, intelligence.gov’s description of National Intelligence Estimates shows synthesis as a coordination problem as much as an analytic one. The National Intelligence Council solicits input from all intelligence agencies, convenes their representatives, and produces an estimate that reflects both shared judgments and recorded differences where they persist. That workflow makes clear that institutional synthesis must show where views diverge, not simply blend them.

Mission-Driven Synthesis Across the Intelligence Community


Public documents do not lay out each agency’s internal workflows in detail, but they do show how mission, data, and customer sets shape synthesis. CIA’s Directorate of Analysis focuses on all-source analysis for senior foreign and national security policymakers, integrating human reporting, technical collection, and open sources into narrative assessments and daily briefings.

The Defense Intelligence Agency’s mission statement on DIA.mil highlights support to warfighters, defense policymakers, and force planners, with a core mission to provide intelligence on foreign militaries to prevent and decisively win wars. That orientation biases synthesis toward capabilities, order of battle, and operational implications, often under tight timelines and with close coupling to planning.

Signals intelligence and geospatial intelligence agencies start from different evidentiary baselines. The National Security Agency explains that it provides foreign SIGINT to policymakers and military forces, drawing on electronic signals and systems used by foreign targets. The National Geospatial-Intelligence Agency states that it delivers world-class geospatial intelligence to policymakers, military service members, and first responders, emphasizing imagery, mapping, and change detection as primary inputs.

The National Reconnaissance Office focuses on designing, building, and operating intelligence satellites rather than owning most downstream narrative analysis. Upstream choices about sensors, revisit rates, and metadata shape what later all-source synthesis can say and how quickly it can respond. In that sense, collection architecture is an implicit part of synthesis methodology even when not labeled as such.

The State Department’s Bureau of Intelligence and Research, or INR, describes its primary mission as harnessing intelligence to serve U.S. diplomacy and providing value-added independent analysis to State Department policymakers, drawing on all-source intelligence. As summarized on intelligence.gov, INR also reviews intelligence activities for consistency with foreign policy and coordinates analytic outreach to external experts.

Domestic-facing elements add further constraints. The FBI’s public intelligence page notes that the Bureau uses intelligence to drive its decision-making while operating under attorney general guidelines and procedures designed to protect privacy and civil liberties. The Department of Homeland Security’s Office of Intelligence and Analysis, according to DHS.gov, is the only IC element statutorily charged with delivering intelligence to state, local, tribal, territorial, and private-sector partners.

Those public missions help explain why, as Weissgold recounted, different organizations can review the same information yet reach different conclusions. Agencies vary in which sources they prioritize, how they treat historical precedent, and which customers they serve. Synthesis methodology in practice is partly about making those differences explicit and defensible rather than allowing them to remain hidden in institutional culture.

Why Generic LLMs Struggle With Institutional-Grade Synthesis


In The Langley Files conversation, Weissgold described AI as useful for keeping up with data volume and brainstorming but drew a clear line at accountable explanation. She argued that what separates CIA analysis from punditry is the ability to explain why analysts think what they think, and said that a president should not accept answers that amount to a black box verdict without transparent reasoning.

That position aligns closely with ICD 203’s emphasis on describing source quality, explaining uncertainty, and distinguishing evidence from assumptions. An out-of-the-box large language model can generate fluent text, but its internal computation is not designed to expose which sources it relied on, how it weighed conflicting evidence, or why it chose one interpretation over another. Even when a model outputs a step-by-step explanation, that narrative may not match the actual internal process.

The National Institute of Standards and Technology’s Generative AI Profile, part of the AI Risk Management Framework, formalizes some of these concerns. NIST defines "confabulation" as a phenomenon in which generative models confidently produce erroneous or fabricated content, including invented logic and citations. The profile warns that users may act on such outputs when they appear plausible and recommends practices such as reviewing and verifying sources and citations.

NIST also highlights risks around information integrity, automation bias, and content provenance. In institutional settings that rely on documented chains of reasoning and clear uncertainty language, these risks map directly to analytic tradecraft concerns. A system that can misattribute sources or hide gaps in its evidence base conflicts with expectations that judgments be grounded, contestable, and auditable.

As a result, the main shortcoming of generic LLMs for institutional-grade synthesis is not only that they sometimes get facts wrong. It is that their errors, and even some of their correct outputs, are difficult to align with the explicit standards, review processes, and documentation practices that organizations like the CIA or the Defense Department have already published.

Decision Tables in Practice: DTRules and Policy Execution


Decision tables describe logic as condition-action matrices, typically in a spreadsheet where each column is a rule and each row is a condition or outcome. The open-source DTRules engine, documented on DTRules.com, lets policy experts and developers define entities, attributes, and rules in Excel, then compile them into a runtime engine for Java or Go. Each rule remains visible as a table entry that can be reviewed and tested.
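
To make that structure concrete, the sketch below models a small eligibility-style table directly in code rather than in DTRules’ spreadsheet format; the condition names, rule identifiers, and actions are hypothetical. Each rule is a column of cells stating whether a condition must hold, must not hold, or does not matter, and evaluation returns both the action taken and the specific rule that produced it.

```python
# Minimal sketch of a decision table as a condition-action matrix.
# Illustrative only: not DTRules' actual spreadsheet format or API;
# the program, conditions, and rule names below are hypothetical.

TABLE_VERSION = "eligibility-v12"

# "Y" = condition must be true, "N" = must be false, "-" = don't care.
RULES = {
    "R1": {"income_below_limit": "Y", "household_has_child": "Y", "state_resident": "Y"},
    "R2": {"income_below_limit": "Y", "household_has_child": "N", "state_resident": "Y"},
    "R3": {"income_below_limit": "N", "household_has_child": "-", "state_resident": "-"},
}
ACTIONS = {"R1": "approve", "R2": "refer_for_review", "R3": "deny"}


def evaluate(case: dict) -> dict:
    """Return the action of the first matching rule, plus a trace of which rule fired."""
    for rule_id, cells in RULES.items():
        matches = all(
            cell == "-" or (cell == "Y") == bool(case[cond])
            for cond, cell in cells.items()
        )
        if matches:
            return {"action": ACTIONS[rule_id], "rule": rule_id, "table_version": TABLE_VERSION}
    return {"action": "no_rule_matched", "rule": None, "table_version": TABLE_VERSION}


print(evaluate({"income_below_limit": True,
                "household_has_child": False,
                "state_resident": True}))
# -> {'action': 'refer_for_review', 'rule': 'R2', 'table_version': 'eligibility-v12'}
```

Because the table is ordinary data, the same structure can be exported to a spreadsheet for review by policy staff and versioned alongside test cases.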

In biographical material submitted to the U.S. House of Representatives in 2016, Paul Snow explained that he developed a decision-table-based rules engine in 2000 and 2001 to execute the State of Texas’s eligibility determination process for health and human services programs. That engine, now known as DTRules, executes more than 3,000 decision tables in the Texas Integrated Eligibility Redesign System, which determines eligibility for Medicaid, SNAP, TANF, and related programs.

Snow’s materials also note deployments beyond Texas, including eligibility determination in Michigan, corporate audits in Ohio, and Medicare provider assignment in several states. Earlier Beige reporting has emphasized that these systems allow agencies to map complex policy directly into tables that auditors and program staff can inspect, with rule-level logging that supports appeals and federal reviews.

From a governance perspective, the key point is determinism. Given the same inputs and versioned tables, the engine will produce the same outputs every time, and those outputs can be traced back to specific columns and rows. Policy changes are made by updating table entries, validating them against test cases, and recording the new version.

That structure aligns closely with requirements in high-stakes environments where decisions must be reproducible years later and where external overseers may need to see how specific inputs led to particular outcomes. It also illustrates that large-scale, rules-as-code implementations are not theoretical; they have been in production for years in domains where fairness, consistency, and auditability are central.

Guardrailed Generative Pipelines for Institutional Synthesis


Decision tables do not replace machine learning, but they can define the guardrails within which learning systems operate. A common pattern in both commercial and defense contexts is to constrain machine learning to perception, ranking, or recommendation, then route its outputs through a deterministic layer of decision logic that encodes thresholds, escalation paths, and final actions.
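
A minimal sketch of that division of labor, with hypothetical score thresholds and routing actions, looks like the following: the learned component only produces a score, while a small, reviewable rule set decides what happens next.

```python
# Sketch of the guardrail pattern: a learned component only scores or ranks,
# and a deterministic decision layer owns thresholds, escalation paths, and
# the final action. Thresholds and action names here are hypothetical.

def classifier_score(report_text: str) -> float:
    """Stand-in for a machine-learning component returning a score in [0, 1];
    the real model is outside the scope of this sketch."""
    return 0.83  # placeholder output


# Deterministic decision layer: plain, reviewable threshold rules,
# ordered from most to least severe.
ESCALATION_RULES = [
    (0.90, "alert_duty_officer"),
    (0.70, "queue_for_analyst_review"),
    (0.00, "archive_only"),
]


def route(report_text: str) -> dict:
    score = classifier_score(report_text)
    for threshold, action in ESCALATION_RULES:
        if score >= threshold:
            return {"score": score, "threshold": threshold, "action": action}
    return {"score": score, "threshold": None, "action": "archive_only"}


print(route("...incoming report text..."))
# -> {'score': 0.83, 'threshold': 0.7, 'action': 'queue_for_analyst_review'}
```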

The Object Management Group’s Decision Model and Notation standard, described on OMG.org, formalizes this idea as diagrams and unambiguous decision tables that business and technical stakeholders can read. DMN models are designed to be both human-interpretable and machine-executable, which makes them suitable as the explicit policy layer in an automated pipeline.

In a quality-controlled generative pipeline, an LLM might help draft text, suggest alternative scenarios, or cluster related reports. Decision tables then enforce institutional rules about what can be asserted and how. For example, a table could require that any statement about a foreign actor’s intent include a probability range and confidence level consistent with ICD 203, backed by at least one cited source from a defined set of collections.
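
A simplified version of such a gate might look like the sketch below. The likelihood vocabulary, confidence levels, and collection names are illustrative assumptions rather than an official schema; the point is that the check itself is plain, inspectable logic.

```python
# Sketch of a content gate for statements of intent: the draft must carry an
# ICD 203-style likelihood term, a confidence level, and at least one source
# from an approved collection set. Vocabulary and field names are assumptions.

LIKELIHOOD_TERMS = {"almost certainly", "very likely", "likely",
                    "roughly even chance", "unlikely", "very unlikely"}
CONFIDENCE_LEVELS = {"high", "moderate", "low"}
APPROVED_COLLECTIONS = {"HUMINT", "SIGINT", "GEOINT", "OSINT"}


def check_intent_statement(statement: dict) -> list:
    """Return a list of rule violations; an empty list means the statement passes."""
    violations = []
    text = statement["text"].lower()
    if not any(term in text for term in LIKELIHOOD_TERMS):
        violations.append("missing likelihood expression")
    if statement.get("confidence") not in CONFIDENCE_LEVELS:
        violations.append("missing or invalid confidence level")
    cited = {src["collection"] for src in statement.get("sources", [])}
    if not cited & APPROVED_COLLECTIONS:
        violations.append("no cited source from an approved collection")
    return violations


draft = {
    "text": "Actor X is likely preparing to expand operations in the region.",
    "confidence": "moderate",
    "sources": [{"id": "RPT-041", "collection": "OSINT"}],
}
print(check_intent_statement(draft))  # -> [] (passes all three checks)
```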

NIST’s Generative AI Profile offers a catalog of concrete controls that such pipelines can implement. It recommends reviewing and verifying sources and citations in outputs, tracking and documenting training and evaluation data provenance, and monitoring how often guardrails are updated and how effective they remain. Those recommendations can be translated into decision-table rules that gate dissemination or trigger human review.
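
As one illustration, the citation-verification recommendation could become a dissemination gate along these lines, where the source registry and field names are assumptions made for the sketch: any draft citing a report that does not resolve to a registered source is held for human review.

```python
# Illustrative dissemination gate: every citation must resolve to a known
# entry in a source registry, otherwise the draft is routed to a human
# reviewer. Registry contents and field names are assumptions for the sketch.

import datetime

SOURCE_REGISTRY = {
    "RPT-041": {"ingested": "2024-05-02", "provenance": "logged"},
    "RPT-118": {"ingested": "2024-06-17", "provenance": "logged"},
}


def dissemination_gate(draft: dict) -> dict:
    unresolved = [c for c in draft["citations"] if c not in SOURCE_REGISTRY]
    decision = "release" if not unresolved else "hold_for_human_review"
    return {
        "decision": decision,
        "unresolved_citations": unresolved,
        "checked_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }


print(dissemination_gate({"citations": ["RPT-041", "RPT-999"]}))
# -> hold_for_human_review, because RPT-999 is not in the registry
```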

For defense applications, DoD Directive 3000.09 on autonomy in weapon systems illustrates how demanding such guardrails may need to be. The directive requires rigorous verification and validation, realistic test and evaluation, and designs that allow commanders and operators to exercise appropriate levels of human judgment over the use of force, including in systems that incorporate AI. In that environment, using deterministic decision logic to encode engagement rules can focus the most formal assurance work on a legible policy layer.

A similar pattern can support intelligence and civilian uses. Institutions can treat generative models as tools for search, summarization, and drafting, while insisting that claims leaving the system pass through explicit tables that encode analytic standards, sourcing expectations, and uncertainty language. Logs from both the model and the rules engine then feed into provenance records that support after-action review.
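
A combined provenance record might be as simple as the following sketch, which joins hypothetical model metadata with the rules engine's trace and hashes the result so reviewers can later confirm the record has not been altered.

```python
# Sketch of a combined provenance record for after-action review, joining the
# generative model's output metadata with the rules engine's trace. Field
# names and the storage format are illustrative assumptions.

import hashlib
import json


def provenance_record(model_log: dict, rule_trace: dict) -> dict:
    record = {
        "model": model_log,   # e.g. model id, prompt hash, draft text hash
        "rules": rule_trace,  # e.g. table version, rules fired, final decision
    }
    # Hash of the record contents so later tampering is detectable.
    record["digest"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record


entry = provenance_record(
    {"model_id": "drafting-llm-v3", "prompt_hash": "9f2c...", "draft_hash": "4b1e..."},
    {"table_version": "tradecraft-v7", "rules_fired": ["R2"],
     "decision": "queue_for_analyst_review"},
)
print(entry["digest"][:16])
```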

Institutional Choices About Traceability


Public IC doctrine and leadership statements converge on a simple requirement: high-stakes analysis must be able to explain why it reaches a judgment and what evidence could change that view. Generic chatbots, even when powerful, do not yet make their internal tradeoffs legible in ways that match analytic standards such as ICD 203 or address the confabulation and provenance issues described by NIST.

Decision-table engines like DTRules, combined with standards such as DMN and the controls in the NIST AI Risk Management Framework, show that it is possible to encode complex policy logic in forms that are both readable and executable. When those deterministic layers sit around generative components, they can turn black-box text generation into one component inside a traceable synthesis pipeline rather than the final authority.

For intelligence agencies, regulators, and companies facing similar pressures, the choice is less about whether to use generative AI at all and more about how to embed it. Systems that keep reasoning and governance legible to humans while harnessing machine-scale pattern recognition are better positioned to meet the standard Weissgold described: being able to explain why they think what they think, even when algorithms assist along the way.
