Inside OpenAI’s in-house data agent
https://openai.com/index/inside-our-in-house-data-agent/

Data powers how systems learn, how products evolve, and how companies make choices. But getting answers quickly, correctly, and with the right context is often harder than it should be. To make this easier as OpenAI scales, we built our own bespoke in-house AI data agent that explores and reasons over our own platform.

Our agent is a custom internal-only tool (not an external offering), built specifically around OpenAI’s data, permissions, and workflows. We’re showing how we built and use it to help surface examples of the real, impactful ways AI can support day-to-day work across our teams. The OpenAI tools we used to build and run it (Codex, our GPT‑5 flagship model, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere.

Our data agent lets employees go from question to insight in minutes, not days. This lowers the bar to pulling data and nuanced analysis across all functions, not just the data team. Today, teams across Engineering, Data Science, Go-To-Market, Finance, and Research at OpenAI lean on the agent to answer high-impact data questions. For example, it can help answer how to evaluate launches and understand business health, all through the intuitive format of natural language. The agent combines Codex-powered table-level knowledge with product and organizational context.
Its continuously learning memory system means it also improves with every turn.

In this post, we’ll break down why we needed a bespoke AI data agent, what makes its code-enriched data context and self-learning so useful, and lessons we learned along the way.

Why we needed a custom tool

OpenAI’s data platform serves more than 3.5k internal users working across Engineering, Product, and Research, spanning over 600 petabytes of data across 70k datasets. At that size, simply finding the right table can be one of the most time-consuming parts of doing analysis.

As one internal user put it: “We have a lot of tables that are fairly similar, and I spend tons of time trying to figure out how they’re different and which to use. Some include logged-out users, some don’t. Some have overlapping fields; it’s hard to tell what is what.”

Even with the correct tables selected, producing correct results can be challenging. Analysts must reason about table data and table relationships to ensure transformations and filters are applied correctly.
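To make that risk concrete, here is a minimal pandas sketch (with made-up tables) of how a seemingly safe join can silently inflate a metric:

```python
import pandas as pd

# Two tables that look safe to join: one row per order, and a status
# history table that (non-obviously) has multiple rows per order.
orders = pd.DataFrame({"order_id": [1, 2], "revenue": [100.0, 50.0]})
status = pd.DataFrame({"order_id": [1, 1, 2], "status": ["created", "shipped", "created"]})

# The naive join fans out order 1 across its two status rows.
naive = orders.merge(status, on="order_id")
print(naive["revenue"].sum())  # 250.0: revenue silently inflated

# Validating join cardinality up front catches the problem.
safe = orders.merge(status.drop_duplicates("order_id"), on="order_id",
                    validate="one_to_one")
print(safe["revenue"].sum())  # 150.0
```

Cardinality checks like `validate=` are exactly the kind of sanity check an analyst otherwise has to remember to run by hand.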
Common failure modes (many-to-many joins, filter pushdown errors, and unhandled nulls) can silently invalidate results. At OpenAI’s scale, analysts should not have to sink time into debugging SQL semantics or query performance: their focus should be on defining metrics, validating assumptions, and making decisions.

How it works

Let’s walk through what our agent is, how it curates context, and how it keeps self-improving.

Our agent is powered by GPT‑5 and is designed to reason over OpenAI’s data platform. It’s available wherever employees already work: as a Slack agent, through a web interface, inside IDEs, in the Codex CLI via MCP, and directly in OpenAI’s internal ChatGPT app through an MCP connector.

Users can ask complex, open-ended questions which would typically require multiple rounds of manual exploration. Take this example prompt, which uses a test data set: “For NYC taxi trips, which pickup-to-dropoff ZIP pairs are the most unreliable, with the largest gap between typical and worst-case travel times, and when does that variability occur?”

The agent handles the analysis end-to-end, from understanding the question to exploring the data, running queries, and synthesizing findings.

One of the agent’s superpowers is how it reasons through problems. Rather than following a fixed script, the agent evaluates its own progress. If an intermediate result looks wrong (e.g., if it has zero rows due to an incorrect join or filter), the agent investigates what went wrong, adjusts its approach, and tries again. Throughout this process, it retains full context and carries learnings forward between steps.
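That evaluate-and-retry behavior can be sketched as a small loop; note that `run_query` and `revise_plan` below are invented stand-ins for the agent’s real tooling:

```python
def run_query(plan: dict) -> list:
    # Toy "warehouse": rows only match the canonical status value.
    rows = [{"status": "active", "users": 42}]
    return [r for r in rows if r["status"] == plan["status_filter"]]

def revise_plan(plan: dict) -> dict:
    # The real agent reasons about why a result looks wrong; here we
    # simply correct a known-bad filter value for illustration.
    return {**plan, "status_filter": "active"}

def answer(plan: dict, max_attempts: int = 3) -> list:
    for _ in range(max_attempts):
        result = run_query(plan)
        if result:  # sanity check: an empty result suggests a bad join or filter
            return result
        plan = revise_plan(plan)  # investigate, adjust, and try again
    return []

# A plan with a wrong filter recovers after one revision.
print(answer({"status_filter": "enabled"}))
```

The point of the loop is that the iteration happens inside the agent, so the user only sees the corrected result.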
This closed-loop, self-learning process shifts iteration from the user into the agent itself, enabling faster results and consistently higher-quality analyses than manual workflows.

The agent covers the full analytics workflow: discovering data, running SQL, and publishing notebooks and reports. It understands internal company knowledge, can search the web for external information, and improves over time through learned usage and memory.

Context is everything

High-quality answers depend on rich, accurate context. Without context, even strong models can produce wrong results, such as vastly misestimating user counts or misinterpreting internal terminology.

To avoid these failure modes, the agent is built around multiple layers of context that ground it in OpenAI’s data and institutional knowledge.

Layer #1: Table Usage

Metadata grounding: The agent relies on schema metadata (column names and data types) to inform SQL writing, and uses table lineage (e.g., upstream and downstream table relationships) to provide context on how different tables relate.

Query inference: Ingesting historical queries helps the agent understand how to write its own queries and which tables are typically joined together.

Layer #2: Human Annotations

Curated descriptions of tables and columns provided by domain experts capture intent, semantics, business meaning, and known caveats that are not easily inferred from schemas or past queries.

Metadata alone isn’t enough.
To really tell tables apart, you need to understand how they were created and where they originate.

Layer #3: Codex Enrichment

By deriving a code-level definition of a table, the agent builds a deeper understanding of what the data actually contains. Nuance about what is stored in the table, and how it is derived from an analytics event, provides extra information: for example, the uniqueness of values, how often the table data is updated, and the scope of the data (e.g., whether the table excludes certain fields, or what level of granularity it has). This also provides enhanced usage context by showing how the table is used beyond SQL, in Spark, Python, and other data systems.

This means that the agent can distinguish between tables that look similar but differ in critical ways. For example, it can tell whether a table only includes first-party ChatGPT traffic. This context is also refreshed automatically, so it stays up to date without manual maintenance.

Layer #4: Institutional Knowledge

The agent can access Slack, Google Docs, and Notion, which capture critical company context such as launches, reliability incidents, internal codenames and tools, and the canonical definitions and computation logic for key metrics. These documents are ingested, embedded, and stored with metadata and permissions.
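A minimal sketch of that ingest step, assuming a hypothetical in-memory store (a production pipeline would call the OpenAI Embeddings API, e.g. `client.embeddings.create`, instead of the deterministic stub below):

```python
import hashlib

def embed(text: str) -> list:
    # Stand-in for a real embedding call, kept deterministic and offline
    # for illustration; not a semantic embedding.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def ingest(doc_id: str, text: str, source: str, allowed_groups: list, store: dict) -> None:
    # Each document is stored with its embedding, its metadata, and the
    # permissions needed to filter results at retrieval time.
    store[doc_id] = {
        "embedding": embed(text),
        "metadata": {"source": source},
        "allowed_groups": set(allowed_groups),
    }

store = {}
ingest("doc-1", "Canonical definition of weekly active users ...",
       source="notion", allowed_groups=["data-science"], store=store)
print(store["doc-1"]["metadata"]["source"])  # notion
```

Keeping permissions alongside the embedding is what lets retrieval enforce access control without a second lookup.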
A retrieval service handles access control and caching at runtime, enabling the agent to efficiently and safely pull in this information.

Layer #5: Memory

When the agent is given corrections or discovers nuances about certain data questions, it’s able to save these learnings for next time, allowing it to constantly improve with its users. As a result, future answers begin from a more accurate baseline rather than repeatedly encountering the same issues. The goal of memory is to retain and reuse non-obvious corrections, filters, and constraints that are critical for data correctness but difficult to infer from the other layers alone.

For example, in one case, the agent didn’t know how to filter for a particular analytics experiment (it relied on matching against a specific string defined in an experiment gate). Memory was crucially important here to ensure it was able to filter correctly, instead of fuzzily trying to string match.

When you give the agent a correction, or when it finds a learning from your conversation, it will prompt you to save that memory for next time. Memories can also be manually created and edited by users. Memories are scoped at the global and personal level, and the agent’s tooling makes it easy to edit them.

Layer #6: Runtime Context

When no prior context exists for a table, or when existing information is stale, the agent can issue live queries to the data warehouse to inspect and query the table directly.
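In spirit, such a live probe is straightforward; the sketch below uses an in-memory SQLite table as a stand-in for the warehouse:

```python
import sqlite3

# In-memory SQLite database standing in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_zip TEXT, dropoff_zip TEXT, minutes REAL)")
conn.execute("INSERT INTO trips VALUES ('10001', '11201', 34.5)")

# Probe 1: fetch the live schema instead of trusting stale metadata.
schema = [(name, col_type)
          for _, name, col_type, *_ in conn.execute("PRAGMA table_info(trips)")]
print(schema)  # [('pickup_zip', 'TEXT'), ('dropoff_zip', 'TEXT'), ('minutes', 'REAL')]

# Probe 2: sample a few rows to sanity-check what the table actually holds.
sample = conn.execute("SELECT * FROM trips LIMIT 5").fetchall()
print(sample)  # [('10001', '11201', 34.5)]
```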
This allows it to validate schemas, understand the data in real time, and respond accordingly. The agent is also able to talk to other Data Platform systems (metadata service, Airflow, Spark) as needed to get broader data context that exists outside the warehouse.

We run a daily offline pipeline that aggregates table usage, human annotations, and Codex-derived enrichment into a single, normalized representation. This enriched context is then converted into embeddings using the OpenAI Embeddings API and stored for retrieval. At query time, the agent pulls only the most relevant embedded context via retrieval-augmented generation (RAG) instead of scanning raw metadata or logs. This makes table understanding fast and scalable, even across tens of thousands of tables, while keeping runtime latency predictable and low. Runtime queries are issued to our data warehouse live as needed.

Together, these layers ensure the agent’s reasoning is grounded in OpenAI’s data, code, and institutional knowledge, dramatically reducing errors and improving answer quality.

Built to think and work like a teammate

One-shot answers work when the problem is clear, but most questions aren’t. More often, arriving at the correct result requires back-and-forth refinement and some course correction. The agent is built to behave like a teammate you can reason with. It’s conversational, always on, and handles both quick answers and iterative exploration.

It carries over complete context across turns, so users can ask follow-up questions, adjust their intent, or change direction without restating everything.
If the agent starts heading down the wrong path, users can interrupt mid-analysis and redirect it, just like working with a human collaborator who listens instead of plowing ahead.

When instructions are unclear or incomplete, the agent proactively asks clarifying questions. If no response is provided, it applies sensible defaults to make progress. For example, if a user asks about business growth with no date range specified, it may assume the last seven or 30 days. These priors allow it to stay responsive and non-blocking while still converging on the right outcome.

The result is an agent that works well both when you know exactly what you want (e.g., “Tell me about this table”) and when you’re exploring (e.g., “I’m seeing a dip here, can we break this down by customer type and timeframe?”).

After rollout, we observed that users frequently ran the same analyses for routine, repetitive work. To expedite this, the agent’s workflows package recurring analyses into reusable instruction sets. Examples include workflows for weekly business reports and table validations. By encoding context and best practices once, workflows streamline repeat analyses and ensure consistent results across users.
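One way to picture such a workflow is as a small, reusable spec; the fields and values here are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    # A workflow packages recurring analysis steps plus shared defaults,
    # so repeat runs start from the same context and best practices.
    name: str
    instructions: list = field(default_factory=list)
    defaults: dict = field(default_factory=dict)

weekly_report = Workflow(
    name="weekly-business-report",
    instructions=[
        "Pull the canonical weekly metrics tables",
        "Compare week-over-week and flag unusual moves",
        "Publish the summary notebook",
    ],
    defaults={"date_range": "last_7_days"},
)
print(weekly_report.defaults["date_range"])  # last_7_days
```

Encoding the steps once means every user who invokes the workflow gets the same instructions and defaults.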