
32 hidden commands in the ChatGPT App Store that were never meant for the AI
Seventeen apps in the ChatGPT App Store expose commands that were never meant for the AI to call. We found 32 of these internal commands across our dataset of 147 third-party apps, and their descriptions range from polite ("Internal tool for payroll calculation. Use run_payroll instead.") to emphatic ("TOOL - NEVER INVOKE THIS DIRECTLY. ... GPT/LLM must NEVER call this tool in any user workflow or conversation.").
These are not obscure edge cases. The apps involved include Spotify, Apple Music, Adobe Acrobat, DoorDash, Uber, Replit, and Canva. And the commands they expose tell an interesting story about how ChatGPT apps are actually built behind the scenes.
What We Found
We analyzed all 886 commands across 147 third-party apps and identified 32 that are explicitly marked as internal, private, or not meant for direct model use. That works out to 3.6% of all commands and 11.5% of all apps.
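If you want to reproduce this kind of count, the first pass is a keyword filter. Here is a minimal TypeScript sketch, assuming a simple record shape for the dataset; the shape and the marker list are simplified for illustration, and candidates surfaced this way still need manual review.

```typescript
// Assumed record shape for each command in the dataset; illustrative only.
interface Command {
  app: string;
  name: string;
  description: string;
}

// Illustrative marker list, not the exact methodology.
const INTERNAL_MARKERS: RegExp[] = [
  /internal tool/i,
  /do not call (this )?(tool )?directly/i,
  /never invoke this directly/i,
  /only for widget actions/i,
  /DONT_USE__PRIVATE_TOOL__/, // Replit's naming convention
];

function isInternal(cmd: Command): boolean {
  return INTERNAL_MARKERS.some(
    (marker) => marker.test(cmd.name) || marker.test(cmd.description)
  );
}

function flagInternal(commands: Command[]): Command[] {
  return commands.filter(isInternal);
}
// On the full 886-command dataset, a pass like this followed by
// manual review yields the 32 commands discussed here.
```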
These internal commands fall into two categories:
| Type | Commands | Apps |
|---|---|---|
| Truly internal (backend tools, auth helpers, logging) | 17 | 12 |
| Widget-only (triggered by UI clicks, not conversation) | 15 | 5 |
The distinction matters. "Truly internal" commands are backend utilities that the model should never touch. "Widget-only" commands are real features, but they are designed to be triggered by user interactions within a visual widget (clicking a button in a Spotify player, for example), not by the AI responding to free-form chat.
Apps With the Highest Concentration of Internal Commands
Some apps have a surprisingly large share of their total commands marked as internal:
| App | Internal | Total | % Internal |
|---|---|---|---|
| Uber | 2 | 2 | 100% |
| Spotify | 4 | 6 | 66.7% |
| Apple Music | 4 | 6 | 66.7% |
| Replit | 3 | 6 | 50.0% |
| DoorDash | 1 | 2 | 50.0% |
| Genspark AI Slides | 1 | 2 | 50.0% |
| Uber Eats | 1 | 2 | 50.0% |
| Ace Quiz Maker | 1 | 3 | 33.3% |
| Ace Knowledge Graph | 1 | 3 | 33.3% |
| Lovable | 1 | 4 | 25.0% |
| Adobe Photoshop | 2 | 9 | 22.2% |
| Adobe Acrobat | 4 | 22 | 18.2% |
Spotify and Apple Music stand out: two-thirds of their commands are restricted to widget actions. These apps are built around rich visual interfaces (album art, playback controls, playlist management), and most of the interaction happens through the UI rather than through conversational text. The command list reflects that architecture, but the commands are still visible to the model.
The Anatomy of an Internal Command
Internal commands fall into a few recognizable patterns. Here is what we found and what they look like in practice.
Widget Telemetry and Analytics
Uber and Uber Eats both expose a publish_analytics_events command with an identical, unusually direct description:
INTERNAL TOOL - DO NOT CALL
This tool is reserved exclusively for widget telemetry and should NEVER be called by the agent under any circumstances.
AGENT INSTRUCTION: DO NOT use this tool for any reason. This is an internal endpoint used only by the widget frontend to send analytics events. The agent should never invoke this tool, even if the user asks about analytics, tracking, or logging.
This is a telemetry endpoint. The widget (the visual interface that appears in the chat) uses it to send usage data back to Uber's systems. It is a standard part of building any frontend, but it should not be on the list of tools the AI model can see. The description has to do a lot of work to prevent the model from calling it: four separate sentences tell the AI "do not call this," with all-caps emphasis and explicit handling of the edge case where a user asks about analytics.
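For context on how a tool like this ends up visible at all: ChatGPT apps are built on the Model Context Protocol, and anything registered as an MCP tool lands in the list the model sees. Here is a minimal sketch of that registration step using the TypeScript MCP SDK; the tool name is real, but the server code is illustrative, not Uber's.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Hypothetical stand-in for the real analytics pipeline.
async function forwardToAnalyticsBackend(events: unknown[]): Promise<void> {}

const server = new McpServer({ name: "rides-app", version: "1.0.0" });

// Registering the endpoint as a tool is what makes it visible to the
// model; once it is in the list, the description is the only guardrail.
server.tool(
  "publish_analytics_events",
  "INTERNAL TOOL - DO NOT CALL. Reserved exclusively for widget telemetry.",
  { events: z.array(z.object({ name: z.string(), ts: z.number() })) },
  async ({ events }) => {
    await forwardToAnalyticsBackend(events);
    return { content: [{ type: "text", text: "ok" }] };
  }
);
```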
Backend Proxy and Upload Tools
Adobe Acrobat exposes three proxy commands (upload_proxy, acrobat_proxy, acrobat_readonly_proxy) that all share the same opening line:
TOOL - NEVER INVOKE THIS DIRECTLY.
This tool is strictly reserved for internal application workflows and backend processing only. GPT/LLM must NEVER call this tool in any user workflow or conversation.
These are backend intermediaries. The upload_proxy handles "low-level storage operations: direct base64 file ingestion for small files" and "presigned URL generation for large file transfers." In a traditional web app, these would be server-side functions that the frontend calls through an API. Here, they are registered as tools visible to the AI model, requiring strongly worded descriptions to keep the model from using them.
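For contrast, here is roughly what those same two operations look like as ordinary server routes instead of model-visible tools. This is a minimal Express sketch, assuming the widget can reach the app's backend directly; the paths and the presign helper are hypothetical, not Adobe's code.

```typescript
import express from "express";
import { randomUUID } from "node:crypto";

const app = express();
app.use(express.json({ limit: "10mb" }));

// Direct base64 ingestion for small files: an ordinary route the widget
// can fetch() directly, invisible to the model's tool list.
app.post("/internal/upload", (req, res) => {
  const bytes = Buffer.from(req.body.data, "base64");
  // ...persist bytes to storage (omitted)...
  res.json({ fileId: randomUUID(), size: bytes.length });
});

// Presigned URL generation for large file transfers.
app.post("/internal/presign", async (req, res) => {
  const url = await createPresignedUploadUrl(req.body.filename);
  res.json({ url });
});

// Hypothetical helper; a real app would call its storage provider here.
async function createPresignedUploadUrl(filename: string): Promise<string> {
  return `https://storage.example.com/upload/${encodeURIComponent(filename)}`;
}

app.listen(3000);
```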
Authentication Token Handlers
Three apps expose internal authentication commands:
- Spotify: get_auth_token, with the description "Tool used by the Spotify widget to get the currentlly [sic] valid Auth token. Do not call directly."
- invideo: get_auth_token, described as "Internal tool to retrieve authentication token for a copilot session."
- Replit: DONT_USE__PRIVATE_TOOL__get_auth_token, described as "Internal tool for Replit's app builder. This must not be called directly by the model."
Replit's approach here is notable. The command name itself is a warning: DONT_USE__PRIVATE_TOOL__get_auth_token. In fact, all three of Replit's internal commands follow this pattern, with names prefixed by DONT_USE__PRIVATE_TOOL__. This is the naming-convention equivalent of putting a "DO NOT ENTER" sign on a door, and Replit is the only app in the entire App Store whose command names themselves serve as the warning.
Widget-Only Actions
Spotify, Apple Music, and DoorDash expose commands that perform real user actions (adding songs to a library, creating playlists, checking out a cart) but are meant to be triggered only by button clicks in a visual widget, not by conversational text.
Spotify's add_to_library description is representative:
Do NOT call from free-form chat. This tool is only for widget actions (e.g., user clicks Save/+ on a result in the Spotify widget). Do not trigger on behalf of third parties or external systems.
Apple Music uses nearly identical language for add-library-resource-by-id:
Do NOT call from free-form chat. This tool is only for widget actions (e.g., user clicks Add/+ on a result in the Apple Music widget).
DoorDash's doordash_checkout makes the architecture explicit:
DO NOT call this tool directly from the model. This tool is only accessible through the shopping cart widget UI. To help users complete a purchase, first use create_product_list to build the cart and display the interactive cart widget, where the user can review items and click the checkout button. The widget will call this tool internally when the user is ready.
This is the most detailed example of the widget-only pattern. DoorDash is telling the model: your job is to build the cart, but the checkout has to happen through the UI. The widget calls this tool programmatically when the user clicks a button. The model should never call it directly.
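Here is a minimal widget-side sketch of that flow. We are assuming an Apps-SDK-style bridge along the lines of window.openai.callTool; the exact API surface, element id, and payload shape are illustrative.

```typescript
// Widget-side sketch of the DoorDash pattern: the click handler, not the
// model, triggers checkout. The bridge signature below is our assumption
// of an Apps-SDK-style API, and the cart id is a placeholder.
declare global {
  interface Window {
    openai: { callTool: (name: string, args: unknown) => Promise<unknown> };
  }
}

const currentCartId = "cart-placeholder-id";

const checkoutButton = document.querySelector<HTMLButtonElement>("#checkout");
checkoutButton?.addEventListener("click", async () => {
  // The user's click is the confirmation step the conversation cannot provide.
  await window.openai.callTool("doordash_checkout", { cartId: currentCartId });
});

export {}; // keep this file a module so the global augmentation applies
```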
Internal Workflow Steps
Several apps expose commands that are steps in a multi-tool workflow, meant to be called by other tools rather than by the model directly.
Gusto has two: calculate_payroll ("Internal tool for payroll calculation. Use run_payroll instead.") and submit_payroll ("Internal tool for payroll submission. Use run_payroll instead."). The public-facing command run_payroll presumably calls these behind the scenes.
Lovable follows the same pattern: create_project is described as "Internal tool - DO NOT call directly. Use initiate_project instead. Called automatically by the widget when user selects a workspace."
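The cleaner version of this pattern keeps the steps as plain functions that only the orchestrator can reach. A minimal sketch, assuming an MCP-style server; this is an illustration of the pattern, not Gusto's or Lovable's actual implementation.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Internal steps stay as plain functions: never registered, never visible.
async function calculatePayroll(period: string): Promise<{ total: number }> {
  return { total: 0 }; // stand-in for the real calculation
}
async function submitPayroll(run: { total: number }): Promise<void> {}

const server = new McpServer({ name: "payroll-app", version: "1.0.0" });

// Only the orchestrator is exposed. The model cannot call a step it
// cannot see, so no "do not call" prose is needed.
server.tool(
  "run_payroll",
  "Calculate and submit payroll for the given pay period.",
  { period: z.string() },
  async ({ period }) => {
    const run = await calculatePayroll(period);
    await submitPayroll(run);
    return {
      content: [{ type: "text", text: `Payroll submitted: total ${run.total}` }],
    };
  }
);
```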
Logging and Monitoring
Canva exposes a log command described as "Internal tool for logging errors, warnings, and informational messages from ChatGPT client to Canva's monitoring systems." This is infrastructure. It exists so Canva can monitor its integration's health, not so the AI can log messages on behalf of users.
The Visibility Column Nobody Uses
The ChatGPT App Store's action schema includes a visibility column, which in theory lets developers mark commands as something other than public. We checked every one of the 886 actions in our dataset.
The result: the visibility column is empty for all 886 actions. Not a single app sets it.
This means the only way apps can signal "do not call this" is through the command name and description, which is exactly what we see. Developers are writing instructions like "NEVER INVOKE THIS DIRECTLY" and naming commands DONT_USE__PRIVATE_TOOL__get_auth_token because the one metadata-level mechanism that could actually hide these commands from the model goes unused. The visibility field exists in the schema, but in practice nobody touches it.
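The check itself is short. A minimal sketch, with the row shape standing in for the dataset's actual schema:

```typescript
// Count how many of the 886 actions set the visibility field at all.
interface ActionRow {
  app: string;
  name: string;
  visibility: string | null;
}

function countVisibilitySet(rows: ActionRow[]): number {
  return rows.filter((r) => r.visibility != null && r.visibility !== "").length;
}
// On our dataset: 0 of 886.
```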
Why "Do Not Call This" in a Public Description Is Counterproductive
Writing "do not call this tool" in a command description creates a paradox. The description is the primary way the model learns what a tool does. When the model sees 32 commands, some with descriptions explaining what they do and some with descriptions saying "never use this," the model has to make a judgment call every time. Research on large language models shows that negative instructions ("do NOT do X") are generally less reliable than positive instructions ("do X instead"), and that models can be more likely to fixate on the prohibited action precisely because it was mentioned.
Adobe Acrobat's proxy tools illustrate the problem well. The description of acrobat_proxy starts with "TOOL - NEVER INVOKE THIS DIRECTLY" but then goes on to explain that it "handles internal Adobe Acrobat service proxy operations for write/modify operations." A model parsing this description now knows what the tool does and has been told not to use it. Whether it reliably avoids calling the tool depends on how well it weighs negative instructions against the functional description it just received.
The better approach is to not expose the command at all. If a command is only for widget use, backend processing, or internal workflows, it should not appear in the model's tool list in the first place. The model cannot call a tool it cannot see.
The Scale of "Do Not" Language in the App Store
To put the 32 internal commands in context, we also looked at how broadly "do not call" and "do not use" language appears across all 886 commands. We found 122 commands (13.8% of all commands) that use some form of this language. But the vast majority are not internal commands. They are normal usage guidance.
When Klaviyo writes "Do not use the 'filters' parameter," that is telling the model how to use the tool correctly. When StubHub writes "Do not use this tool if the user is asking about an event that already appeared," that is routing logic. When Streak writes "Do not use for: searching by name," that is scoping the tool's purpose. These are all constructive instructions that help the model choose the right tool for a given task.
Only about a quarter of the "do not" language in the store is actually telling the model "this command is not for you." The rest is standard API documentation: here is what this tool does, here is what it does not do, here is when to use something else.
Best Practices for Internal Commands
If you are building a ChatGPT app and have commands that the model should not call directly, here are the patterns that work best based on what we observed.
Remove internal commands from the public tool list entirely. If a command is only for widget use or backend processing, it should not be registered as a tool the model can see. DoorDash's architecture is a good example of the problem: the doordash_checkout command exists in the tool list with a description that says "the widget will call this tool internally." If the widget is the only caller, the tool does not need to be in the model's list.
If you cannot remove a command, use the naming convention approach. Replit's DONT_USE__PRIVATE_TOOL__ prefix is heavy-handed, but it is also the clearest signal in the dataset. A model evaluating which tool to call will see the name before it reads the description. Putting a clear marker in the name provides an additional layer of protection on top of the description.
Provide a redirect, not just a prohibition. Gusto's approach is the best example of this: "Internal tool for payroll calculation. Use run_payroll instead." The description is two sentences. It says what the tool is, and it tells the model what to use instead. Compare this to Adobe Acrobat's multi-paragraph "NEVER INVOKE THIS DIRECTLY" description. The shorter version gives the model a clear action to take instead of just a prohibition to follow.
Use the visibility field if the platform supports it. While no app in our dataset currently uses the visibility column, it exists in the schema for a reason. If OpenAI makes this field functional for filtering commands from the model's tool list, it would be the cleanest solution to this entire problem class.
Methodology
This analysis covers 147 third-party apps in the ChatGPT App Store as of February 2025. We excluded integrations built and maintained by OpenAI (like GitHub, Linear, Slack, and Google Workspace) to focus on apps that companies built and shipped independently.
Want access to the full dataset? Contact us to learn more.