
Wednesday, June 04, 2025

Andrej Karpathy: Vibe Coding

 

Let’s dive into Andrej Karpathy’s X post from June 4, 2025, and break it down in detail. Karpathy, a prominent figure in AI and machine learning (formerly of Tesla and OpenAI, now founder of Eureka Labs), is commenting on the future of software products in the context of human-AI collaboration. His post argues that software interfaces and architectures must evolve to remain relevant as AI, particularly large language models (LLMs), becomes integral to everyday workflows. Below, we explain the post, elaborate on its implications, and provide detailed context for the concepts he addresses.


The Core Argument
Karpathy’s central thesis is that software products with complex, opaque user interfaces (UIs) and data formats that aren’t easily accessible or manipulable by AI are at risk of becoming obsolete ("ngmi" = "not gonna make it") in an era where human-AI collaboration is becoming the norm. He argues that for software to thrive, it must:
  1. Expose its underlying representations and settings in a way that AI can understand and manipulate (e.g., through text-based formats or scripting).
  2. Enable seamless integration with AI systems so that LLMs can act as co-pilots for professionals and empower "prosumers" (aspiring professionals or hobbyists) to engage in "vibe coding"—a more intuitive, less rigid form of programming.
He then categorizes software products into risk levels based on how AI-friendly their architectures are and warns that products failing to adapt will struggle as AI capabilities advance.

Breaking Down the Post
1. The Problem with Traditional Software UIs
Karpathy starts by critiquing software with "extensive/rich UIs" that rely heavily on graphical elements like sliders, switches, and menus, but lack scripting support and use "opaque, custom, binary formats." Examples of such software include Adobe products (like Photoshop), digital audio workstations (DAWs), and CAD/3D modeling tools.
  • What are "opaque, custom, binary formats"?
    • Binary formats are non-human-readable data representations (e.g., a .psd file in Photoshop or a proprietary CAD file). Unlike text-based formats (e.g., JSON, XML, or Python scripts), binary formats are encoded in a way that’s efficient for machines to process but difficult for humans or AI to interpret without specialized tools.
    • For example, a Photoshop .psd file contains layers, effects, and settings, but you can’t easily open it in a text editor to see or modify its structure. This makes it hard for an AI to understand or manipulate the file programmatically.
  • Why is this a problem for human-AI collaboration?
    • LLMs, like the ones Karpathy is referring to, excel at processing and generating text. They can understand and manipulate code, configurations, or data if it’s in a text-based format (e.g., a script that defines a 3D model in Blender). However, they struggle with binary formats because they can’t "read" them directly.
    • In a collaborative workflow, an AI co-pilot needs to understand the state of a project (e.g., the settings of a Photoshop filter or the parameters of a 3D model) and modify it based on user prompts (e.g., "increase the brightness by 10%"). If the software’s data is locked in a binary format, the AI can’t access or manipulate it without cumbersome workarounds like screen scraping or UI automation.
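The contrast between text-based and binary serializations can be made concrete with a toy example: the same layer settings written as JSON (which a human or an LLM can read and edit directly) versus a packed binary blob (which is opaque without the format specification). This is a minimal sketch with an invented layer structure, not the actual .psd layout.

```python
import json
import struct

# The same layer settings in two serializations (invented example fields).
layer = {"name": "background", "opacity": 0.8, "visible": True}

# Text-based: self-describing, so an LLM can read and edit it directly.
text_form = json.dumps(layer)
print(text_form)  # {"name": "background", "opacity": 0.8, "visible": true}

# Binary: compact and fast to parse, but meaningless bytes without the spec.
binary_form = struct.pack(
    "10s f ?", layer["name"].encode(), layer["opacity"], layer["visible"]
)
print(binary_form)  # opaque bytes; field names and structure are gone
```

The text form carries its own field names, so "increase the opacity to 0.9" is a trivial edit; the binary form requires knowing the exact byte layout before any tool can touch it.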
2. The Role of Scripting and Text-Based Representations
Karpathy emphasizes that software needs to be scriptable and use text-based domain-specific languages (DSLs) to be AI-friendly.
  • What is a DSL?
    • A domain-specific language is a specialized programming language tailored to a particular application domain. For example, SQL is a DSL for querying databases, and HTML is a DSL for defining web page structures. In the context of Karpathy’s post, a text-based DSL for a 3D modeling tool might describe a model’s geometry, materials, and transformations in a human- and AI-readable format.
    • Frameworks such as Xtext exist specifically for developing DSLs, which underscores their importance in software engineering. DSLs allow problems to be expressed more clearly within a specific domain, which is exactly what Karpathy is advocating for.
  • Why does scripting matter?
    • Scripting allows users (and AI) to automate tasks and manipulate software programmatically. For example, in Blender, you can write Python scripts to create or modify 3D models. This is a text-based interface that an LLM can understand and use.
    • If a product like Photoshop had a robust scripting interface (e.g., a text-based way to define layers, filters, and adjustments), an AI could help a user by generating scripts to automate repetitive tasks or suggest edits. Without scripting, the AI is limited to interacting with the UI, which is slower and less reliable.
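To make the idea of a text-based DSL concrete, here is a toy scene-description language and a parser for it. The syntax is entirely hypothetical (it is not Blender’s or any real tool’s format); the point is that because the scene lives in plain text, an LLM can emit or edit these lines and the tool can apply them deterministically.

```python
# A toy text DSL for describing a 3D scene -- hypothetical syntax.
# Each line: <shape> <name> at <x> <y> <z>
SCENE = """\
cube body at 0 0 0
sphere wheel_left at 1 0 0
sphere wheel_right at -1 0 0
"""

def parse_scene(src):
    """Parse the mini-DSL into a list of object dicts."""
    objects = []
    for line in src.strip().splitlines():
        shape, name, _at, x, y, z = line.split()
        objects.append({
            "shape": shape,
            "name": name,
            "position": (float(x), float(y), float(z)),
        })
    return objects

# "Add another wheel" is now a one-line text edit rather than a UI session.
for obj in parse_scene(SCENE):
    print(obj["shape"], obj["name"], obj["position"])
```

A request like “make a car with a sleek design” could be answered by generating more lines in this format, which the human can then read, diff, and tweak.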
3. Human-AI Collaboration: Professionals and Prosumers
Karpathy highlights two groups that benefit from AI-friendly software:
  • Existing Professionals:
    • Professionals (e.g., graphic designers, 3D artists, or engineers) often use complex software with steep learning curves. An AI co-pilot that can understand the software’s state and settings could assist by automating tasks, suggesting optimizations, or even debugging issues.
    • For example, in a DAW like Ableton Live, an AI could analyze a project file (if it’s in a text-readable format) and suggest adjustments to the mix based on the user’s style.
  • Prosumers and "Vibe Coding":
    • Prosumers are amateur or aspiring professionals who want to create but lack the expertise of full professionals. Karpathy’s term "vibe coding" refers to a more intuitive, exploratory approach to creation, where users can describe what they want in natural language (e.g., "make a 3D model of a car with a sleek design") and the AI generates the necessary code or settings.
    • For vibe coding to work, the software must allow the AI to manipulate its features programmatically. If the software is locked behind a binary format or a purely graphical UI, the AI can’t help the prosumer achieve their vision.
4. Risk Spectrum of Software Products
Karpathy categorizes software into four risk levels based on how AI-friendly they are:
  • High Risk (Binary Objects, No Text DSL):
    • Examples: Adobe products (Photoshop, Premiere), DAWs (Ableton Live, FL Studio), CAD/3D tools (SolidWorks, AutoCAD).
    • These products rely heavily on binary formats and lack comprehensive scripting support. For example, Adobe Photoshop has some scripting capabilities (via JavaScript or Actions), but its core file format (.psd) is binary, making it hard for AI to manipulate directly.
    • Karpathy predicts these products are at the highest risk of becoming obsolete because they can’t integrate seamlessly with AI workflows.
  • Medium-High Risk (Partially Text Scriptable):
    • Examples: Blender, Unity.
    • Blender and Unity are more AI-friendly because they support scripting (Blender via Python, Unity via C#). For instance, a Blender model can be exported and instantiated in a Unity scene entirely from script, showing that these tools can be manipulated programmatically to a meaningful extent.
    • However, they still have limitations. Not all features in Blender or Unity are fully scriptable, and some data (e.g., Unity’s scene hierarchy) may still be opaque to AI without additional parsing.
  • Medium-Low Risk (Mostly Text, Some Automation):
    • Examples: Excel.
    • Excel is largely text-based (its .xlsx format is zipped XML, and spreadsheets are essentially tables of data), and it supports automation via VBA (Visual Basic for Applications) and a plugin ecosystem. An AI can easily read an Excel workbook (e.g., exported as CSV) and manipulate its contents.
    • However, Excel still has some graphical elements (e.g., its charting tools) that aren’t fully scriptable, which is why it’s not "low risk."
  • Low Risk (All Text, AI-Friendly):
    • Examples: IDEs (VS Code), Figma, Jupyter, Obsidian.
    • These tools are already text-native. For example, VS Code uses JSON for its settings and extensions, Figma’s files can be read and manipulated via its REST API, and Jupyter notebooks are essentially JSON files containing code and markdown.
    • These products are "lucky" because their architecture aligns naturally with how LLMs operate, making them ideal for human-AI collaboration.
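The “text-native” point is easy to demonstrate with Jupyter: a .ipynb file is just JSON, so extracting or rewriting its code cells takes a few lines of standard-library Python. The minimal notebook below is hand-built for the example; a real file would simply be loaded with `json.load`.

```python
import json

# A minimal notebook in the .ipynb JSON schema (hand-built for illustration;
# a real file would be loaded with json.load(open("nb.ipynb"))).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1\n", "print(x)"]},
    ],
}

# Any tool (or LLM) can pull out just the code with ordinary JSON access --
# no reverse-engineering of a binary format required.
code = "".join(
    line
    for cell in notebook["cells"]
    if cell["cell_type"] == "code"
    for line in cell["source"]
)
print(code)
```

The same few lines could just as easily rewrite a cell, which is precisely the kind of programmatic access a binary project file denies.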
5. The Future of AI and UI/UX
Karpathy acknowledges that AI will improve at interacting with traditional UIs (e.g., through tools like “Operator,” likely a reference to OpenAI’s Operator, an agent that navigates graphical interfaces by interpreting screenshots and simulating clicks). However, he argues that relying solely on this approach is risky.
  • Why not just wait for AI to get better at UIs?
    • While AI can improve at UI navigation (e.g., by using computer vision to "see" a screen and click buttons), this approach is brittle and inefficient compared to direct programmatic access. For example, if Photoshop’s UI changes in an update, an AI that relies on clicking buttons might break, whereas an AI that uses a scripting API would be unaffected.
    • Karpathy suggests that software companies need to "meet the technology halfway" by making their products more AI-friendly now, rather than waiting for AI to solve the problem entirely through UI automation.
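The brittleness argument can be simulated in a few lines. The “app” below and its button labels are invented for illustration: a UI redesign renames a button, which breaks label-based automation, while a scripting API keeps working because it doesn’t depend on the UI at all.

```python
# Toy contrast between UI automation and a scripting API.
# The App class and its button labels are invented for this sketch.

class App:
    def __init__(self, version):
        # A UI update renamed the button between versions.
        label = "Brightness" if version == 1 else "Exposure/Brightness"
        self.buttons = {label: self._raise_brightness}
        self.brightness = 50

    def _raise_brightness(self):
        self.brightness += 10

    # Stable scripting API: unchanged across UI redesigns.
    def set_brightness(self, value):
        self.brightness = value

def ui_automation(app):
    # Keyed to a UI label -- like a click bot keyed to pixels or captions.
    app.buttons["Brightness"]()

v2 = App(version=2)
try:
    ui_automation(v2)
except KeyError:
    print("UI automation broke after the redesign")

v2.set_brightness(60)  # the API path still works
print(v2.brightness)
```

In real products the failure mode is the same, just slower to surface: screen-scraping agents silently break on redesigns, while documented scripting interfaces are versioned and maintained.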

Elaborating on the Implications
For Software Companies
Karpathy’s post is a wake-up call for software companies, especially those with legacy products. Companies like Adobe, which dominate creative industries, face a significant challenge:
  • Adapting Legacy Systems:
    • Adobe has already been integrating AI into its Experience Cloud (e.g., AI-powered content creation with Adobe Firefly). However, Karpathy’s critique suggests that surface-level AI features (like generating images) aren’t enough: Adobe needs to fundamentally rethink how its products expose data and functionality to AI.
    • For example, Adobe could develop a text-based DSL for Photoshop that allows users (and AI) to define layers, filters, and adjustments as code. This would make Photoshop more scriptable and AI-friendly.
  • Competitive Pressure:
    • Karpathy’s reply to Godwin Osama (“Figma to buy Adobe 2035?”) is a provocative suggestion that text-native, AI-friendly tools like Figma could eventually overtake legacy giants like Adobe. Figma’s API and collaborative, web-based nature make it far more adaptable to AI workflows.
For Developers and Designers
For professionals and prosumers, Karpathy’s vision of human-AI collaboration is empowering but requires a shift in mindset:
  • Learning to Work with AI:
    • Designers and developers will need to become comfortable with scripting and text-based workflows to fully leverage AI co-pilots. For example, a 3D artist using Blender might need to learn Python to automate tasks with AI assistance.
    • The Bezi reply (@bezi_ai) highlights an early example of this: they’re using LLMs to accelerate Unity development, showing how AI can already assist with game design if the underlying tools are scriptable.
  • Vibe Coding for Prosumers:
    • Karpathy’s concept of "vibe coding" is particularly exciting for prosumers. Imagine a beginner using a DAW and saying, "Make my track sound more like a jazz piece." If the DAW’s project file is text-readable, an AI could adjust the MIDI notes, effects, and mixing parameters to achieve that vibe. Without scripting support, the AI would be limited to giving vague advice like, "Try adding a saxophone sample."
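The DAW scenario can be sketched as code. The project schema below is invented, and a simple keyword lookup stands in for the LLM; the point is that once the project is plain text (here JSON), “make it sound more jazz” reduces to a structured edit the user can inspect.

```python
import json

# A DAW project as plain text (invented schema). An LLM could read and
# rewrite this directly; here a keyword lookup stands in for the model.
project = {
    "tempo": 128,
    "swing": 0.0,
    "tracks": [{"name": "drums", "reverb": 0.1}],
}

VIBES = {  # hypothetical mapping from a described vibe to parameter edits
    "jazz": {"tempo": 110, "swing": 0.6},
}

def apply_vibe(project, request):
    """Apply the parameter edits for any vibe mentioned in the request."""
    for vibe, edits in VIBES.items():
        if vibe in request.lower():
            project.update(edits)
    return project

apply_vibe(project, "Make my track sound more like a jazz piece")
print(json.dumps(project, indent=2))
```

Because the edit lands in a readable file rather than behind opaque UI state, the prosumer can see exactly what changed and undo or refine it.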
For AI Development
Karpathy’s post also has implications for the AI community:
  • Improving AI’s UI Interaction:
    • While Karpathy is skeptical of relying solely on UI automation, he acknowledges that AI will get better at it. The reply from @nearcyan mentions that models are already good at guiding users through complex UIs (e.g., telling them where to click in Blender). Future AI systems, like a hypothetical “Claude Code for the middle of this stack,” could bridge the gap between graphical UIs and programmatic control.
  • Standardizing AI-Friendly Formats:
    • Frameworks like Xtext and Racket are designed for creating new languages. The AI community could push for standardized, AI-friendly DSLs for creative tools, making it easier for software to become scriptable.
For the Broader Tech Ecosystem
Karpathy’s post ties into broader trends in technology:
  • The Semantic Web, Revisited:
    • Paul Calcraft’s reply (@paul_cal) draws a parallel to the semantic web, noting that AI has effectively achieved its goals (structured, machine-readable data) without the rigid syntax of the 2000s. This suggests that AI-friendly software aligns with a long-standing vision of making data more accessible to machines.
  • The End of Traditional Applications?
    • The reply from @instance_11 questions the need for traditional applications altogether. If AI can directly manipulate raw data (e.g., the RGB values of an image), why use a tool like Photoshop at all? This radical perspective aligns with Karpathy’s vision but takes it further, suggesting that AI could eventually replace many software tools with bespoke, on-the-fly solutions.

Connection to Other Posts in the Thread
  • Lech Mazur’s Earlier Post (Thread 1, 2023):
    • Mazur’s 2023 post advocates for natural language interfaces in complex software like Photoshop. This aligns with Karpathy’s vision: a text-based interface (whether natural language or a DSL) would make software more AI-friendly. Mazur’s reply to Karpathy suggests that adding such interfaces is now more feasible, though it raises concerns about security (e.g., exposing APIs could introduce vulnerabilities).
  • Replies Highlighting Adoption:
    • Lee Robinson (@leerob) agrees that many SaaS UIs could be replaced by AI agents with text inputs, reinforcing Karpathy’s point that graphical UIs are often overcomplicated for tasks that AI can handle more efficiently.
    • Bezi (@bezi_ai) and clementmiao (@clementmiao) provide examples of how some tools (Unity, Phaser.js) are already moving toward AI-friendly workflows, though challenges remain (e.g., Godot’s reference IDs aren’t AI-friendly).

Detailed Example: Adobe vs. Figma
Let’s compare Adobe Photoshop (high risk) and Figma (low risk) to illustrate Karpathy’s point:
  • Photoshop:
    • File Format: .psd is binary, making it hard for AI to read or modify directly.
    • Scripting: Photoshop supports some scripting via JavaScript, but it’s not comprehensive. Many features are only accessible through the UI.
    • AI Integration: Adobe is adding AI features (e.g., Firefly for generative content), but these are surface-level. An AI can’t deeply understand or manipulate a .psd file’s structure.
    • Risk: High. Photoshop’s architecture makes it difficult for AI to act as a co-pilot.
  • Figma:
    • File Format: Figma’s data is accessible via its API, which uses JSON (a text-based format). This makes it easy for AI to read and manipulate designs.
    • Scripting: Figma supports plugins and has a REST API, allowing programmatic control.
    • AI Integration: Figma is already integrating with AI tools that generate designs, wireframes, and sitemaps.
    • Risk: Low. Figma’s text-native, API-driven approach makes it ideal for human-AI collaboration.
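The Figma side of this comparison can be illustrated with a tree walk. Figma’s REST API returns a document as a JSON tree of nodes; the document below is a simplified stand-in with the same nested “children” shape (real responses carry many more fields), showing how programmatic access to a design reduces to ordinary recursion.

```python
# Simplified stand-in for a Figma file response: a JSON tree of nodes.
# Real /v1/files responses have the same nested "children" shape but far
# more fields; this document is hand-built for illustration.
doc = {
    "name": "Landing page",
    "type": "DOCUMENT",
    "children": [
        {"name": "Page 1", "type": "CANVAS", "children": [
            {"name": "Hero", "type": "FRAME", "children": [
                {"name": "Title", "type": "TEXT", "children": []},
            ]},
        ]},
    ],
}

def find_nodes(node, node_type):
    """Collect every node of a given type by walking the tree."""
    found = [node] if node.get("type") == node_type else []
    for child in node.get("children", []):
        found.extend(find_nodes(child, node_type))
    return found

print([n["name"] for n in find_nodes(doc, "TEXT")])  # ['Title']
```

An AI co-pilot asked to “retitle every text layer” has a trivial job against this representation; against a binary .psd, the same request requires a format-specific parser before any work can begin.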

Conclusion
Karpathy’s post is a forward-looking critique of software design in the age of AI. He argues that software must evolve to be AI-friendly by adopting text-based representations, scripting support, and DSLs, or risk becoming obsolete as human-AI collaboration becomes the norm. His risk spectrum provides a clear framework for understanding which products are well-positioned (e.g., Figma, VS Code) and which are at risk (e.g., Adobe products, DAWs). The broader implications touch on software development, AI research, and the future of creative workflows, urging companies to adapt now rather than wait for AI to solve the problem through UI automation.
For professionals and prosumers, this shift promises a future where AI co-pilots can enhance productivity and creativity, but it requires software to meet AI halfway. For companies, it’s a call to action: rethink your architecture, or risk being left behind in the AI-driven future.

Here’s a list of 10 key words and phrases from the detailed explanation of Andrej Karpathy’s X post, capturing the core concepts and themes discussed:
  1. Human-AI Collaboration
    • Central to Karpathy’s argument, this phrase refers to the growing partnership between humans and AI systems, particularly LLMs, in creative and professional workflows.
  2. Opaque Binary Formats
    • Describes non-human-readable data formats (e.g., Adobe’s .psd files) that hinder AI’s ability to understand and manipulate software data.
  3. Text-Based DSL (Domain-Specific Language)
    • A specialized, human- and AI-readable language for a specific domain, advocated as a solution for making software more scriptable and AI-friendly.
  4. Scripting Support
    • The ability to programmatically control software through code (e.g., Python in Blender), which Karpathy sees as essential for AI integration.
  5. Vibe Coding
    • Karpathy’s term for an intuitive, exploratory coding approach enabled by AI, where users describe goals in natural language, and AI generates the necessary code.
  6. Prosumers
    • Aspiring professionals or hobbyists who can benefit from AI assistance in creative tasks, provided software is AI-friendly.
  7. Risk Spectrum
    • Karpathy’s categorization of software into high, medium-high, medium-low, and low risk based on their AI compatibility (e.g., Adobe at high risk, Figma at low risk).
  8. AI Co-Pilot
    • Refers to AI systems that assist professionals by automating tasks, suggesting optimizations, or manipulating software settings programmatically.
  9. UI Automation
    • The less efficient approach of AI interacting with graphical UIs (e.g., by simulating clicks), which Karpathy warns against relying on exclusively.
  10. Text-Native Software
    • Software that uses text-based formats (e.g., JSON in Figma) and is inherently AI-friendly, positioning it as low risk in Karpathy’s framework.
These terms encapsulate the main ideas, challenges, and proposed solutions in the context of evolving software design for the AI era.