A comprehensive mapping of file extensions to their associated programming languages, markup formats, and configuration file types. Whether you are building a code editor, a file manager, or a syntax highlighting engine, this dataset gives you the extension-to-language lookup you need.
Pro tip: Download as JSON and load it into a lookup table keyed by extension. Given any file extension, you can then resolve the language name and type category for syntax highlighting or icon selection.
About the Programming Language File Extensions Dataset
This dataset maps file extensions to their associated programming languages, scripting environments, markup formats, and configuration file types. Each entry includes the file extension (with leading dot), the full language or format name, a description of what the extension is used for, and a type category. The list covers mainstream languages like Python, JavaScript, and Java, as well as niche and domain-specific languages, template engines, and data serialization formats. It serves as a reference lookup table for any tool that needs to identify a file's purpose from its name.
Common Use Cases
Developers and toolmakers rely on extension-to-language mappings across many contexts:
- Syntax highlighting: Power code editors, online IDEs, or documentation renderers by resolving the correct grammar or TextMate scope from a file's extension.
- File managers and explorers: Display language-specific icons, color coding, or contextual actions based on the detected file type.
- CI/CD pipelines: Automatically select the correct linter, formatter, or build tool by matching file extensions to language configurations in your pipeline definition.
- Repository analytics: Classify files across a codebase to generate language breakdown statistics, similar to the language bar shown on GitHub repository pages.
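For the repository analytics case, a language breakdown can be sketched by summing file sizes per language. This minimal Python example assumes a hypothetical `EXT_TO_LANGUAGE` dict distilled from the dataset and a list of `(path, size_in_bytes)` pairs. Weighting by bytes is a choice made for this sketch; GitHub's own statistics apply further rules (for example, excluding vendored and generated files) that are not reproduced here.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical excerpt of the dataset, reduced to extension -> language name.
EXT_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".md": "Markdown",
    ".yml": "YAML",
}


def language_breakdown(files: list[tuple[str, int]]) -> dict[str, float]:
    """Return each recognized language's share of total bytes, in percent.

    `files` holds (path, size_in_bytes) pairs; paths whose extension is not
    in EXT_TO_LANGUAGE are skipped.
    """
    bytes_per_language: Counter = Counter()
    for path, size in files:
        language = EXT_TO_LANGUAGE.get(PurePath(path).suffix.lower())
        if language is not None:
            bytes_per_language[language] += size
    total = sum(bytes_per_language.values())
    if total == 0:
        return {}
    # most_common() yields languages from largest to smallest share.
    return {
        language: round(100 * size / total, 1)
        for language, size in bytes_per_language.most_common()
    }


repo = [
    ("app/main.py", 2000),
    ("app/util.py", 1000),
    ("web/index.js", 1500),
    ("README.md", 500),
    ("LICENSE", 800),  # no extension, so it is not counted
]
print(language_breakdown(repo))
# {'Python': 60.0, 'JavaScript': 30.0, 'Markdown': 10.0}
```

Counting files instead of bytes is an equally simple variant; bytes keep a handful of large source files from being outweighed by many tiny ones.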
Extension Categories Explained
File extensions in this dataset are organized into several type categories:
- Compiled languages: extensions for languages like C, C++, Rust, and Go that are compiled to machine code or bytecode before execution.
- Interpreted languages: Python, Ruby, PHP, and similar languages that are executed by a runtime interpreter.
- Markup and templating: HTML, XML, Markdown, Jinja, and other formats used for document structure and content rendering.
- Stylesheet: CSS, SCSS, LESS, and other styling languages.
- Data and configuration: JSON, YAML, TOML, INI, and environment files used to store structured data and application settings.
- Shell and scripting: Bash, PowerShell, Batch, and other command-line scripting formats.
How to Use in Your Application
Download the JSON format to build a fast in-memory lookup map keyed by extension. A single object property access gives you the language name and category for any file. For database-backed applications, the SQL export creates a normalized table you can JOIN against file metadata tables. The CSV format is useful for auditing purposes, bulk editing in a spreadsheet, or importing into data analysis tools. Use the column customizer to strip out fields you do not need and rename the remaining columns to match your application's naming conventions.
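The in-memory lookup described above can be sketched in Python as follows. It assumes the JSON download is an array of objects with the keys `extension`, `language`, `description`, and `type`; those key names are hypothetical, so adjust them to the columns you actually export. Extensions are lower-cased on both sides so that `main.PY` and `main.py` resolve the same way, and dotfiles such as `.env`, which `pathlib` treats as having no suffix, fall back to the whole file name.

```python
import json
from pathlib import PurePath
from typing import Optional, TypedDict


class ExtensionEntry(TypedDict):
    # Hypothetical key names; rename to match the columns in your download.
    extension: str  # leading dot included, e.g. ".py"
    language: str
    description: str
    type: str


def build_lookup(raw_json: str) -> dict[str, ExtensionEntry]:
    """Index an (assumed) JSON array of entries by lower-cased extension."""
    entries: list[ExtensionEntry] = json.loads(raw_json)
    return {entry["extension"].lower(): entry for entry in entries}


def extension_of(filename: str) -> str:
    """Return the lower-cased extension of a path, with its leading dot."""
    name = PurePath(filename).name.lower()
    suffix = PurePath(name).suffix
    if not suffix and name.startswith("."):
        # pathlib reports no suffix for dotfiles such as ".env"; use the name.
        suffix = name
    return suffix


def resolve(filename: str, lookup: dict[str, ExtensionEntry]) -> Optional[ExtensionEntry]:
    """Look up a file's entry by extension; None if the extension is unknown."""
    return lookup.get(extension_of(filename))


# Made-up sample entries for illustration only, not taken from the dataset.
SAMPLE_JSON = """[
  {"extension": ".py", "language": "Python",
   "description": "Python source file", "type": "Interpreted"},
  {"extension": ".env", "language": "Dotenv",
   "description": "Environment variable file", "type": "Data and configuration"}
]"""

lookup = build_lookup(SAMPLE_JSON)
print(resolve("src/main.PY", lookup)["language"])  # Python
print(resolve("config/.env", lookup)["type"])      # Data and configuration
print(resolve("README", lookup))                   # None
```

Building the dict once at startup keeps each lookup a single hash access, which is what makes this approach suitable for per-keystroke uses such as syntax highlighting.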
Handling Ambiguous Extensions
Some file extensions map to more than one language. For example, .h files can belong to C, C++, or Objective-C projects, and .m can indicate either MATLAB or Objective-C. This dataset lists the most commonly associated language for each extension and notes alternative associations in the description field. When building detection tools, consider using additional heuristics such as file content inspection, sibling file analysis, or project configuration files to disambiguate these cases. The description field provides guidance on which context clues to look for when the extension alone is not sufficient.
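Content inspection and sibling file analysis can be combined into a small fallback resolver for the `.h` and `.m` cases. The Python sketch below is illustrative only: the patterns `OBJC_MARKERS`, `CPP_MARKERS`, and `MATLAB_MARKERS` and the functions `guess_header_language` and `guess_m_language` are hypothetical, deliberately simple heuristics that are not part of the dataset. Real detectors typically use many more rules.

```python
import re
from typing import Iterable, Optional

# Hypothetical, deliberately simple content markers; real detectors use many more.
OBJC_MARKERS = re.compile(
    r"^\s*(?:@interface|@implementation|@protocol|@class|#import)\b", re.MULTILINE
)
CPP_MARKERS = re.compile(r"\b(?:namespace|template\s*<|std::|class\s+\w+)")
MATLAB_MARKERS = re.compile(r"^\s*(?:function\b|%)", re.MULTILINE)

# Sibling extensions treated as evidence for a language (an assumption).
CPP_SIBLINGS = {".cpp", ".cc", ".cxx", ".hpp"}
OBJC_SIBLINGS = {".m", ".mm"}


def guess_header_language(source: str, sibling_extensions: Iterable[str] = ()) -> str:
    """Guess whether a .h file is Objective-C, C++, or C."""
    siblings = {ext.lower() for ext in sibling_extensions}
    if OBJC_MARKERS.search(source) or siblings & OBJC_SIBLINGS:
        return "Objective-C"
    if CPP_MARKERS.search(source) or siblings & CPP_SIBLINGS:
        return "C++"
    return "C"  # fallback when no C++ or Objective-C evidence is found


def guess_m_language(source: str) -> Optional[str]:
    """Guess whether a .m file is Objective-C or MATLAB; None if unsure."""
    if OBJC_MARKERS.search(source):
        return "Objective-C"
    if MATLAB_MARKERS.search(source):
        return "MATLAB"
    return None


print(guess_header_language("@interface Widget : NSObject\n@end\n"))  # Objective-C
print(guess_header_language("int add(int a, int b);\n", [".cpp"]))    # C++
print(guess_m_language("function y = square(x)\n  y = x.^2;\nend\n"))  # MATLAB
```

Returning `None` from `guess_m_language` lets a caller fall back to the dataset's default association when the content gives no clear signal.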