prebetafinal / epitypes
Epistemological file type classification for content management systems
Requires
- php: >=7.4
README
Semantic file classification for content management systems.
Problem 🤖
File systems organize by what files are (.jpg, .pdf, .mp3). Humans organize by what files do (images to view, documents to read, data to process). Epitypes bridges this gap with semantic classification.
Solution
Epitypes (epistemological types) classifies files using a three-level hierarchy:
Nature - fundamental behavior
- Folders - native file system directories
- Pages - files you edit (markdown, code, data, config)
- Assets - files you consume (media, documents, archives, materials)
- Ignored - system files to skip (.DS_Store, .git, etc.)
Type - semantic grouping (raw, markup, media, documents)
Format - specific implementation (html, image, pdf)
Hierarchy
Epitypes
├── 📁 folders (native directories)
├── 📄 pages (editable)
│ ├── raw (txt, log)
│ ├── markup (html, md)
│ ├── code (js, py)
│ ├── data (json, yaml)
│ └── config (ini, env)
├── 🗃️ assets (consumable)
│ ├── media (jpg, mp3, mp4)
│ ├── documents (pdf, docx)
│ ├── archives (zip, tar)
│ └── materials (psd, 3ds)
└── 🚫 ignored (system files)
Structure (abbreviated for clarity)
{
  "pages": {
    "description": "Editable content files that can be opened in text editors",
    "types": {
      "raw": ["txt", "text", "log"],
      "markup": ["html", "md", "xml"],
      "code": ["js", "py", "php"],
      "data": ["json", "yaml", "csv"],
      "config": ["ini", "env", "conf"]
    }
  },
  "assets": {
    "description": "Files for consumption, reference, or use as materials",
    "types": {
      "media": ["jpg", "mp3", "mp4"],
      "documents": ["pdf", "docx"],
      "archives": ["zip", "tar", "7z"],
      "materials": ["psd", "3ds", "midi"]
    }
  },
  "ignored": {
    "description": "Items to ignore during file scanning",
    "items": [".git", ".DS_Store", "Thumbs.db", ".svn", ".hg", ".epitome", ".gitignore", "node_modules", ".vscode", ".idea"]
  }
}
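A minimal sketch of how the structure above might be consumed, assuming it has been decoded into a PHP array. The helper names buildExtensionIndex and classify are hypothetical and are not taken from the package's API; the sketch simply inverts the nature → type → extensions map into a flat extension lookup.

```php
<?php
declare(strict_types=1);

// Abbreviated structure from above, decoded into a PHP array.
$epitypes = [
    'pages' => ['types' => [
        'raw'    => ['txt', 'text', 'log'],
        'markup' => ['html', 'md', 'xml'],
        'code'   => ['js', 'py', 'php'],
        'data'   => ['json', 'yaml', 'csv'],
        'config' => ['ini', 'env', 'conf'],
    ]],
    'assets' => ['types' => [
        'media'     => ['jpg', 'mp3', 'mp4'],
        'documents' => ['pdf', 'docx'],
        'archives'  => ['zip', 'tar', '7z'],
        'materials' => ['psd', '3ds', 'midi'],
    ]],
];

/**
 * Hypothetical helper: invert nature → type → extensions
 * into extension → ['nature' => ..., 'type' => ...].
 */
function buildExtensionIndex(array $epitypes): array
{
    $index = [];
    foreach ($epitypes as $nature => $group) {
        foreach ($group['types'] ?? [] as $type => $extensions) {
            foreach ($extensions as $extension) {
                $index[$extension] = ['nature' => $nature, 'type' => $type];
            }
        }
    }
    return $index;
}

/** Hypothetical helper: classify a file name by its lowercased extension. */
function classify(string $fileName, array $index): ?array
{
    $extension = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
    return $index[$extension] ?? null; // null when the extension is not listed
}

$index = buildExtensionIndex($epitypes);
print_r(classify('notes.MD', $index));  // nature "pages", type "markup"
print_r(classify('photo.jpg', $index)); // nature "assets", type "media"
```

Building the index once keeps each lookup to a single array access, which matters when scanning large directory trees.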
Epistemic Foundation 🤓
Layer 1 (Nature/Ontological): pages vs assets
- Pure human conceptual distinction: "What I work on" vs "What I consume"
- Domain-Driven Design level - reflects user mental models
- Expected to stay constant because it reflects how people think about their files
Layer 2 (Type/Categorical): raw, markup, code, data, config
- Human-computer bridge layer - how humans categorize information processing
- Reflects both human logic (raw text vs structured) and computational needs
- Stable because these are fundamental information categories
Layer 3 (Format/Technical): markdown, javascript, json, etc.
- Pure technical implementation details
- Machine-readable yet human-understandable
- Most volatile layer - formats come and go
The names at each level are chosen to read naturally in plain language:
- Raw = unstructured human thought
- Markup = structured human expression
- Code = human instructions for machines
- Data = structured information
- Config = system parameters
This creates an epistemic hierarchy: pure human cognition → hybrid human-machine categories → technical specifications.
Why Not MIME Types?
MIME types solve technical delivery (image/jpeg), not human organization:
- ❌ application/json doesn't tell you if it's data, config, or a unit shelter
- ❌ text/plain could be a note, log, or code snippet
- ❌ No concept of "ignorable" vs "important" files
Epitypes adds semantic layers that MIME types lack.
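The last point in the list can be illustrated with a short sketch. It assumes the "ignored" item list from the structure and a hypothetical helper named filterIgnored; entries such as .git or node_modules are recognized by name rather than by content, which is something a MIME type cannot express.

```php
<?php
declare(strict_types=1);

// "items" list from the "ignored" group of the structure.
$ignored = ['.git', '.DS_Store', 'Thumbs.db', '.svn', '.hg', '.epitome',
            '.gitignore', 'node_modules', '.vscode', '.idea'];

/**
 * Hypothetical helper: drop directory entries whose base name is ignored.
 * Exact, case-sensitive matching is an assumption of this sketch.
 */
function filterIgnored(array $entries, array $ignored): array
{
    $lookup = array_flip($ignored); // name => position, for constant-time checks
    return array_values(array_filter(
        $entries,
        fn(string $entry): bool => !isset($lookup[basename($entry)])
    ));
}

$entries = ['.git', 'index.md', 'node_modules', 'img/logo.jpg', 'img/.DS_Store'];
print_r(filterIgnored($entries, $ignored));
// keeps "index.md" and "img/logo.jpg"
```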
Notes 📜
The term 'page' works as long as the file is presented as a page in the UI, regardless of its underlying nature; this usage is widely accepted in content management. 'Asset' reads naturally as the counterpart of 'page' in this context and needs no further explanation.
This classification is not final, so feedback is welcome.
Unknown Files
Files not listed in Epitypes automatically fall into an "other" group. This includes executables (.exe), system binaries, and other formats that are irrelevant to content management. Consuming code should decide how to handle these cases.
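One way to make this fallback explicit is sketched below. The constant NATURE_OTHER, the helper groupByNature, and the small extension map are assumptions for illustration only; the package may expose the "other" group differently.

```php
<?php
declare(strict_types=1);

// Assumed label for unlisted files, following the "other" group named above.
const NATURE_OTHER = 'other';

// Small illustrative extension → nature map (not the full list).
$natureByExtension = [
    'md'  => 'pages',
    'json' => 'pages',
    'jpg' => 'assets',
    'pdf' => 'assets',
];

/** Hypothetical helper: group file names by nature, with "other" as fallback. */
function groupByNature(array $fileNames, array $natureByExtension): array
{
    $groups = [];
    foreach ($fileNames as $fileName) {
        $extension = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));
        $nature = $natureByExtension[$extension] ?? NATURE_OTHER;
        $groups[$nature][] = $fileName;
    }
    return $groups;
}

$groups = groupByNature(
    ['readme.md', 'setup.exe', 'cover.jpg', 'Makefile'],
    $natureByExtension
);
print_r($groups);
// "setup.exe" and "Makefile" land under "other"; a caller could hide,
// flag, or skip that group depending on its own policy.
```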
License
MIT