Here is the proposal I sent to Norm Megill about adding "metadata" statements to .mm files such as set.mm so that programs can more easily convert from .mm format to other formats such as Ghilbert or HCodebert. I recommend downloading the latest metamath.zip to see set.mm's newest statements. (Uppercase is used to make the metadata statements stand out, but an extract program should look for lower or upper.) Comments/feedback/etc.? Now is the time to provide input!
http://us2.metamath.org:8888/index.html#downloads
Also check out the "#*#*#*#" and "=-=-=" that existed previously. You will find that the "modules" (books) already have chapter/section breakdowns (I can see (someone) creating a new webpage showing a Table of Contents based on the module/section breakdown of set.mm which would not only pull out the theorem labels but the first sentence of the comment preceding each theorem. More than 10 years of work are embedded in set.mm and it is impressive, to say the least.)
[Note: conversion of set.mm to another system may be more challenging than first imagined. The issue of overloaded operators and conversions of kinds/grammatical type codes comes into play (see weq vs. wceq, for example.) Overloaded operators are dealt with in mmj2 which constructs a set of Grammar Rules from Metamath syntax axioms and deals with combinations of Nulls and Type Conversions at the same time.]
After due consideration of multiple factors, especially Occam's Rule, I conclude that there is no need to add anything to break-out set.mm according to Book/Chapter/Section, or to distinguish between the MODULE break-out and Book/Chapter/Section. (I think MODULE (see below) metadata comment lines will do just fine as Book markers/delimiters.) I see that the comments beginning with $( #*#*#*#*#*#*#* are the intended "modules" from the author's viewpoint, and that comments beginning with $( =-=-=-=-=-=-=- designate "sections" within those modules. Assuming that you continue to maintain consistency, an extract program should have no problem automatically breaking out a file into modules and sections, which can then be automatically numbered -- and captions can be pulled from the module/section header comments. So what I did was create the minimal set of metadata to add to set.mm, in the form of Metamath comments that begin with "$( <MM>". In the future, when we see "$( <MM>" we can *know* that the comment is metadata for post-processors, such as the Ghilbert extract, or perhaps an html page builder program. Here is how the MODULE metadata looks: $( <MM> <MODULE> <ID>LOGICB</ID> <PREREQ> </PREREQ> </MODULE> </MM> $) <MODULE> -- Identifies as "MODULE" Metadata, which contains attributes "ID" and "PREREQ" (multiple prerequisites are possible). Ideally, Module ID should be 7 characters in length, or less, and should be composed of letters; digits are permitted but not as the first character (the goal is to support 8 character file names on all platforms and to reserve the 8th character position to distringuish between axiom and theorem files -- the Ghilbert extract will append an "A" or a "T" to the module name...) As you can see from the automated file comparison below there are these modules: __ID__ _PREREQ_ __comments________________________________________ PREFACE (contains only comments, so null Ghilbert extract) LOGICB LOGICP LOGICB ZFC LOGICP NUMBERS ZFC TYPESET HILBERT (contains only comments, so null Ghilbert extract) HILBERT ZFC The convention should be that the "$( <MM>" metadata header comment identifies the start of a module, and that the module continues until the start of the next module or the end of the file. AND... if the "$( <MM>" comment is immediately followed by a $( #*#*#*#*#*#*#* comment, then the module's Caption will be pulled from the "#*#*#*" comment; otherwise the module caption will be blank -- note that I changed the Typesetting module's Caption comment from $( =-=-=-=-=-=-=- to $( #*#*#*#*#*#*#* because TYPESET is logically a separate module, not a section within NUMBERS. The way the PREREQ data will work is that when building a Ghilbert extract from set.mm (or whichever .mm file), the PREREQ will be used to import the prerequisite interface file(s). In Ghilbert, prerequisites are cumulative, so if ZFC's prequisites are LOGICB and LOGICP, and LOGICP's prerequisite is LOGICB, then only LOGICP needs to be specified as a prerequisite for ZFC. Also, just as a preliminary idea unsupported by experimentation, each MODULE's Ghilbert extract will result in two ".gh" files, one for axioms and one for theorems/proofs. NOTE: the file comparison below is hard to read, so in English, what I did was precede every "#*#*#*" header comment in set.mm with a "$( <MM>" module metadata comment (even at the start of set.mm!!!) And, I changed the "=-=-=-" characters in the Typesetting section of set.mm to "#*#*#*", thus giving Typesetting module-level status. (And I changed the file date to 9/14.) NOTE2: I wasn't sure whether HILBERT's prerequisite was ZFC or NUMBERS, but I picked ZFC (guessing.) NOTE3: I have no investment in the MODULE ID values, so adjust however seems correct to you. It will be (probably) a few months before I have coded the Ghilbert extract enhancement to mmj2 anyway... Thanks! ====================================================== P.S. "1a1" below means line 1 of new file inserted before line 1 of old file. "6c7" means line 7 of new file replaces line 6 of old file. Etc. ======================================================= Compare: (<)\temp\metamath\set.mm (3819672 bytes) with: (>)\temp\metamath\setMODULES.mm (3820189 bytes) 1a1 > $( <MM> <MODULE> <ID>PREFACE</ID> <PREREQ> </PREREQ> </MODULE> </MM> $) 6c7 < set.mm - Version of 13-Sep-2005 --- > set.mm - Version of 14-Sep-2005 865a866 > $( <MM> <MODULE> <ID>LOGICB</ID> <PREREQ> </PREREQ> </MODULE> </MM> $) 7357a7359 > $( <MM> <MODULE> <ID>LOGICP</ID> <PREREQ>LOGICB</PREREQ> </MODULE> </MM> $) 11252c11255 < --- > $( <MM> <MODULE> <ID>ZFC</ID> <PREREQ>LOGICP</PREREQ> </MODULE> </MM> $) 48260a48263 > $( <MM> <MODULE> <ID>NUMBERS</ID> <PREREQ>ZFC</PREREQ> </MODULE> </MM> $) 67163,67166c67167,67171 < $( < =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= < Appendix: Typesetting definitions for the tokens in this file < =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= --- > $( <MM> <MODULE> <ID>TYPESET</ID> <PREREQ>HILBERT</PREREQ> </MODULE> </MM> $) > $( > #*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*# > Appendix: Typesetting definitions for the tokens in this file > #*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*# 68105a68110 > $( <MM> <MODULE> <ID>HILBERT</ID> <PREREQ>ZFC</PREREQ> </MODULE> </MM> $)