Marcus Raitner Lehrstuhl für Theoretische Informatik Universität Passau 94032 Passau raitner@fmi.uni-passau.de |
Michael Himsolt Lehrstuhl für Theoretische Informatik Universität Passau 94032 Passau himsolt@fmi.uni-passau.de |
GML_scanner
procedure
GML_token
structure
GML_value
enumeration (scanner related
entries)
GML_tok_val
structure
GML_parser
procedure
GML_stat
structure
GML_pair
structure
GML_value
enumeration (parser related
entries)
GML_pair_val
structure
GML_free_list
procedure
GML_print_list
procedure
gml_demo
programm
GML, the Graph Modelling Language, is a portable file format for graphs. GML has been developed as part of the Graphlet system, and has been implemented in several other systems, including LEDA, GraVis and VGJ.
This document describes a sample scanner and parser for GML.
Unlike other implementations, this one uses ANSI C and does not rely
on external tools such as lex
and yacc
.
This implementation is also designed to be highly portable and can be
used as a library.
The procedure GML_scanner
implements
the scanner for GML files:
struct GML_token GML_scanner (FILE*
file);
GML_scanner
reads the next input token from
file
and returns it in a GML_token
structure. file
must be open for read access; the caller
is responsible for opening anc closing the file.
The type GML_token
is defined as follows:
struct GML_token { GML_value kind; union tok_val value; };
Where kind
determines the type of the token and
value
is its value.
kind
is of type GML_value
, which is listed in Table
1.
GML_KEY |
token is the name of a key |
data in value.string |
GML_INT |
token is integer number |
data in value.integer |
GML_DOUBLE |
token is floating point number |
data in value.floating |
GML_STRING |
token is a string |
data in value.string |
GML_L_BRACKET |
token is left bracket |
no data in value |
GML_R_BARCKET |
token is right bracket |
no data in value |
GML_END |
EOF was reached (value undefined) |
no data in value |
GML_ERROR |
error occured while parsing |
additional information in value.error |
The value
field in GML_token
is of type
GML_tok_val
, which is defined as follows:
union GML_tok_val { long integer; // used with GML_INT double floating; // used with GML_DOUBLE char* string; // used with GML_STRING, GML_KEY struct GML_error err; // used with GML_ERROR };
The procedure GML_parser
implements the parser for GML files:
struct GML_pair* GML_parser (FILE* file, struct GML_stat* stat, int mode);
Input parameters for GML_parser
are the
file
, a pointer to a GML_stat
structure and
and the operations mode
. file
must be open
for reading; the caller is responsible for opening and closing the
file. stat
must point to a structure of type
GML_stat
, which is defined as follows:
The variablestruct GML_stat { struct GML_error err; struct GML_list_elem* key_list; };
err
is used to report errors (for information on
GML_error
see below)
If an error occurs during parsing,
stat->err.err_num
is set to the corresponding error
code, and additional information is written into the data
structure pointed to by stat->err
. If no error occurs, then
stat->err.err_num
has the value GML_OK
.
mode
is almost always 0
.
The other field in GML_stat
is key_list
, a
pointer to a singly linked list of the strings used for keys. You can access
the first key-string with key_list->key
and the next node with
key_list->next
.
The parameter mode
needs further clarification.
GML_parser
parses lists recursively. Therefore, A
closing square bracket (]) means the end of a list in a
recusive call and a syntax error (GML_TOO_MANY_BRACKETS
)
at the top level. Therefore, mode
is used to
discriminate between the top level (mode == 0
) and a
recursion step (mode == 1
). 0
at the top
level and and 1 in a recursion step.
GML_parser
returns a structure of type
GML_pair
, which is defined as follows:
struct GML_pair { char* key; GML_value kind; union pair_val value; struct GML_pair* next; };
Each object in a GML_pair
structure corresponds to a
key-value pair in the GML file, where key
is a pointer to
the key, and kind
and value
hold the value.
For example, the sequence "id 42" translates into a
GML_pair structure
where key
is "id",
kind
is GML_INT
and
value.integer
is 42. next
implements GML lists. next
is a pointer to the next
element within the current in the list, or NULL
if there
are no more elements in the
list.
The field kind
determines which of the fields in
value
is used. kind
is of type
GML_value
, which is listed in Table 2.
GML_INT |
value is a integer number |
data in value.integer |
GML_DOUBLE |
value is a floating point number |
data in value.floating |
GML_STRING |
value is a string |
data in value.string |
GML_LIST |
value is a list of key-value pairs |
data in value.list |
The data structure GML_pair_val
is defined as
follows:
union GML_pair_val { long integer; // kind is GML_INT double floating; // kind is GML_DOUBLE char* string; // kind is GML_STRING struct GML_pair* list; // kind is GML_LIST };
Note: string
contains no characters with
ASCII code greater than 127 because these are converted into the
iso8859-1-format. See the GML Manual for details.
The following auxiliary procedures are defined for
GML_pair
:
void GML_free_list (struct GML_pair* list, struct GML_list_elem* key_list)
Frees recursivly all storage allocated for list
and for
key_list
(which is decribed above).
void GML_print_list (struct GML_pair* list, int level)
Writes list
to stdout
, using level
for indentation. This meant for debugging only.
The currently read line and column are stored in the global
variables GML_line
and GML_column
. If you
are interested in where an error occured, you should call
before calling parser or scanner the first time. It will set both variables to 1.void GML_init ()
The procedures GML_scanner
and
GML_parser
read until they find an
error, or an end of file. If the parser encounters
an error, it returns the GML structure parsed so far and provides error
information in its error
parameter.
The structure GML_error
reports scanner and parser errors:
struct GML_error { GML_error_value err_num; int line; int column; };
line
is the input line in which the error
occured, and column
is the corresponding column.
Both line and column start at 1. err_num
is of type
GT_error_value
, which is listed in Table 3.
GML_UNEXPECTED |
unexpected charcter was found |
GML_SYNTAX |
Broken key-value structure |
GML_PREMATURE_EOF |
End of file encountered while reading a string |
GML_TOO_MANY_DIGITS |
A number has too many digits (that is, more than 1024 in the current implementation). |
GML_OPEN_BRACKET |
At least one bracket not closed at EOF |
GML_OK |
No errors occured |
The following example
reads a GML file (filename is specified in the command line) and writes the
parsed key-value pairs and the list of keys to standard
output
.
#include "gml_parser.h" #include <stdio.h> #include <stdlib.h> void print_keys (struct GML_list_elem* list) { while (list) { printf ("%s\n", list->key); list = list->next; } } void main (int argc, char* argv[]) { struct GML_pair* list; struct GML_stat* stat=(struct GML_stat*)malloc(sizeof(struct GML_stat)); stat->key_list = NULL; if (argc != 2) printf ("Usage: gml_demo <gml_file> \n"); else { FILE* file = fopen (argv[1], "r"); if (file == 0) printf ("\n No such file: %s", argv[1]); else { GML_init (); list = GML_parser (file, stat, 0); if (stat->err.err_num != GML_OK) { printf ("An error occured while reading line %d column %d of %s:\n", stat->err.line, stat->err.column, argv[1]); switch (stat->err.err_num) { case GML_UNEXPECTED: printf ("UNEXPECTED CHARACTER"); break; case GML_SYNTAX: printf ("SYNTAX ERROR"); break; case GML_PREMATURE_EOF: printf ("PREMATURE EOF IN STRING"); break; case GML_TOO_MANY_DIGITS: printf ("NUMBER WITH TOO MANY DIGITS"); break; case GML_OPEN_BRACKET: printf ("OPEN BRACKETS LEFT AT EOF"); break; case GML_TOO_MANY_BRACKETS: printf ("TOO MANY CLOSING BRACKETS"); break; default: break; } printf ("\n"); } GML_print_list (list, 0); printf ("Keys used in %s: \n", argv[1]); print_keys (stat->key_list); GML_free_list (list, stat->key_list); } } }