``. ``with_attribute`` can be called with:

  - keyword arguments, as in ``(class="Customer", align="right")``, or

  - a list of name-value tuples, as in ``(("ns1:class", "Customer"), ("ns2:align", "right"))``

  An attribute can be specified to have the special value
  ``with_attribute.ANY_VALUE``, which will match any value - use this to
  ensure that an attribute is present but any attribute value is
  acceptable.

- ``match_only_at_col(column_number)`` - a parse action that verifies that
  an expression was matched at a particular column, raising a
  ``ParseException`` if matching at a different column number; useful when
  parsing tabular data

- ``common.convert_to_integer()`` - converts all matched tokens to int

- ``common.convert_to_float()`` - converts all matched tokens to float

- ``common.convert_to_date()`` - converts matched token to a datetime.date

- ``common.convert_to_datetime()`` - converts matched token to a datetime.datetime

- ``common.strip_html_tags()`` - removes HTML tags from matched token

- ``common.downcase_tokens()`` - converts all matched tokens to lowercase

- ``common.upcase_tokens()`` - converts all matched tokens to uppercase


Common string and token constants
---------------------------------

- ``alphas`` - same as ``string.ascii_letters``

- ``nums`` - same as ``string.digits``

- ``alphanums`` - a string containing ``alphas + nums``

- ``alphas8bit`` - a string containing alphabetic 8-bit characters::

    ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ

.. _identchars:

- ``identchars`` - a string containing characters that are valid as initial identifier characters::

    ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzª
    µºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

- ``identbodychars`` - a string containing characters that are valid as identifier body characters (those following a
  valid leading identifier character as given in identchars_)::

    0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzª
    µ·ºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

- ``printables`` - same as ``string.printable``, minus all whitespace characters

- ``empty`` - a global ``Empty()``; will always match

- ``sgl_quoted_string`` - a string of characters enclosed in 's; may
  include whitespace, but not newlines

- ``dbl_quoted_string`` - a string of characters enclosed in "s; may
  include whitespace, but not newlines

- ``quoted_string`` - ``sgl_quoted_string | dbl_quoted_string``

- ``python_quoted_string`` - ``quoted_string | multiline quoted string``

- ``c_style_comment`` - a comment block delimited by ``'/*'`` and ``'*/'`` sequences; can span
  multiple lines, but does not support nesting of comments

- ``html_comment`` - a comment block delimited by ``'<!--'`` and ``'-->'`` sequences; can span
  multiple lines, but does not support nesting of comments

- ``comma_separated_list`` - similar to DelimitedList_, except that the
  list expressions can be any text value, or a quoted string; quoted strings can
  safely include commas without incorrectly breaking the string into two tokens

- ``rest_of_line`` - all remaining printable characters up to but not including the next
  newline

- ``common.integer`` - an integer with no leading sign; parsed token is converted to int

- ``common.hex_integer`` - a hexadecimal integer; parsed token is converted to int

- ``common.signed_integer`` - an integer with optional leading sign; parsed token is converted to int

- ``common.fraction`` - signed_integer '/' signed_integer; parsed tokens are
  converted to float

- ``common.mixed_integer`` - signed_integer '-' fraction; parsed tokens are converted to float

- ``common.real`` - real number; parsed tokens are converted to float

- ``common.sci_real`` - real number with optional scientific notation; parsed tokens are converted to float

- ``common.number`` - any numeric expression; parsed tokens are returned as converted by the matched expression

- ``common.fnumber`` - any numeric expression; parsed tokens are converted to float

- ``common.identifier`` - a programming identifier (follows Python's syntax convention of leading alpha or "_",
  followed by 0 or more alpha, num, or "_")

- ``common.ipv4_address`` - IPv4 address

- ``common.ipv6_address`` - IPv6 address

- ``common.mac_address`` - MAC address (with ":", "-", or "." delimiters)

- ``common.iso8601_date`` - date in ``YYYY-MM-DD`` format

- ``common.iso8601_datetime`` - datetime in ``YYYY-MM-DDThh:mm:ss.s(Z|+-00:00)`` format; trailing seconds,
  milliseconds, and timezone optional; accepts separating ``'T'`` or ``' '``

- ``common.url`` - matches URL strings and returns a ParseResults with named fields like those returned
  by ``urllib.parse.urlparse()``


Unicode character sets for international parsing
------------------------------------------------
Pyparsing includes the ``unicode`` namespace that contains definitions for ``alphas``, ``nums``, ``alphanums``,
``identchars``, ``identbodychars``, and ``printables`` for character ranges besides 7- or 8-bit ASCII. You can
access them using code like the following::

    import pyparsing as pp
    ppu = pp.unicode

    greek_word = pp.Word(ppu.Greek.alphas)
    greek_word[...].parse_string("Καλημέρα κόσμε")

The following language ranges are defined.

==========================  =================  ================================================
Unicode set                 Alternate names    Description
--------------------------  -----------------  ------------------------------------------------
Arabic                      العربية
Chinese                     中文
CJK                                            Union of Chinese, Japanese, and Korean sets
Cyrillic                    кириллица
Devanagari                  देवनागरी
Greek                       Ελληνικά
Hangul                      Korean, 한국어
Hebrew                      עִברִית
Japanese                    日本語             Union of Kanji, Katakana, and Hiragana sets
Japanese.Hiragana           ひらがな
Japanese.Kanji              漢字
Japanese.Katakana           カタカナ
Latin1                                         All Unicode characters up to code point 255
LatinA
LatinB
Thai                        ไทย
BasicMultilingualPlane      BMP                All Unicode characters up to code point 65535
==========================  =================  ================================================

The base ``unicode`` class also includes definitions based on all Unicode code points up to ``sys.maxunicode``. This
set will include emojis, wingdings, and many other specialized and typographical variant characters.


Generating Railroad Diagrams
============================
Grammars are conventionally represented in what are called "railroad diagrams", which let you visually follow
the sequence of tokens in a grammar along lines that resemble train tracks. You might generate a railroad
diagram for your grammar to better understand it yourself, or to communicate it to others.

Usage
-----
To generate a railroad diagram in pyparsing, you first have to install pyparsing with the ``diagrams`` extra.
To do this, run ``pip install pyparsing[diagrams]``, and make sure you add ``pyparsing[diagrams]`` to any
``setup.py`` or ``requirements.txt`` that specifies pyparsing as a dependency.

Create your parser as you normally would.
Then call ``create_diagram()``, passing the name of an output HTML file::

    from pyparsing import Word, alphas, nums

    street_address = Word(nums).set_name("house_number") + Word(alphas)[1, ...].set_name("street_name")
    street_address.set_name("street_address")
    street_address.create_diagram("street_address_diagram.html")

This will result in the railroad diagram being written to ``street_address_diagram.html``.

``create_diagram`` takes the following arguments:

- ``output_html`` (str or file-like object) - output target for generated diagram HTML

- ``vertical`` (int) - threshold for formatting multiple alternatives vertically instead of horizontally (default=3)

- ``show_results_names`` - bool flag whether diagram should show annotations for defined results names

- ``show_groups`` - bool flag whether groups should be highlighted with an unlabeled surrounding box

- ``embed`` - bool flag whether generated HTML should omit ``<HEAD>``, ``<BODY>``, and ``<DOCTYPE>`` tags to embed
  the resulting HTML in an enclosing HTML source (such as PyScript HTML)

- ``head`` - str containing additional HTML to insert into the ``<HEAD>`` section of the
  generated code; can be used to insert custom CSS styling

- ``body`` - str containing additional HTML to insert at the beginning of the ``<BODY>`` section of the
  generated code


Example
-------
You can view an example railroad diagram generated from `a pyparsing grammar for
SQL SELECT statements <_static/sql_railroad.html>`_ (generated from
`examples/select_parser.py <../examples/select_parser.py>`_).

Naming tip
----------
Parser elements that are separately named will be broken out as their own sub-diagrams. As a shortcut
alternative to adding ``.set_name()`` calls on all your sub-expressions, you can call
``autoname_elements()`` after defining your complete grammar. For example::

    import pyparsing as pp

    a = pp.Literal("a")
    b = pp.Literal("b").set_name("bbb")
    pp.autoname_elements()

``a`` will get named "a", while ``b`` will keep its name "bbb".

Customization
-------------
You can customize the resulting diagram in a few ways.
To do so, run ``pyparsing.diagram.to_railroad`` to convert your grammar into a form understood by the
`railroad-diagrams `_ module, and then
``pyparsing.diagram.railroad_to_html`` to convert that into an HTML document. For example::

    from pyparsing.diagram import to_railroad, railroad_to_html

    with open('output.html', 'w') as fp:
        railroad = to_railroad(my_grammar)
        fp.write(railroad_to_html(railroad))

This will result in the railroad diagram being written to ``output.html``.

You can then pass a ``diagram_kwargs`` dict to ``pyparsing.diagram.to_railroad``; its entries will be passed
into the ``Diagram()`` constructor of the underlying library,
`as explained here `_.

In addition, you can edit global options in the underlying library, by editing constants::

    from pyparsing.diagram import to_railroad, railroad_to_html
    import railroad

    railroad.DIAGRAM_CLASS = "my-custom-class"
    my_railroad = to_railroad(my_grammar)

These options `are documented here `_.

Finally, you can edit the HTML produced by ``pyparsing.diagram.railroad_to_html`` by passing in certain keyword
arguments that will be used in the HTML template. Currently, these are:

- ``head``: A string containing HTML to use in the ``<head>`` tag. This might be a stylesheet or other metadata

- ``body``: A string containing HTML to use in the ``<body>`` tag, above the actual diagram. This might consist of a
  heading, description, or JavaScript.

If you want to provide a custom stylesheet using the ``head`` keyword, you can make use of the following CSS classes:

- ``railroad-group``: A group containing everything relating to a given element group (i.e. something with a heading)

- ``railroad-heading``: The title for each group

- ``railroad-svg``: A div containing only the diagram SVG for each group

- ``railroad-description``: A div containing the group description (unused)