mxTidy Version 0.3.0
mxTidy provides a Python interface to a thread-safe library version of the HTML Tidy command line tool.
HTML Tidy helps you clean up coding errors in HTML and XML files and produce well-formed HTML, XHTML or XML as output. This lets you preprocess web pages for inclusion in XML repositories and prepare broken XML files for validation. It also makes it possible to write converters from well-known word processing applications such as MS Word to other structured data representations, using XML as the intermediate format.
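The converter idea can be sketched as follows, assuming the markup has already been run through Tidy with an XML-producing option such as output_xhtml=1, so that a standard XML parser accepts it. The helper name `paragraph_texts` and the sample markup are hypothetical; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def paragraph_texts(xhtml):
    """Return the text of every <p> element in a well-formed XHTML string.

    Hypothetical helper: it assumes `xhtml` is Tidy output produced with
    output_xhtml=1 (or output_xml=1), i.e. input an XML parser accepts.
    """
    root = ET.fromstring(xhtml)
    texts = []
    for elem in root.iter():
        # Accept both namespaced XHTML <p> elements and plain <p> elements.
        if elem.tag in ("p", XHTML_NS + "p"):
            texts.append("".join(elem.itertext()).strip())
    return texts

# Markup as it might look after Tidy has closed all open tags.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    "<body><p>First <b>bold</b> para</p><p>Second</p></body></html>"
)
print(paragraph_texts(sample))  # ['First bold para', 'Second']
```

Once the document is well-formed XML, any XML tool can take over the rest of the conversion.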
During the development of this interface, the original HTML Tidy was significantly modified to turn it from a single-run command line tool into a thread-safe C library which interfaces not only to files but also to memory buffers.
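Because the library is thread-safe, independent documents can in principle be tidied from several threads. This is only a sketch under assumptions: `tidy_many` is a hypothetical helper, and `tidy_func` stands in for mx.Tidy.tidy. This page does not say whether the extension releases Python's global interpreter lock, so real speed-ups are not guaranteed.

```python
from concurrent.futures import ThreadPoolExecutor

def tidy_many(documents, tidy_func, max_workers=4, **options):
    """Tidy a list of markup strings concurrently, preserving input order.

    `tidy_func` is assumed to behave like mx.Tidy.tidy(input, **options)
    and to return (nerrors, nwarnings, outputdata, errordata); only
    outputdata is kept here.
    """
    def run_one(doc):
        nerrors, nwarnings, outputdata, errordata = tidy_func(doc, **options)
        return outputdata

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `documents`.
        return list(pool.map(run_one, documents))
```

With mxTidy installed, a call might look like `tidy_many(pages, mx.Tidy.tidy, output_xhtml=1)`.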
Most of mxTidy's operations are automatic or can be controlled through a large number of configuration options. It also gives you access to the error and warning information generated by HTML Tidy.
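The error and warning report is plain text. A minimal sketch for turning it into structured data follows. It assumes the classic HTML Tidy message layout `line L column C - Warning: text`, which this page does not specify, and the function name `parse_tidy_messages` is hypothetical.

```python
import re

# Assumed message layout (classic HTML Tidy), e.g.
#   "line 12 column 3 - Warning: missing <!DOCTYPE> declaration"
MESSAGE_RE = re.compile(
    r"^line (?P<line>\d+) column (?P<column>\d+) - "
    r"(?P<kind>Warning|Error): (?P<text>.*)$"
)

def parse_tidy_messages(errordata):
    """Split a Tidy report into (line, column, kind, text) tuples.

    Lines that do not match the assumed format (summaries, blank lines)
    are skipped.
    """
    messages = []
    for raw in errordata.splitlines():
        match = MESSAGE_RE.match(raw.strip())
        if match:
            messages.append((
                int(match.group("line")),
                int(match.group("column")),
                match.group("kind"),
                match.group("text"),
            ))
    return messages
```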
HTML Tidy is very good at restructuring HTML or XML input, but unfortunately not particularly fast at it. The main reason is its single-character input/output strategy: every character read or written goes through a separate C function call.
Switching the code to a buffer-and-pointer strategy would improve performance, but requires a lot of work.
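To see why the I/O strategy matters, compare how many read calls each strategy makes on the same input. This is a Python analogy with hypothetical names (`CountingReader`, `read_per_char`, `read_buffered`), not HTML Tidy's actual C code.

```python
import io

class CountingReader:
    """Wrap a text stream and count how many read() calls reach it."""

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return self._stream.read(size)

def read_per_char(reader):
    """Single-character strategy: one read() call per character."""
    chars = []
    while True:
        ch = reader.read(1)
        if not ch:
            return "".join(chars)
        chars.append(ch)

def read_buffered(reader, bufsize=4096):
    """Buffer strategy: one read() call per block of up to `bufsize` characters."""
    chunks = []
    while True:
        chunk = reader.read(bufsize)
        if not chunk:
            return "".join(chunks)
        chunks.append(chunk)

text = "<p>" * 3000  # 9000 characters of input
slow, fast = CountingReader(text), CountingReader(text)
assert read_per_char(slow) == read_buffered(fast) == text
print(slow.calls, fast.calls)  # 9001 4
```

Both readers return the same text, but the per-character loop makes one call per character plus one at end of input, while the buffered loop needs only three full or partial blocks plus one.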
In string-to-string mode, the memory requirements amount to about twice the size of the input string, in addition to the parser tree overhead. In file-to-file mode, only the tree overhead is incurred.
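For large documents, file-to-file mode therefore saves memory. The sketch below assumes the behaviour documented for tidy() in the Functions section: open files passed as output and errors receive the markup and the report. The helper name `tidy_file` and the use of text-mode files are assumptions. `tidy_func` defaults to mx.Tidy.tidy but can be replaced with a stand-in for testing.

```python
def tidy_file(src_path, dst_path, err_path, tidy_func=None, **options):
    """Tidy one file into another instead of building Python strings.

    Returns (nerrors, nwarnings). In file mode, tidy() is documented to
    write to the given files rather than returning the output as a string.
    """
    if tidy_func is None:
        from mx.Tidy import tidy as tidy_func  # the real mxTidy entry point
    # Text mode is an assumption; adjust if binary file objects are needed.
    with open(src_path) as src, open(dst_path, "w") as dst, open(err_path, "w") as err:
        nerrors, nwarnings, outputdata, errordata = tidy_func(src, dst, err, **options)
    return nerrors, nwarnings
```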
Note that the current release reconfigures HTML Tidy on every run, which causes additional overhead.
	  The mxTidy package defines the following interfaces. Most
	  important are the HTML Tidy options which the interface
	  functions allow you to pass to the underlying HTML Tidy
	  engine.
	  
	
	 
	      Most of the original HTML Tidy options are also
	      available in the mxTidy interface; some options have
	      been removed, though, since they don't map well to an
	      embedded module, e.g. there is no configuration file
	      support and the slide bursting options have also been
	      removed.
	     
	      The following options are available. The default values
	      used in mxTidy are given in parentheses. Note that some
	      options have different defaults than in the command line
	      version of HTML Tidy.
	     
	      For more information about the background and workings
	      of HTML Tidy, please see the HTML Tidy Overview which is
	      also included in the package.
	     
		  add_xml_decl: Note that if the input document includes an
		  <?xml?> declaration, it will appear in the output
		  regardless of the value of this option.
		 
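		  add_xml_space: controls whether Tidy adds
		  xml:space="preserve" to elements such as pre, style and
		  script when generating XML. This is needed if the
		  whitespace in such elements is to be parsed appropriately
		  without access to the DTD. The default is 0.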
		 
		  assume_xml_procins: if set, processing instructions must
		  end with ?> rather than >. The default is 0. This option
		  is automatically set if the input is in XML.
		 
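		  word_2000: if set, Tidy strips out the surplus markup that
		  Microsoft Word 2000 inserts when documents are saved as web
		  pages. Microsoft has developed its own optional filter for
		  exporting to HTML, and the 2.0 version is much improved.
		  You can download the filter free of charge from the
		  Microsoft Office Update site.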
		 
	      These descriptions were extracted from the HTML Tidy documentation and
	      fall under the HTML Tidy copyright.
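Since options are passed as keyword parameters, a misspelled option name is easy to miss. A minimal sketch of a guard follows: a hypothetical `TIDY_DEFAULTS` table holding the documented mxTidy defaults for a subset of the options, and a hypothetical `build_options` helper that rejects unknown names before anything reaches Tidy.

```python
# Documented mxTidy defaults for a subset of the options; extend the
# table with the remaining names from the option list as needed.
TIDY_DEFAULTS = {
    "add_xml_decl": 0,
    "add_xml_space": 0,
    "assume_xml_procins": 0,
    "clean": 0,
    "drop_empty_paras": 1,
    "input_xml": 0,
    "output_xhtml": 0,
    "output_xml": 0,
    "quiet": 0,
    "show_warnings": 0,
    "word_2000": 0,
    "indent_spaces": 2,
    "tab_size": 8,
    "wrap": 72,
    "indent": "no",
    "char_encoding": "ascii",
}

def build_options(**overrides):
    """Return the defaults above updated with `overrides`.

    Unknown names raise ValueError, catching typos such as "output_xhml".
    """
    unknown = sorted(set(overrides) - set(TIDY_DEFAULTS))
    if unknown:
        raise ValueError("unknown Tidy option(s): " + ", ".join(unknown))
    options = dict(TIDY_DEFAULTS)
    options.update(overrides)
    return options
```

The result can then be passed on as keyword parameters, e.g. `tidy(markup, **build_options(output_xhtml=1, word_2000=1))`.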
	  If you find any bugs, please report them to me so that
	  I can fix them for the next release.

    HTML Tidy Options

		    add_xml_decl (0)
		    add_xml_space (0)
		    assume_xml_procins (0)
		    break_before_br (0)
		    clean (0)
		    drop_empty_paras (1)
		If set, empty paragraphs are discarded; otherwise they are
		replaced by a pair of br elements, as HTML4 precludes
		empty paragraphs. The default is 1.
		    drop_font_tags (0)
		    enclose_block_text (0)
		    fix_backslash (1)
		    fix_bad_comments (1)
		    gnu_emacs (0)
		    hide_endtags (0)
		    indent_attributes (0)
		    input_xml (0)
		    literal_attributes (0)
		    logical_emphasis (0)
		    numeric_entities (0)
		    output_error (1)
		    output_markup (1)
		    output_xhtml (0)
		    output_xml (0)
		    quiet (0)
		    quote_ampersand (1)
		    quote_marks (0)
		    quote_nbsp (1)
		    raw (0)
		    show_warnings (0)
		    tidy_mark (0)
		    uppercase_attributes (0)
		    uppercase_tags (0)
		    word_2000 (0)
		    wrap_asp (1)
		    wrap_attributes (0)
		    wrap_jste (1)
		    wrap_php (1)
		    wrap_script_literals (0)
		    wrap_sections (1)
		    indent_spaces (2)
		    tab_size (8)
		    wrap (72)
		    alt_text (None)
		    indent ("no")
		    char_encoding ("ascii")

    Functions

	 The package defines these functions:

		    tidy(input, output=None, errors=None, **options)
		Returns the tuple (nerrors, nwarnings, outputdata, errordata).
		input may be a string or a file open for reading data.
		If output is given as a file open for writing, the
		  generated markup is written to this file and outputdata
		  is set to None. Otherwise, the output is written to a
		  string which is returned by the function in outputdata.
		The same is true for the error information which Tidy
		  generates: it is either written to errors or returned
		  via errordata.
		nerrors and nwarnings are integers which are set to the
		  number of errors/warnings which Tidy generated.
		Tidy options can be passed to the function using keyword
		  parameters, e.g. output_xhtml=1. Configuration files are
		  not supported by the interface.

    Constants

	 The package defines these constants:

		    Error
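A common pattern is to treat any reported error as a failure and otherwise use the cleaned markup. The sketch below unpacks the documented result tuple of tidy() in string-to-string mode. The helper name `tidy_or_fail`, the `max_errors` policy and the use of RuntimeError are assumptions; this page does not say when the module's Error exception is raised, so the sketch does not rely on it.

```python
def tidy_or_fail(markup, tidy_func=None, max_errors=0, **options):
    """Tidy `markup` in string-to-string mode and return the cleaned text.

    Raises RuntimeError, including Tidy's report, if more than
    `max_errors` errors were found. `tidy_func` defaults to mx.Tidy.tidy.
    """
    if tidy_func is None:
        from mx.Tidy import tidy as tidy_func  # the real mxTidy entry point
    nerrors, nwarnings, outputdata, errordata = tidy_func(markup, **options)
    if nerrors > max_errors:
        raise RuntimeError(
            "Tidy reported %d error(s), %d warning(s):\n%s"
            % (nerrors, nwarnings, errordata)
        )
    return outputdata
```

Warnings are tolerated here because Tidy still produces output for them; raising `max_errors` relaxes the check for known-bad input.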
The package currently does not expose any submodules.
TBD
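	This snippet demonstrates basic use of the tidy() function
	in string-to-string mode:
	>>> from mx.Tidy import tidy
	>>> nerrors, nwarnings, outputdata, errordata = tidy('<p>Hello, world', output_xhtml=1)
	>>> # outputdata now holds the XHTML markup, errordata Tidy's report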
	
      
More examples will appear in the Examples subdirectory of the package.
[Tidy]
       Doc/
       [Examples]
       [mxTidy]
              libtidy/
              test.py
       Tidy.py
      
      Names with trailing / are plain directories, ones with []-brackets are Python packages, ones with ".py" extension are Python submodules.
The package imports all symbols from the extension module and also registers the types so that they become compatible with the pickle and copy mechanisms in Python.
    
    
     
	  eGenix.com is providing commercial support for this
	  package. If you are interested in receiving information
	  about this service please see the eGenix.com
	  Support Conditions.
     
       
	  © 2001, Copyright by eGenix.com Software, Skills and
	  Services GmbH, Langenfeld, Germany; All Rights Reserved.
	  mailto: info@egenix.com
	 
	  The mxTidy software and the modifications to the HTML Tidy
	  source code are covered by the eGenix.com Public License
	  Agreement. The text of the license is also included
	  as file "LICENSE" in the package's main directory.
	 
	  The included HTML Tidy software is covered by the following
	  license:
	 
	   
	   By downloading, copying, installing or otherwise using
	  the software, you agree to be bound by the terms and
	  conditions of the eGenix.com
	  Public License Agreement and the above HTML Tidy
	  license. 
    
    
     Things that still need to be done:
	 Things that changed from 0.2.0 to 0.3.0:
	 Things that changed from 0.1.0 to 0.2.0:
	 
	  Version 0.1.0 was the first public release.
	 
    Support
    
	
    What I'd like to hear from you...
    
      
    
    
	
    Copyright & License
    
	
      Copyright (c) 1998-2000 World Wide Web Consortium
      (Massachusetts Institute of Technology, Institut National de
      Recherche en Informatique et en Automatique, Keio University).
      All Rights Reserved.
      Contributing Author(s):
	 Dave Raggett, dsr@w3.org
      The contributing author(s) would like to thank all those who
      helped with testing, bug fixes, and patience.  This wouldn't
      have been possible without all of you.
      COPYRIGHT NOTICE:
      This software and documentation is provided "as is," and
      the copyright holders and contributing author(s) make no
      representations or warranties, express or implied, including
      but not limited to, warranties of merchantability or fitness
      for any particular purpose or that the use of the software or
      documentation will not infringe any third party patents,
      copyrights, trademarks or other rights. 
      The copyright holders and contributing author(s) will not be
      liable for any direct, indirect, special or consequential damages
      arising out of any use of the software or documentation, even if
      advised of the possibility of such damage.
      Permission is hereby granted to use, copy, modify, and distribute
      this source code, or portions hereof, documentation and executables,
      for any purpose, without fee, subject to the following restrictions:
      1. The origin of this source code must not be misrepresented.
      2. Altered versions must be plainly marked as such and must
	 not be misrepresented as being the original source.
      3. This Copyright notice may not be removed or altered from any
	 source or altered source distribution.
      The copyright holders and contributing author(s) specifically
      permit, without fee, and encourage the use of this source code
      as a component for supporting the Hypertext Markup Language in
      commercial products. If you use this source code in a product,
      acknowledgment is not required but would be appreciated.  
	  
	History & Future