mxTidy

mxTidy - HTML Tidy for Python

Clean up your HTML files, convert even broken HTML into valid XHTML, and prepare web scraping input for XML processing, all with a single function that is implemented in a thread-safe and scalable way.
Version: 3.0.0

Introduction

mxTidy provides a Python interface to a thread-safe library version of the HTML Tidy command line tool.

HTML Tidy helps you clean up coding errors in HTML and XML files and produce well-formed HTML, XHTML or XML as output. This allows you to preprocess web pages for inclusion in XML repositories and to prepare broken XML files for validation. It also makes it possible to write converters from well-known word processing applications such as MS Word to other structured data representations by using XML as an intermediate format.

Features

  • Easy-to-use interface.
  • Error reports.
  • Thread-safe.
  • Works on files and strings.
  • Flexible formatting options.
  • No external dependencies.
  • Stable, robust and portable.
  • Free to use and redistribute.

System Requirements

mxTidy is written in a very portable way and works on nearly every platform for which Python can be compiled. There is no need to have HTML Tidy or the newer HTML Tidy Library installed on your system.

We provide precompiled versions for all standard platforms, so all you need is a working Python installation. The package supports all Python versions since Python 2.1.

Note that the version of HTML Tidy included in mxTidy is a modified and extended version of the original command line tool. We took the command line tool and turned it into a C library, making it thread-safe in the process. These changes were later used as the basis for starting the HTML Tidy Library project.

If you want to compile the package from source, you'll just need an ANSI C compiler. No other libraries are needed.

License

mxTidy is provided as part of the eGenix.com mx Experimental Distribution. Please see the eGenix.com mx Experimental Distribution page for details regarding the license.

The mxTidy software and the modifications to the HTML Tidy source code are covered by the eGenix.com Public License Agreement, which is included in the distribution. The copyright for the changes to HTML Tidy remains with eGenix.com.

The original HTML Tidy software (which is partly included in the package) is covered by the following W3C license:

Copyright (c) 1998-2000 World Wide Web Consortium (Massachusetts Institute of Technology, Institut National de Recherche en Informatique et en Automatique, Keio University).
All Rights Reserved.

Contributing Author(s): Dave Raggett, dsr@w3.org

The contributing author(s) would like to thank all those who helped with testing, bug fixes, and patience. This wouldn't have been possible without all of you.

COPYRIGHT NOTICE:

This software and documentation is provided "as is," and the copyright holders and contributing author(s) make no representations or warranties, express or implied, including but not limited to, warranties of merchantability or fitness for any particular purpose or that the use of the software or documentation will not infringe any third party patents, copyrights, trademarks or other rights.

The copyright holders and contributing author(s) will not be liable for any direct, indirect, special or consequential damages arising out of any use of the software or documentation, even if advised of the possibility of such damage.

Permission is hereby granted to use, copy, modify, and distribute this source code, or portions hereof, documentation and executables, for any purpose, without fee, subject to the following restrictions:
1. The origin of this source code must not be misrepresented.
2. Altered versions must be plainly marked as such and must not be misrepresented as being the original source.
3. This Copyright notice may not be removed or altered from any source or altered source distribution.

The copyright holders and contributing author(s) specifically permit, without fee, and encourage the use of this source code as a component for supporting the Hypertext Markup Language in commercial products. If you use this source code in a product, acknowledgment is not required but would be appreciated.

Documentation

The following documentation is available for mxTidy:

mxTidy User Manual and Reference Guide - HTML and PDF

The manual explains the available formatting options and includes a reference of the available programming interfaces.

The PDF file is also available as part of the installation and can be found in the mx/Tidy/Doc/ folder.

Download & Installation

mxTidy is provided as part of the eGenix.com mx Experimental Distribution. Please see the eGenix.com mx Experimental Distribution page for downloads and installation instructions.

References

mxTidy was originally written for the eGenix.com Application Server to filter HTML uploads and optionally reformat pages into XHTML.

mxTidy is also used in a number of Python applications: ZChecker (Zope), HTML Filter (Plone), Epoz (Zope), Haufe iDesk (commercial information desktop based on Zope).

History & Changes

Please see the change log for details regarding changes to the package between releases.