Migrating HTML to DITA

The HTML to DITA migration tool ships in the plugins directory of the toolkit and does not use the common toolkit processing for DITA content.

The DITA Open Toolkit, release 1.2 and later, provides an HTML to DITA migration tool that converts HTML files to DITA files. The tool originated with the h2d code published on developerWorks alongside Robert D. Anderson's how-to articles. It is located in the plugins\h2d directory. Because it is not integrated into the main transformation of the toolkit, you run it separately. The version in the toolkit is more recent than the one described in the articles, but refer to the articles for details about how the program works and how to extend it. Links to the articles are at the bottom of this page.

Preconditions

Consider the following preconditions before using the migration tool:

  • The HTML content must already be divided into concept, task, and reference articles. If it is not, rework the HTML files before migrating.
  • This migration tool is intended for topics. Each HTML page should contain a single section without any nested sections.
  • The DITA architecture is focused on topics. Information that was written for books needs to be redesigned to fit a topic-based architecture.
  • This migration utility works only with valid XHTML files. Clean up HTML files with HTML Tidy or another utility before processing.
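
The XHTML precondition can be checked roughly before running the tool. The following Python sketch is not part of the h2d plugin; the function names is_well_formed and heading_levels are hypothetical. It only tests XML well-formedness (not validity against the XHTML DTD) and reports which heading levels appear, on the assumption that more than one level hints at nested sections. Files that use named entities such as &nbsp; may be reported as errors by this parser even if they are valid XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml_text):
    """Return (True, None) if the text parses as XML, else (False, message).

    This checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


def heading_levels(xhtml_text):
    """Return the sorted list of heading levels (1-6) used in the page.

    Assumption: more than one level suggests nested sections that may
    need to be split into separate topics before migration.
    """
    root = ET.fromstring(xhtml_text)
    levels = set()
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue
        # Drop an XML namespace prefix such as {http://www.w3.org/1999/xhtml}
        local = elem.tag.rsplit("}", 1)[-1].lower()
        if len(local) == 2 and local[0] == "h" and local[1] in "123456":
            levels.add(int(local[1]))
    return sorted(levels)
```

A page for which is_well_formed returns False should be cleaned up first, for example with HTML Tidy as noted above.
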
Post conditions

There are also some post conditions to consider after processing:

  • In some cases, the tool cannot determine the correct way to migrate content. It places such content in a <required-cleanup> element. Fix these elements in the output DITA files.
  • Check the output DITA files. Compare them with the source HTML files and verify that the content is equivalent.
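
Both post conditions lend themselves to a quick scripted check. The Python sketch below is an illustration only and is not shipped with the toolkit; find_required_cleanup and normalized_text are hypothetical helper names. The first lists the text held in each <required-cleanup> element of a migrated file. The second reduces a document to whitespace-normalized text, which gives a rough basis for comparing source and output (it will not account for text that the migration legitimately moves or drops, such as the HTML head).

```python
import xml.etree.ElementTree as ET


def find_required_cleanup(dita_text):
    """Return the text content of each <required-cleanup> element."""
    root = ET.fromstring(dita_text)
    return [
        "".join(elem.itertext()).strip()
        for elem in root.iter("required-cleanup")
    ]


def normalized_text(xml_text):
    """Return all text in the document with runs of whitespace collapsed."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())
```

An empty list from find_required_cleanup means no content was set aside for manual repair. Differences between normalized_text of the source and of the output still need a human review.
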
Known limitations
Extension points

The HTML to DITA migration tool can be extended in the following ways:

  • The genidattribute template can be overridden to change the method for creating the topic ID.
  • The gentitlealts template can be overridden to change how titles are generated.
  • Override the corresponding section of the tool to preserve the semantics of the source when <div> or <span> elements are used in regular structures.
  • You can also migrate to another specialized DTD by overriding the original templates based on the specific DTD and your required output.

Migrating HTML to DITA, part 1
Migrating HTML to DITA, part 2