The DITA Open Toolkit release 1.2 and later provides an HTML-to-DITA migration tool, which converts
HTML files to DITA files. The tool originated in Robert D. Anderson's how-to articles on developerWorks,
which published the original h2d code. The tool is located in the plugins\h2d directory. It is not
integrated into the toolkit's main transformation process, so you run it on its own. The version in
the toolkit is more recent than the one in the articles, but the articles remain the reference for
details of how the program works and how to extend it. Links to the articles are at the bottom of
this page.
Preconditions
Consider the following preconditions before using the migration tool:
- The HTML content must already be divided into concept, task,
and reference topics. If it is not, rework the HTML files
before migrating them.
- The migration tool is intended for topics. Each HTML page
should contain a single section, with no nested
sections.
- The DITA architecture is topic-based. Information written
for books must be redesigned to fit a topic-based
architecture.
- The migration tool works only with valid XHTML files.
Clean up HTML files with HTML Tidy or a similar
utility before processing them.
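Because the tool accepts only well-formed XHTML, it can help to screen the input files before running the migration. The following Python sketch uses only the standard library to report files that do not parse as XML. It is a rough pre-check under stated assumptions: it tests well-formedness, not validity against the XHTML DTD, and it can also reject files that use named HTML entities such as `&nbsp;`, which the parser does not know. The function name `find_malformed` and the list of file extensions are hypothetical choices, not part of the toolkit.

```python
"""Report input files that are not well-formed XML (a rough XHTML pre-check)."""
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(directory, suffixes=(".html", ".htm", ".xhtml")):
    """Return (path, message) pairs for files that fail to parse as XML.

    Hypothetical helper: it checks well-formedness only, not validity
    against the XHTML DTD, and it is not part of the h2d plugin.
    """
    problems = []
    for path in sorted(Path(directory).rglob("*")):
        # Skip directories and files that do not look like HTML input.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            ET.parse(path)
        except ET.ParseError as err:
            # ParseError messages include the line and column of the problem.
            problems.append((path, str(err)))
    return problems
```

For example, calling `find_malformed("html-src")` on a hypothetical source directory returns an empty list when every file parses; any file it lists should go through HTML Tidy (or be fixed by hand) before migration.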
Postconditions
Consider the following postconditions after processing:
- In some cases the tool cannot determine the correct way to migrate
content, so it places that content in a <required-cleanup> element.
Fix these elements in the output DITA files.
- Check the output DITA files against the source HTML files
to verify that their content is equivalent.
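To find the <required-cleanup> elements that still need attention, you can scan the output directory. The following Python sketch uses only the standard library and assumes that the migrated topics use the .dita or .xml extension and are well-formed XML; the function name `find_required_cleanup` is hypothetical. The parser does not fetch the DTD named in a topic's DOCTYPE declaration, so the DITA DTDs do not need to be available.

```python
"""List <required-cleanup> elements left in migrated DITA files."""
import xml.etree.ElementTree as ET
from pathlib import Path


def find_required_cleanup(directory, suffixes=(".dita", ".xml")):
    """Return (path, count) for each output file that contains <required-cleanup>.

    Hypothetical helper; assumes the migrated topics use one of the given
    extensions and are well-formed XML.
    """
    report = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        root = ET.parse(path).getroot()
        # iter() walks the root element and all of its descendants.
        count = sum(1 for _ in root.iter("required-cleanup"))
        if count:
            report.append((path, count))
    return report
```

Each returned pair names a file and the number of <required-cleanup> elements it still contains; an empty list means none remain. The scan does not replace the manual comparison with the source HTML files.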
Extension points
The HTML2DITA migration tool can be extended in the
following ways:
- The genidattribute template can be
overridden to change the method for creating the topic
ID.
- The gentitlealts template can be
overridden to change how titles are generated.
- If the source uses <div> or <span> elements in regular
structures, override the corresponding section of the tool
to preserve the semantics of the source.
- You can also migrate to a different specialized DTD by
overriding the original templates according to that DTD
and the output you require.