Migrate from Confluence XHTML to Asciidoctor
You can convert Atlassian Confluence XHTML pages to AsciiDoc using the following Groovy script.
The script uses HtmlCleaner to tidy one or more HTML files exported from Confluence, then calls Pandoc to convert each file to AsciiDoc. Pandoc must be installed and available on your PATH before you run the script. If you have trouble running the script, you can run the Pandoc command it uses to convert the XHTML files to AsciiDoc manually.
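The following minimal Groovy sketch is not part of melix's script. It prints the same Pandoc command line the script runs for one file, checks whether a `pandoc` executable can be started from the PATH, and runs the command only if it can and the input exists. The file names `page.html` and `page.adoc` and the helper names `pandocCommand` and `pandocAvailable` are hypothetical placeholders. Running Pandoc directly like this skips the HtmlCleaner pass, so `class` attributes and nested table-cell markup reach Pandoc unmodified.

```groovy
// Sketch: run the Pandoc step of convert.groovy by hand for one file.
// 'page.html' and 'page.adoc' are hypothetical placeholder paths.

// Build the same argument list that convert.groovy passes to Pandoc.
List<String> pandocCommand(String input, String output) {
    ['pandoc', '-f', 'html-native_divs', '-t', 'asciidoctor', '--wrap=none',
     input, '-o', output]
}

// Return true if a 'pandoc' executable can be started from the PATH.
boolean pandocAvailable() {
    try {
        def proc = ['pandoc', '--version'].execute()
        proc.waitForProcessOutput(new StringBuilder(), new StringBuilder())
        return proc.exitValue() == 0
    } catch (IOException ignored) {
        return false
    }
}

def cmd = pandocCommand('page.html', 'page.adoc')
// Print the equivalent shell command so it can be copied into a terminal:
// pandoc -f html-native_divs -t asciidoctor --wrap=none page.html -o page.adoc
println cmd.join(' ')

if (!pandocAvailable()) {
    println 'pandoc was not found on the PATH; install it before running convert.groovy'
} else if (new File('page.html').exists()) {
    def proc = cmd.execute()
    def err = new StringBuilder()
    proc.waitForProcessOutput(new StringBuilder(), err)
    println(proc.exitValue() == 0 ? 'Wrote page.adoc' : "pandoc failed: $err")
}
```

Passing the command as a list rather than a single string keeps file paths that contain spaces intact as single arguments.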
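// This script is provided by melix.
// The source can be found at https://gist.github.com/melix/6020336
@Grab('net.sourceforge.htmlcleaner:htmlcleaner:2.4')
import org.htmlcleaner.*

// Input (Confluence HTML export) and output (AsciiDoc) directories,
// relative to the working directory
def src = new File('html').toPath()
def dst = new File('asciidoc').toPath()
dst.toFile().mkdirs()

def cleaner = new HtmlCleaner()
def props = cleaner.properties
// Leave special HTML entities (for example &times;) untranslated
props.translateSpecialEntities = false
def serializer = new SimpleHtmlSerializer(props)

src.toFile().eachFileRecurse { f ->
    def relative = src.relativize(f.toPath())
    def target = dst.resolve(relative)
    if (f.isDirectory()) {
        // Mirror the input directory structure in the output directory
        target.toFile().mkdirs()
    } else if (f.name.endsWith('.html')) {
        def tmpHtml = File.createTempFile('clean', '.html')
        println "Converting $relative"
        def result = cleaner.clean(f)
        // The visitor receives (parentNode, htmlNode); act on the visited node itself
        result.traverse({ parentNode, htmlNode ->
            if (htmlNode instanceof TagNode) {
                // Strip the class attributes added by Confluence
                htmlNode.removeAttribute('class')
                // Turn header cells into plain cells and flatten each cell to text
                if (htmlNode.name in ['td', 'th']) {
                    htmlNode.name = 'td'
                    String txt = htmlNode.text
                    htmlNode.removeAllChildren()
                    htmlNode.insertChild(0, new ContentNode(txt))
                }
            }
            true
        } as TagNodeVisitor)
        serializer.writeToFile(result, tmpHtml.absolutePath, 'utf-8')
        // Write page.html as page.adoc; the list form of execute() keeps
        // paths containing spaces intact
        def out = target.toString().replaceAll(/\.html$/, '') + '.adoc'
        def proc = ['pandoc', '-f', 'html-native_divs', '-t', 'asciidoctor',
                    '--wrap=none', tmpHtml.absolutePath, '-o', out].execute()
        def err = new StringBuilder()
        proc.waitForProcessOutput(new StringBuilder(), err)
        if (proc.exitValue() != 0) {
            println "pandoc failed for $relative: $err"
        }
        tmpHtml.delete()
    }/* else {
        "cp html/$relative $target".execute()
    }*/
}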
This script was created by Cédric Champeau (melix). Its source is hosted in a GitHub gist at https://gist.github.com/melix/6020336.
The script is designed to be run locally on HTML files or directories containing HTML files exported from Confluence.
Usage
- Save the script contents to a convert.groovy file in a working directory.
- Make the file executable according to your specific OS requirements.
- Create an html directory for input files and an asciidoc directory for output files, both inside the working directory.
- Place individual files, or a directory containing files, into the html directory.
- Run groovy convert to convert the files contained inside the html directory.
- Look for the generated output files in the asciidoc directory and confirm they meet your requirements.
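The setup and review steps above can be sketched in Groovy as well. This is a minimal illustration, not part of melix's script: it creates the html and asciidoc directories in the current working directory and then lists every .adoc file found under asciidoc. The helper name `generatedAdocFiles` is hypothetical.

```groovy
import groovy.io.FileType

// Sketch (not part of convert.groovy): create the input and output
// directories in the current working directory. mkdirs() does nothing
// if a directory already exists.
['html', 'asciidoc'].each { new File(it).mkdirs() }

// Hypothetical helper: collect the paths, relative to root, of every .adoc
// file under root, sorted so the listing is stable.
List<String> generatedAdocFiles(File root) {
    List<String> found = []
    if (!root.isDirectory()) {
        return found
    }
    root.eachFileRecurse(FileType.FILES) { File f ->
        if (f.name.endsWith('.adoc')) {
            found << root.toPath().relativize(f.toPath()).toString()
        }
    }
    return found.sort()
}

// Print each generated file so it can be opened and reviewed
generatedAdocFiles(new File('asciidoc')).each { println it }
```

Running the sketch before groovy convert only creates the directories and prints nothing; running it again afterwards lists the converted files to review.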