LaTeX file repair | Java

How to check and repair a LaTeX file

If you are unsure whether a text file is a LaTeX file and want to typeset it, you can use the LaTeX checking and repair feature of the Aspose.TeX API for Java. In the example below, we check and repair a sample file, invalid-latex.tex, from the Aspose.TeX for Java example project.

To begin with, note that the sample file appears to use TeX syntax but lacks the structure required by LaTeX. A LaTeX file must include a preamble that begins with the \documentclass command, as well as a body enclosed in the document environment, that is, between \begin{document} and \end{document}.
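The structural requirement above can be illustrated with a rough textual check. This is only a sketch: the class LaTeXStructureCheck and its method are hypothetical names, not part of Aspose.TeX, and the check ignores comments and verbatim text, so it is far weaker than what a TeX engine does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.TeX: a naive textual check for LaTeX structure.
final class LaTeXStructureCheck {
    // Matches \documentclass{...} with an optional [options] argument.
    private static final Pattern DOCUMENT_CLASS =
            Pattern.compile("\\\\documentclass\\s*(\\[[^\\]]*\\])?\\s*\\{[^}]+\\}");

    // Returns true if \documentclass, \begin{document} and \end{document}
    // all occur, in that order. Comments and verbatim text are not handled.
    static boolean looksLikeLaTeX(String source) {
        Matcher cls = DOCUMENT_CLASS.matcher(source);
        if (!cls.find()) {
            return false;
        }
        int begin = source.indexOf("\\begin{document}", cls.end());
        if (begin < 0) {
            return false;
        }
        return source.indexOf("\\end{document}", begin) >= 0;
    }
}
```

A file that contains only TeX-style markup, such as a bare \chapter command followed by text, fails this check, which is the situation the repair feature is meant to handle.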

Let’s now look at the Java code sample.

// Create repair options.
LaTeXRepairerOptions options = new LaTeXRepairerOptions();
// Specify a file system working directory for the output.
options.setOutputWorkingDirectory(new OutputFileSystemDirectory(Utils.getOutputDirectory()));
// Specify a file system working directory for the required input.
// The directory containing packages may be located anywhere.
options.setRequiredInputDirectory(new InputFileSystemDirectory(Utils.getInputDirectory() + "packages"));
// Specify the callback class to externally guess packages required for undefined commands or environments.
options.setGuessPackageCallback(new PackageGuesser());

// Run the repair process.
new LaTeXRepairer(Utils.getInputDirectory() + "invalid-latex.tex", options).run();

Similar to a regular TeX job, we start by creating an object that holds the options for the process we are about to run. Most of these options are the same as those for a regular TeX job. The setInputWorkingDirectory() method specifies the space from which the input files are read. We don’t call it in this example because we provide the full file system path to the main input file, and that file includes no custom files. Next, we call the setOutputWorkingDirectory() method to specify the space where the output files are written. The optional setRequiredInputDirectory() method points the options to the space where you can store LaTeX packages that are not embedded in the Aspose.TeX library. The setGuessPackageCallback() method is discussed later on.

Once the options have been assigned, we can proceed with running the process.

So, what is the checking and repairing process like? It begins with the API searching for a \documentclass occurrence in the input file. If it is not found, the API assumes that \documentclass{article} should be inserted at the start of the file. This information is recorded in the repair report file (.log).
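As a rough illustration of this first step, the sketch below prepends a default document class when none is found and records a message. It is not the library's implementation: DocumentClassStep and ensureDocumentClass are hypothetical names, the search is a plain substring test that would also match a commented-out occurrence, and the message text is copied from the sample report.

```java
import java.util.List;

// Hypothetical sketch of the first repair step; not the Aspose.TeX implementation.
final class DocumentClassStep {
    // Returns the source unchanged if it mentions \documentclass; otherwise
    // prepends \documentclass{article} and appends a message to the report.
    static String ensureDocumentClass(String source, List<String> report) {
        if (source.contains("\\documentclass")) {
            return source;
        }
        report.add("\\documentclass is missing in the original file. Inserted at the beginning.");
        return "\\documentclass{article}\n" + source;
    }
}
```

Running the step a second time on its own output leaves the text unchanged and adds nothing to the report.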

Next, the process scans the adjusted input file from the beginning. At some point, the TeX engine equipped with the LaTeX format may throw an error indicating that \begin{document} should already have appeared but has not been found. This determines the position at which \begin{document} must be inserted, and the insertion is recorded in the report.

While scanning further, the engine may come across undefined commands or environments. For some of the most common commands and environments, the API can guess which embedded packages are required, that is, which packages define the commands and environments that remain undefined until those packages are included. You can also supply such guesses externally through a class that implements the IGuessPackageCallback interface. Once you have developed such a class, pass an instance of it to the process by calling the setGuessPackageCallback() method.

Here is a basic example that simply maps the \lhead command to the fancyhdr package:

// The callback class to externally guess packages required for undefined commands or environments.
// Requires: import java.util.HashMap; import java.util.Map;
public static class PackageGuesser implements IGuessPackageCallback
{
    private final Map<String, String> _map = new HashMap<>();

    public PackageGuesser()
    {
        _map.put("lhead", "fancyhdr"); // Defines the mapping between the \lhead command and the fancyhdr package.
    }

    public String guessPackage(String commandName, boolean isEnvironment)
    {
        if (!isEnvironment)
        {
            String packageName = _map.get(commandName);
            // It's better to return an empty string to avoid subsequent calls for the same command name.
            return packageName != null ? packageName : "";
        }

        // Some code for environments
        // ...

        return "";
    }
}
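The environment branch left open above could be filled in with a second lookup table. The following is a minimal sketch, not the library's or the web app's implementation. GuessPackageCallbackStandIn is a hypothetical stand-in declared only so the snippet compiles without Aspose.TeX; it assumes the real IGuessPackageCallback declares the single method shown in the example above, and that environment names are passed the same way command names are, without a backslash.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for IGuessPackageCallback so the sketch compiles on its own.
// Assumption: the real interface declares this same single method.
interface GuessPackageCallbackStandIn {
    String guessPackage(String commandName, boolean isEnvironment);
}

// Keeps separate lookup tables for commands and environments.
final class TwoMapPackageGuesser implements GuessPackageCallbackStandIn {
    private final Map<String, String> commands = new HashMap<>();
    private final Map<String, String> environments = new HashMap<>();

    TwoMapPackageGuesser() {
        commands.put("lhead", "fancyhdr");        // \lhead is defined by fancyhdr
        environments.put("align", "amsmath");     // align is defined by amsmath
        environments.put("tikzpicture", "tikz");  // tikzpicture is defined by tikz
    }

    @Override
    public String guessPackage(String commandName, boolean isEnvironment) {
        Map<String, String> table = isEnvironment ? environments : commands;
        // An empty string signals "no guess", as in the example above.
        return table.getOrDefault(commandName, "");
    }
}
```

Keeping the two tables apart avoids a wrong guess when a command and an environment share a name but come from different packages.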

As for the sample file, the engine first encounters the \chapter command, which is not defined in the article document class but is defined in the book document class. The API adjusts the document class so that the final version of the repaired file begins with \documentclass{book}. Then, the engine finds the aforementioned \lhead command and determines that \usepackage{fancyhdr} must be inserted in the preamble. The \href and \includegraphics commands that occur later in the file make the Repairer insert \usepackage{hyperref} and \usepackage{graphicx} in the preamble, respectively. These decisions are made based on the API’s internal mappings. As before, all these repairs are logged in the report file.

Finally, the engine terminates abnormally because the proper conclusion of a LaTeX document is missing. As a result, the Repairer appends \end{document} to the end of the file and records this in the report.

After creating the fixed version of the original file, the Repairer runs a TeX job on it for the final verification. In our example, this run does not detect any critical errors, so the fixed version can be typeset more or less as expected.

Here is the full report:

Trying to repair the original file...
--------------------------------------------------------------------------------
\documentclass is missing in the original file. Inserted at the beginning.
\begin{document} is missing in the original file. Inserted at line 3, pos. 0.
The command \chapter at line 3, pos. 0 is undefined. Consider using \usepackage{package_name} in the preamble,
    where 'package_name' is the name of the package which defines this command.
The command \lhead at line 5, pos. 0 is undefined. \usepackage{fancyhdr} is inserted in the preamble
    since the 'fancyhdr' package supposedly defines the command.
The command \href at line 8, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 17, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 20, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 27, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 32, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \includegraphics at line 54, pos. 2 is undefined. \usepackage{graphicx} is inserted in the preamble
    since the 'graphicx' package supposedly defines the command.
The command \href at line 67, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 95, pos. 57 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 96, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 98, pos. 100 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
\end{document} is missing in the original file. Inserted at the end.

Checking the repaired file...
--------------------------------------------------------------------------------
There are no critical errors in the fixed file.
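Because the report repeats the same message for every occurrence of an undefined command, it can be handy to reduce it to the distinct set of inserted packages. The sketch below does this with a regular expression. RepairReportSummary is a hypothetical helper, not part of Aspose.TeX, and it assumes the report wording matches the sample above; reading the report file is left out because its exact name and location are not shown here.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.TeX: summarizes a repair report.
final class RepairReportSummary {
    // Assumes the message wording of the sample report shown above.
    private static final Pattern INSERTED_PACKAGE =
            Pattern.compile("\\\\usepackage\\{([^}]+)\\} is inserted in the preamble");

    // Returns the distinct package names in order of first appearance.
    static Set<String> insertedPackages(String reportText) {
        Set<String> packages = new LinkedHashSet<>();
        Matcher matcher = INSERTED_PACKAGE.matcher(reportText);
        while (matcher.find()) {
            packages.add(matcher.group(1));
        }
        return packages;
    }
}
```

The "Consider using \usepackage{package_name}" suggestion for \chapter is deliberately not matched, since no package was inserted in that case.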

You may also check out our free AI LaTeX Repairer web app, which is built on the feature implemented in the Aspose.TeX for .NET API and uses a more advanced implementation of the IGuessPackageCallback interface.
