LaTeX file repair | .NET

How to check and repair a LaTeX file

If you have a text file that you think is a LaTeX file and want to typeset it, but you are not sure it really is valid LaTeX (perhaps you are new to the LaTeX world), you can use the LaTeX checking and repair feature of the Aspose.TeX API for .NET. In the following example, we check and repair invalid-latex.tex, a sample file from the Aspose.TeX for .NET example project.

First, note that although the sample file appears to use TeX syntax, it lacks the structure that LaTeX requires. As you may know, a LaTeX file must have a preamble that starts with the \documentclass command, followed by a body inside the document environment, i.e. between \begin{document} and \end{document}.

Let’s now look at the C# code sample.

using System.IO;
using Aspose.TeX.Features;
using Aspose.TeX.IO;

// Create repair options.
LaTeXRepairerOptions options = new LaTeXRepairerOptions();
// Specify a file system working directory for the output.
// (RunExamples is a helper class from the Aspose.TeX for .NET example project.)
options.OutputWorkingDirectory = new OutputFileSystemDirectory(RunExamples.OutputDirectory);
// Specify a file system working directory for the required input.
// The directory containing packages may be located anywhere.
options.RequiredInputDirectory = new InputFileSystemDirectory(Path.Combine(RunExamples.InputDirectory, "packages"));
// Specify the callback class to externally guess packages required for undefined commands or environments.
options.GuessPackageCallback = new PackageGuesser();
// Run the repair process.
new LaTeXRepairer(Path.Combine(RunExamples.InputDirectory, "invalid-latex.tex"), options).Run();

As with a regular TeX job, we first create an object that holds the options of the process we are about to run. Most of them are the same as the options of a regular TeX job. InputWorkingDirectory is the space from which input files are read. We don't set it here because we pass the full file-system path of the main input file, and the main input file is not expected to include any custom files. OutputWorkingDirectory is the space where output files are written. RequiredInputDirectory, if assigned, points to the space where you can store LaTeX packages that are not embedded in the Aspose.TeX library. The GuessPackageCallback property is discussed later.

After assigning the options, we simply run the process!

So, how does the checking and repair process work? First, the API searches the input file for a \documentclass command. If none is found, it assumes that \documentclass{article} must be inserted at the very beginning of the file and records this in the repair report file (.log).
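To make this first step concrete, here is a minimal C# sketch of a purely text-level version of it. It illustrates the idea under our own assumptions and is not the Aspose.TeX implementation: the class DocumentClassCheck and its method EnsureDocumentClass are hypothetical names, and the sketch only recognizes \documentclass at the start of a line, so a commented-out declaration does not count.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: a text-level approximation of the first repair step.
// Aspose.TeX itself does not necessarily work this way.
public static class DocumentClassCheck
{
    // \documentclass at the start of a line (leading spaces/tabs allowed),
    // followed by a word boundary, so both "{book}" and "[12pt]{book}" forms match.
    private static readonly Regex DocumentClassLine =
        new Regex(@"^[ \t]*\\documentclass\b", RegexOptions.Multiline);

    // Returns the source unchanged if it declares a document class; otherwise
    // prepends \documentclass{article}, the default described above.
    public static string EnsureDocumentClass(string source, out bool inserted)
    {
        inserted = !DocumentClassLine.IsMatch(source);
        return inserted ? "\\documentclass{article}\n" + source : source;
    }
}
```

For example, EnsureDocumentClass("\\chapter{Intro}\nText", out bool inserted) returns the text prefixed with \documentclass{article} and sets inserted to true.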

Next, it scans the adjusted input file from the very beginning. At some point, the TeX engine loaded with the LaTeX format may raise an error signaling that no \begin{document} has been found so far, although it should already have occurred. This defines the position at which \begin{document} must be inserted, and that position is recorded in the report.
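The API finds this position from the engine's own error, which a plain text scan cannot reproduce. Purely as an illustration, the hypothetical helper below approximates the idea with a much cruder heuristic (a different technique from the one the API uses): it treats blank lines, comments, and a small, assumed list of preamble commands as preamble, and inserts \begin{document} before the first line that is none of these.

```csharp
using System;
using System.Linq;

// Hypothetical heuristic, NOT the Aspose.TeX algorithm (which relies on the TeX engine's error).
public static class BeginDocumentHeuristic
{
    // Commands this sketch assumes may appear in the preamble; deliberately incomplete.
    private static readonly string[] PreambleCommands =
    {
        "\\documentclass", "\\usepackage", "\\newcommand", "\\renewcommand",
        "\\title", "\\author", "\\date"
    };

    // Inserts \begin{document} before the first line that is not blank, not a comment,
    // and does not start with an assumed preamble command. Sources that already contain
    // \begin{document} are returned unchanged.
    public static string EnsureBeginDocument(string source)
    {
        if (source.Contains("\\begin{document}")) return source;

        var lines = source.Split('\n').ToList();
        int insertAt = lines.Count; // if every line looks like preamble, insert at the end
        for (int i = 0; i < lines.Count; i++)
        {
            string t = lines[i].Trim();
            bool looksLikePreamble = t.Length == 0
                || t.StartsWith("%", StringComparison.Ordinal)
                || PreambleCommands.Any(c => t.StartsWith(c, StringComparison.Ordinal));
            if (!looksLikePreamble) { insertAt = i; break; }
        }

        lines.Insert(insertAt, "\\begin{document}");
        return string.Join("\n", lines);
    }
}
```

Because the command list is only a prefix match on a few names, this heuristic will misplace \begin{document} in many real files; the engine-driven approach described above avoids guessing which commands are preamble-only.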

As the engine scans the file further, it may encounter undefined commands or environments. For some of the most common ones, the API can guess which embedded package defines them and therefore must be included. You can also supply such guesses externally by implementing the IGuessPackageCallback interface and assigning an instance of your class to the GuessPackageCallback option.

Here is a very simple example that just maps the \lhead command to the fancyhdr package:

using System.Collections.Generic;
using Aspose.TeX.Features;

public class PackageGuesser : IGuessPackageCallback
{
    private Dictionary<string, string> _map = new Dictionary<string, string>();

    public PackageGuesser()
    {
        _map.Add("lhead", "fancyhdr"); // Defines the mapping between the \lhead command and the fancyhdr package.
    }

    public string GuessPackage(string commandName, bool isEnvironment)
    {
        string packageName;
        if (!isEnvironment)
        {
            _map.TryGetValue(commandName, out packageName);
            return packageName ?? ""; // It's better to return an empty string to avoid subsequent calls for the same command name.
        }

        // Some code for environments
        // ...

        return "";
    }
}
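If you need more mappings, a table-driven variant keeps them in one place and also covers environments. The sketch below is a hypothetical extension, not part of Aspose.TeX: it uses the same GuessPackage signature as the example above but does not declare the interface, so it compiles without the library (add `: IGuessPackageCallback` when Aspose.TeX is referenced). It assumes that environment names, like command names, are passed without a leading backslash. The package assignments themselves (for example, the align environment from amsmath) are standard LaTeX.

```csharp
using System.Collections.Generic;

// Hypothetical table-driven guesser. Same GuessPackage signature as PackageGuesser;
// declare it as ": IGuessPackageCallback" when Aspose.TeX is referenced.
public class TablePackageGuesser
{
    // Command name (without backslash) -> package that defines it.
    private readonly Dictionary<string, string> _commands = new Dictionary<string, string>
    {
        ["lhead"] = "fancyhdr",
        ["rhead"] = "fancyhdr",
        ["href"] = "hyperref",
        ["includegraphics"] = "graphicx",
    };

    // Environment name -> package that defines it (name format is an assumption).
    private readonly Dictionary<string, string> _environments = new Dictionary<string, string>
    {
        ["align"] = "amsmath",
        ["longtable"] = "longtable",
        ["tabularx"] = "tabularx",
    };

    public string GuessPackage(string commandName, bool isEnvironment)
    {
        var map = isEnvironment ? _environments : _commands;
        // Return "" rather than null for unknown names, as the example above recommends.
        return map.TryGetValue(commandName, out string packageName) ? packageName : "";
    }
}
```

Keeping commands and environments in separate tables matters because the same name can mean different things in each role; the isEnvironment flag selects the table.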

In the sample file, the engine first encounters the \chapter command, which is not defined in the article document class but is defined in the book document class. The API therefore adjusts the document class so that the fixed file starts with \documentclass{book}. Next, the engine finds the aforementioned \lhead command and decides that \usepackage{fancyhdr} must be inserted in the preamble. The \href and \includegraphics commands that occur later make the Repairer insert \usepackage{hyperref} and \usepackage{graphicx} in the preamble, respectively. These decisions are based on the API's internal mappings. Again, all such fixes are recorded in the report file.
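The full report logs a \usepackage{hyperref} insertion for every undefined \href, yet loading a package once is enough. One plausible way to assemble the preamble additions is therefore to emit each guessed package a single time, in the order first encountered. The helper below is a hypothetical sketch of that idea; this article does not say how the Repairer itself avoids duplicate \usepackage lines.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns the packages guessed for undefined commands (in the order
// they were encountered) into \usepackage lines, emitting each package once.
public static class PreambleBuilder
{
    public static string BuildUsePackageLines(IEnumerable<string> guessedPackages)
    {
        var seen = new HashSet<string>();
        var sb = new StringBuilder();
        foreach (string package in guessedPackages)
        {
            // Skip empty guesses (unknown commands) and packages already emitted.
            if (string.IsNullOrEmpty(package) || !seen.Add(package)) continue;
            sb.Append("\\usepackage{").Append(package).Append("}\n");
        }
        return sb.ToString();
    }
}
```

Fed the sequence fancyhdr, hyperref, hyperref, graphicx, hyperref, it produces exactly three lines: \usepackage{fancyhdr}, \usepackage{hyperref}, and \usepackage{graphicx}.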

Finally, the engine terminates abnormally because the normal ending of a LaTeX document is missing. The Repairer therefore appends \end{document} to the end of the file and records this in the report.
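A text-level counterpart of this last step is short. The hypothetical helper below (not an Aspose.TeX API) only checks whether the string \end{document} occurs anywhere in the source, so, unlike the engine-based check, it would be fooled by an occurrence inside a comment.

```csharp
using System;

// Hypothetical helper mirroring the final repair step at the text level.
public static class EndDocumentCheck
{
    // Appends \end{document} unless the source already contains it anywhere
    // (an occurrence inside a comment would also count in this simple check).
    public static string EnsureEndDocument(string source, out bool appended)
    {
        appended = !source.Contains("\\end{document}");
        if (!appended) return source;

        // Make sure \end{document} starts on its own line.
        string separator = source.EndsWith("\n", StringComparison.Ordinal) ? "" : "\n";
        return source + separator + "\\end{document}\n";
    }
}
```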

Once the fixed version of the original file is built, the Repairer runs the TeX job on it for the final check. In our example, this run does not find any critical errors, so the fixed version may be typeset more or less as expected.

Here is the full report:

Trying to repair the original file...
--------------------------------------------------------------------------------
\documentclass is missing in the original file. Inserted at the beginning.
\begin{document} is missing in the original file. Inserted at line 3, pos. 0.
The command \chapter at line 3, pos. 0 is undefined. Consider using \usepackage{package_name} in the preamble,
    where 'package_name' is the name of the package which defines this command.
The command \lhead at line 5, pos. 0 is undefined. \usepackage{fancyhdr} is inserted in the preamble
    since the 'fancyhdr' package supposedly defines the command.
The command \href at line 8, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 17, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 20, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 27, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 32, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \includegraphics at line 54, pos. 2 is undefined. \usepackage{graphicx} is inserted in the preamble
    since the 'graphicx' package supposedly defines the command.
The command \href at line 67, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 95, pos. 57 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 96, pos. 0 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
The command \href at line 98, pos. 100 is undefined. \usepackage{hyperref} is inserted in the preamble
    since the 'hyperref' package supposedly defines the command.
\end{document} is missing in the original file. Inserted at the end.

Checking the repaired file...
--------------------------------------------------------------------------------
There are no critical errors in the fixed file.
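Because every fix is logged in a consistent way, the report can also be post-processed, for instance to count package insertions or to list commands the Repairer could not map to a package (such as \chapter in this report). The parser below is a hypothetical sketch whose regular expressions are written against the wording of this sample report; that wording is not a documented format and may differ between versions.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical post-processing of the repair report. The patterns match the wording
// of the sample report above and are an assumption, not a documented format.
public static class RepairReportParser
{
    private static readonly Regex Inserted = new Regex(
        @"^The command \\(?<cmd>\w+) at line (?<line>\d+), pos\. (?<pos>\d+) is undefined\. " +
        @"\\usepackage\{(?<pkg>[^}]+)\} is inserted",
        RegexOptions.Multiline);

    private static readonly Regex Unresolved = new Regex(
        @"^The command \\(?<cmd>\w+) at line (?<line>\d+), pos\. (?<pos>\d+) is undefined\. Consider using",
        RegexOptions.Multiline);

    // Package name -> number of report entries that led to inserting it.
    public static Dictionary<string, int> CountInsertedPackages(string report)
    {
        var counts = new Dictionary<string, int>();
        foreach (Match m in Inserted.Matches(report))
        {
            string pkg = m.Groups["pkg"].Value;
            counts[pkg] = counts.TryGetValue(pkg, out int n) ? n + 1 : 1;
        }
        return counts;
    }

    // Commands the Repairer could not map to a package, with their line numbers.
    public static List<(string Command, int Line)> FindUnresolved(string report)
    {
        var result = new List<(string Command, int Line)>();
        foreach (Match m in Unresolved.Matches(report))
            result.Add((m.Groups["cmd"].Value, int.Parse(m.Groups["line"].Value)));
        return result;
    }
}
```

Run on the report above, CountInsertedPackages would attribute most entries to hyperref, and FindUnresolved would return the \chapter entry at line 3, which is a natural candidate for a custom IGuessPackageCallback mapping or a manual fix.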

You may also try our free AI LaTeX Repairer web app, which is built on this feature of the Aspose.TeX for .NET API and uses a more advanced implementation of the IGuessPackageCallback interface.
