Extract Data from AcroForm using Python
Extract form fields from PDF document
The Form class from the aspose.pdf.facades namespace provides a straightforward way to read AcroForm field data without working through the full document object model. Iterate over form.field_names to get every field name present in the form, then call form.get_field(name) to retrieve its current value.
- Construct a Form object by passing the input file path.
- Iterate over form.field_names to enumerate all field names.
- Call form.get_field(name) for each name and store the result in a dictionary.
import aspose.pdf as apdf
from os import path
# self.dataDir and infile come from the surrounding example class
path_infile = path.join(self.dataDir, infile)
form = apdf.facades.Form(path_infile)
form_values = {}
# Get values from all fields
for field_name in form.field_names:
    # Inspect or transform names and values here if needed
    form_values[field_name] = form.get_field(field_name)
print(form_values)
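The loop above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the library: collect_field_values and the FakeForm stand-in are hypothetical names introduced here so the snippet runs without a PDF. With the library installed, an apdf.facades.Form instance would be passed instead, assuming it exposes field_names and get_field as described above.

```python
from typing import Any, Dict, Iterable


def collect_field_values(form: Any) -> Dict[str, Any]:
    """Return {field name: value} for any object that exposes
    `field_names` and `get_field(name)`."""
    return {name: form.get_field(name) for name in form.field_names}


class FakeForm:
    """Hypothetical stand-in for apdf.facades.Form (illustration only)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)

    @property
    def field_names(self) -> Iterable[str]:
        return list(self._data)

    def get_field(self, name: str) -> str:
        return self._data[name]


# Sample data is invented for the example
sample = FakeForm({"FirstName": "Ada", "LastName": "Lovelace"})
values = collect_field_values(sample)
print(values)
```

Keeping the helper independent of the concrete class makes it easy to unit-test the surrounding logic without a PDF file on disk.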
Retrieve form field value by title
When you know the exact field name (title) defined in the PDF form, you can retrieve its value directly with form.get_field(name) without iterating over the entire field collection. This is the most direct approach when only specific fields are needed.
- Construct a Form object with the input file path.
- Call form.get_field("FieldName") using the exact field title as it appears in the PDF.
- Use the returned string value as needed in your application.
import aspose.pdf as apdf
# path_infile is the path to the source PDF, built as in the previous example
form = apdf.facades.Form(path_infile)
# Retrieve a single field value by its name
value = form.get_field("FirstName")
print(value)
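When several known fields are needed, it can help to check each name against form.field_names first, since this page does not describe what get_field does for a name that is not in the form. The sketch below assumes only the two members used above; get_known_fields and the SimpleNamespace stand-in are hypothetical and exist purely so the example runs without the library.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional


def get_known_fields(form: Any, wanted: Iterable[str]) -> Dict[str, Optional[Any]]:
    """Look up only the requested names; names absent from the form map to None."""
    present = set(form.field_names)
    return {
        name: form.get_field(name) if name in present else None
        for name in wanted
    }


# Hypothetical stand-in for a Form object, with invented sample data
data = {"FirstName": "Ada", "Email": "ada@example.com"}
fake_form = SimpleNamespace(field_names=list(data), get_field=data.__getitem__)

result = get_known_fields(fake_form, ["FirstName", "Phone"])
print(result)
```

Here "Phone" is not defined in the stand-in form, so it is reported as None instead of being passed to get_field.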
Extract form fields from PDF document to JSON
There are two ways to export AcroForm data to JSON. The first uses the built-in export_json method on Form, which serializes all field data directly to a file stream in a single call.
- Construct a Form object with the input file path.
- Open the output file as a binary stream using FileIO.
- Call form.export_json(stream, True) to write the JSON output.
import aspose.pdf as apdf
from io import FileIO
from os import path
path_infile = path.join(self.dataDir, infile)
path_outfile = path.join(self.dataDir, outfile)
form = apdf.facades.Form(path_infile)
# Open the output file as a binary stream and write the JSON
with FileIO(path_outfile, "w") as json_file:
    form.export_json(json_file, True)
The second approach builds a Python dictionary from field_names and get_field, then serializes it with json.dumps. Use this when you need to transform or filter the data before writing it.
- Iterate over form.field_names and populate a dictionary with field values.
- Serialize the dictionary with json.dumps(form_data, indent=4).
- Write the resulting JSON string to the output file.
import aspose.pdf as apdf
from os import path
import json
path_infile = path.join(self.dataDir, infile)
path_outfile = path.join(self.dataDir, outfile)
form = apdf.facades.Form(path_infile)
form_data = {}
# Get values from all fields
for field_name in form.field_names:
    form_data[field_name] = form.get_field(field_name)
# Serialize to JSON
json_string = json.dumps(form_data, indent=4)
print(json_string)
# Write the JSON text to the output file
with open(path_outfile, "w", encoding="utf-8") as json_file:
    json_file.write(json_string)
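As an illustration of the "transform or filter" step, the sketch below cleans a dictionary of field values before serializing it. It uses only the standard library; clean_form_data and the sample values are hypothetical, and the dictionary stands in for the one built from get_field.

```python
import json
from typing import Dict, Optional


def clean_form_data(form_data: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Strip surrounding whitespace and drop fields that are empty or None."""
    cleaned = {}
    for name, value in form_data.items():
        text = "" if value is None else str(value).strip()
        if text:
            cleaned[name] = text
    return cleaned


# Invented sample values, standing in for data read from a form
form_data = {"FirstName": " Ada ", "MiddleName": "", "LastName": "Lovelace"}
json_string = json.dumps(clean_form_data(form_data), indent=4)
print(json_string)
```

The same pattern extends to renaming keys or converting values, which the single-call export does not allow.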
Extract Data to XML from a PDF File
XML export is useful for integrating PDF form data with systems that consume structured XML feeds or schemas. The Form class provides export_xml to handle the conversion in one step.
- Create a Form instance and bind the PDF with form.bind_pdf(path).
- Open the output file as a binary stream.
- Call form.export_xml(stream) to write all field data as XML.
import aspose.pdf as apdf
from io import FileIO
from os import path
path_infile = path.join(self.dataDir, infile)
path_outfile = path.join(self.dataDir, outfile)
# Create Form object
form = apdf.facades.Form()
# Bind PDF document
form.bind_pdf(path_infile)
# Export data to XML file
with FileIO(path_outfile, "w") as f:
    form.export_xml(f)
Export Data to FDF from a PDF File
FDF (Forms Data Format) is the standard interchange format for AcroForm data and is widely supported by PDF viewers and processing tools. Use export_fdf on the Form class to produce a standalone FDF file that can be imported back into the original PDF or another compatible form.
- Create a Form instance and bind the source PDF with form.bind_pdf(path).
- Open the output file as a binary stream.
- Call form.export_fdf(stream) to write the FDF data.
import aspose.pdf as apdf
from io import FileIO
from os import path
path_infile = path.join(self.dataDir, infile)
path_outfile = path.join(self.dataDir, outfile)
# Create Form object
form = apdf.facades.Form()
# Bind PDF document
form.bind_pdf(path_infile)
# Export form data to an FDF file
with FileIO(path_outfile, "w") as f:
    form.export_fdf(f)
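A quick sanity check on an exported file is to look at its first bytes: per the PDF specification, an FDF file begins with a %FDF- header line (for example %FDF-1.2). The helper below is a hypothetical sketch using only the standard library; it inspects the header and does not validate the rest of the file.

```python
def looks_like_fdf(data: bytes) -> bool:
    """Return True when the data starts with the FDF header marker."""
    return data.startswith(b"%FDF-")


# Invented sample bytes; in practice, read the first bytes of the exported file
print(looks_like_fdf(b"%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [] >> >>\nendobj\n"))
print(looks_like_fdf(b"%PDF-1.7\n"))
```

For a file on disk, reading it with open(path_outfile, "rb") and passing the first few bytes to the helper would be enough.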
Export Data to XFDF from a PDF File
XFDF (XML Forms Data Format) is the XML-based representation of FDF, which makes it easier to process with standard XML tooling in web services and data pipelines. Like FDF, an XFDF file can be imported back into a compatible PDF form. Use export_xfdf on the Form class to generate the output.
- Create a Form instance and bind the source PDF with form.bind_pdf(path).
- Open the output file as a binary stream.
- Call form.export_xfdf(stream) to write the XFDF data.
import aspose.pdf as apdf
from io import FileIO
from os import path
path_infile = path.join(self.dataDir, infile)
path_outfile = path.join(self.dataDir, outfile)
# Create Form object
form = apdf.facades.Form()
# Bind PDF document
form.bind_pdf(path_infile)
# Export form data to an XFDF file
with FileIO(path_outfile, "w") as f:
    form.export_xfdf(f)
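Because XFDF is plain XML, the exported data can be read back with the standard library. The sketch below assumes the usual XFDF layout (an xfdf root in the http://ns.adobe.com/xfdf/ namespace, with fields/field/value elements) and handles only top-level field elements. Whether the library's output nests fields is not covered on this page, so treat read_xfdf_fields as a hypothetical starting point, shown here with an invented sample string.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace used by XFDF documents
XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}


def read_xfdf_fields(xfdf_text: str) -> Dict[str, str]:
    """Return {field name: value} for the top-level fields of an XFDF document."""
    root = ET.fromstring(xfdf_text)
    values = {}
    for field in root.findall("x:fields/x:field", XFDF_NS):
        value = field.find("x:value", XFDF_NS)
        # A field without a <value> child, or with an empty one, maps to ""
        values[field.get("name")] = (value.text or "") if value is not None else ""
    return values


# Invented sample document for the example
sample = (
    '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">'
    "<fields>"
    '<field name="FirstName"><value>Ada</value></field>'
    '<field name="LastName"><value>Lovelace</value></field>'
    "</fields>"
    "</xfdf>"
)
fields = read_xfdf_fields(sample)
print(fields)
```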