Invoice OCR and parsing
How Financica extracts data from uploaded invoices using OCR.
When you upload a PDF or image of an invoice, Financica uses optical character recognition (OCR) to automatically extract the key information. This saves you from manually entering invoice details.
What gets extracted
The OCR engine identifies and extracts:
- Supplier or customer name — The company that issued or received the invoice.
- Invoice number — The reference number on the invoice.
- Invoice date — When the invoice was issued.
- Due date — When payment is expected.
- Line items — Individual products or services with descriptions, quantities, and prices.
- Subtotals and totals — Including any discounts applied.
- VAT details — Tax rates and VAT amounts per line item and in total.
- Payment information — Bank account details or payment references, when available.
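To make the list above concrete, here is a minimal sketch of how the extracted fields could be modeled in Python. The class and field names (`ExtractedInvoice`, `LineItem`, `totals_match`) are illustrative assumptions, not Financica's actual data model or API. The sketch also includes a simple check that line items plus VAT add up to the stated total, which is a useful sanity test during review. It ignores discounts and assumes VAT is rounded per line, which not every supplier does.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal  # net price per unit, before VAT
    vat_rate: Decimal    # e.g. Decimal("0.21") for 21%

    @property
    def net(self) -> Decimal:
        return self.quantity * self.unit_price

    @property
    def vat(self) -> Decimal:
        # Assumption: VAT is rounded to two decimals per line item.
        return (self.net * self.vat_rate).quantize(Decimal("0.01"))


@dataclass
class ExtractedInvoice:
    party_name: str            # supplier or customer name
    invoice_number: str
    invoice_date: date
    due_date: Optional[date]   # may be missing on some invoices
    total: Decimal             # gross total as printed on the invoice
    line_items: list[LineItem] = field(default_factory=list)
    payment_reference: Optional[str] = None

    def totals_match(self) -> bool:
        # Compare the printed total with net + VAT summed over all lines.
        computed = sum((li.net + li.vat for li in self.line_items), Decimal("0"))
        return computed == self.total
```

A mismatch does not necessarily mean the OCR was wrong (the invoice may include a discount or use different rounding), but it is a good signal to look at the amounts more closely.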
How the process works
- Upload — You upload a PDF or image file from the expenses or revenue section.
- Processing — The file is sent to the OCR engine for analysis. This typically takes a few seconds.
- Review — The extracted data is presented for your review. Fields that the engine was less confident about may be highlighted.
- Correct and save — Make any necessary corrections and save the invoice record.
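The review step can be pictured as a filter over per-field confidence scores. The sketch below is hypothetical: the `ExtractedField` type, the 0.0 to 1.0 confidence scale, and the 0.85 threshold are assumptions chosen for illustration, and Financica's actual highlighting rules may differ.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


REVIEW_THRESHOLD = 0.85  # assumed cut-off, not a documented Financica value


def fields_to_highlight(fields, threshold=REVIEW_THRESHOLD):
    """Return the names of fields whose confidence is below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


fields = [
    ExtractedField("invoice_number", "INV-2024-001", 0.99),
    ExtractedField("due_date", "2024-02-14", 0.62),
    ExtractedField("total", "121.00", 0.97),
]
print(fields_to_highlight(fields))  # ['due_date']
```

In this example only the due date falls below the threshold, so it is the one field a reviewer would be prompted to double-check.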
Tips for better OCR results
- Use high-quality scans — Clear, well-lit images produce better results than blurry photos.
- PDF is preferred — Native PDF files (not scanned images saved as PDF) give the best results because the text is already machine-readable.
- Standard layouts — Invoices with conventional layouts are parsed more accurately than highly stylized designs.
- One invoice per file — Upload each invoice as a separate file for the cleanest results.
Supported file formats
- PDF (native and scanned)
- PNG and JPG images
- HEIC photos (from iPhone cameras)
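If you upload files from a script, a quick check on the file extension before uploading can catch unsupported formats early. This is a minimal sketch: the function name is made up for illustration, and treating `.jpeg` as equivalent to `.jpg` is an assumption rather than something the list above states.

```python
from pathlib import Path

# Extensions taken from the list above; ".jpeg" is an assumed alias for ".jpg".
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".heic"}


def is_supported_upload(filename: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive, so `invoice.PDF` and `invoice.pdf` are treated the same. It only looks at the name, not the file contents.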
Hybrid PDFs with embedded invoice data
Some PDFs carry the invoice data twice: as the visual document and as a structured XML attachment inside the file. Factur-X (known as ZUGFeRD in Germany) is the most common standard for this. When you upload such a PDF, Financica reads the embedded XML directly and skips OCR entirely. As with a UBL invoice, every field is taken from the structured data exactly as the issuer recorded it, so there are no recognition errors to correct.
You do not need to do anything to opt in: detection is automatic. If the embedded XML cannot be read (for example, in older or non-compliant variants), the upload falls back to OCR.
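The "use embedded XML if possible, otherwise fall back to OCR" behavior can be sketched as follows. This is not Financica's implementation. It assumes the XML attachment has already been pulled out of the PDF by some PDF library (that step is not shown), and the function names and returned dictionary are hypothetical. The namespaces and the `ExchangedDocument/ID` path are those of the Cross Industry Invoice format that Factur-X uses.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Namespaces used by Factur-X / ZUGFeRD (UN/CEFACT Cross Industry Invoice).
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}


def parse_embedded_xml(xml_bytes: bytes) -> Optional[dict]:
    """Read the invoice number from embedded XML; return None if unreadable."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None
    number = root.findtext("rsm:ExchangedDocument/ram:ID", namespaces=NS)
    if not number:
        # Well-formed XML, but not in the expected structure.
        return None
    return {"invoice_number": number.strip(), "source": "embedded_xml"}


def extract_invoice(embedded_xml: Optional[bytes], run_ocr: Callable[[], dict]) -> dict:
    """Prefer the embedded XML; fall back to OCR when it is missing or unreadable."""
    if embedded_xml is not None:
        parsed = parse_embedded_xml(embedded_xml)
        if parsed is not None:
            return parsed
    return run_ocr()
```

Passing the OCR step in as a callable means it only runs when the structured path fails, which mirrors the "skips OCR entirely" behavior described above.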
When OCR is not enough
For invoices that OCR struggles with (handwritten, unusual layouts, or very poor quality), you can always enter the details manually. The OCR extraction is a starting point, not a requirement: every field can be edited.
For structured electronic invoices (UBL XML or Factur-X / ZUGFeRD), no OCR is needed at all. See Electronic invoicing.