Costless · Data methodology

How Costless verifies retail price & receipt data

Costless verifies retail data by combining four independent sources: automated parsers across 56 supermarket chain websites, ML-based OCR of 240,000+ real purchase receipts, GPS-verified price labels from field agents, and OCR of promo banners from flyers, web images and official Telegram and Viber channels. Cross-source verification catches scraping errors and missed promotions, and fraud screening flags roughly 8% of receipt submissions as fraudulent.

4 independent data sources

56 monitored chains

~22 s average receipt processing

What we collect

Four independent data streams

Costless ingests four streams of independent retail data. Cross-checking between them is what makes our claims verifiable rather than asserted.

🔎

Online price & deal data

Automated parsers run against 56 supermarket chain websites. Each parser is hand-written per chain — we track regular prices, promotional prices with validity windows, loyalty-card-conditional prices, and multi-buy promotions.
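As an illustration only, the four price types tracked above could be held in a record like the one below. This is a minimal sketch: the class name, field names and the `unit_price` rule are hypothetical and are not Costless's actual schema. For brevity it also assumes a multi-buy offer has no validity window of its own.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PriceObservation:
    """One scraped price for one product at one chain (hypothetical schema)."""
    chain: str
    gtin: str
    regular_price: Decimal
    promo_price: Optional[Decimal] = None
    promo_valid_from: Optional[date] = None   # None means "no start bound"
    promo_valid_to: Optional[date] = None     # None means "no end bound"
    loyalty_card_required: bool = False       # promo applies only with a card
    multibuy_quantity: int = 1                # e.g. 3 for "3 for 5.00"
    multibuy_total: Optional[Decimal] = None  # e.g. Decimal("5.00")

    def promo_active(self, on: date) -> bool:
        """True if a promo price exists and `on` falls inside its window."""
        if self.promo_price is None:
            return False
        starts_ok = self.promo_valid_from is None or self.promo_valid_from <= on
        ends_ok = self.promo_valid_to is None or on <= self.promo_valid_to
        return starts_ok and ends_ok

    def unit_price(self, on: date, has_loyalty_card: bool = False,
                   quantity: int = 1) -> Decimal:
        """Lowest per-unit price this shopper can actually get on `on`."""
        best = self.regular_price
        if self.promo_active(on) and (has_loyalty_card or not self.loyalty_card_required):
            best = min(best, self.promo_price)
        if self.multibuy_total is not None and quantity >= self.multibuy_quantity:
            best = min(best, self.multibuy_total / self.multibuy_quantity)
        return best


obs = PriceObservation(
    chain="ExampleMart", gtin="4006381333931",
    regular_price=Decimal("2.50"), promo_price=Decimal("1.99"),
    promo_valid_from=date(2024, 5, 1), promo_valid_to=date(2024, 5, 7),
    loyalty_card_required=True,
)
print(obs.unit_price(date(2024, 5, 3), has_loyalty_card=True))
```

Keeping the validity window and the loyalty condition on the record itself means a consumer of the data can decide which price applies to a given shopper on a given day, instead of storing a single "current price".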

🧾

Real purchase receipts

Users photograph their receipts after checkout. These are real transactions — proof a price was actually charged, on a specific date, at a specific store. Scraped catalog prices can be stale or misformatted for days; a receipt cannot.

🏷️

Field-collected price labels

Field agents photograph shelf-edge price labels in physical stores. This stream covers in-store promotions that never appear on a chain's website — a blind spot for online-only price-monitoring tools.

📢

Promo banners & channels

Many chains publish weekly promotions as PDF flyers, web banners, or posts on their official Telegram and Viber channels. We run vision-model OCR over each banner to extract products, prices and validity dates.

How we collect

From source to structured data

Online parsers

Each chain parser knows that chain's page structure or API. Parsers run on a fixed twice-weekly cycle, staggered to avoid burst load, respect site protections, and prefer chain-published APIs over scraping. Some chains we don't crawl at all because their terms prohibit it — for those, data comes from receipts, field collection and banners only.

Receipt OCR pipeline

An uploaded receipt is pre-processed with OpenCV (skew, contrast, glare, thermal-paper fade), then text is extracted by Google Cloud Vertex AI vision services as the primary engine, with Tesseract as a fast fallback and an in-house CNN trained on Eastern-European receipt formats for the hard cases. Line items are parsed, products matched against our normalized database via vector embeddings, and totals reconciled against the printed total.

End-to-end accuracy across accepted receipt formats is consistently high. Per-stage benchmarks vary by retailer template, paper type and image quality, so we don't publish a single number that would misrepresent the real distribution.
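The last two pipeline steps, matching a line item to the product database by embedding similarity and reconciling line totals against the printed total, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the 0.8 similarity threshold, the 0.01 tolerance and the toy three-dimensional vectors are all hypothetical, and a production system would use a real embedding model and a vector index.

```python
import math
from decimal import Decimal


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_product(line_vector, catalog, min_similarity=0.8):
    """Return the catalog key with the most similar embedding, or None if
    nothing reaches the (assumed) similarity threshold."""
    best_key, best_score = None, min_similarity
    for key, vector in catalog.items():
        score = cosine_similarity(line_vector, vector)
        if score >= best_score:
            best_key, best_score = key, score
    return best_key


def reconcile_total(line_totals, printed_total, tolerance=Decimal("0.01")):
    """Compare the sum of parsed line totals with the printed receipt total.

    Returns (ok, difference), where difference = printed_total - computed sum.
    """
    computed = sum(line_totals, Decimal("0"))
    difference = printed_total - computed
    return abs(difference) <= tolerance, difference


# Toy embeddings standing in for real model output.
catalog = {"milk-1l": [1.0, 0.0, 0.1], "bread": [0.0, 1.0, 0.0]}
print(match_product([0.9, 0.05, 0.1], catalog))
print(reconcile_total([Decimal("1.99"), Decimal("2.50"), Decimal("0.49")], Decimal("4.98")))
```

A non-zero difference is a useful signal in itself: it usually means a line item was missed or misread, so the receipt can be routed to a harder OCR path rather than accepted.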

Fraud detection

Roughly 8% of submitted receipts are flagged as fraudulent before reaching the verified dataset. Signals include duplicate detection, image forensics (photos of screens or printouts, edited images), merchant-consistency checks, structural anomalies, and velocity heuristics. Flagged submissions are quarantined and the user is asked to re-photograph clearly.
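Two of the signals listed above, duplicate detection and velocity heuristics, are simple enough to sketch. The example below is an assumption-laden illustration, not the production detector: the class name, the fingerprint fields (merchant, issue time, total) and the limit of five uploads per hour are hypothetical, and image forensics and structural checks are out of scope here.

```python
import hashlib
from collections import defaultdict, deque
from datetime import datetime, timedelta


class ReceiptScreen:
    """Flags duplicate receipts and unusually fast uploaders (hypothetical rules)."""

    def __init__(self, max_per_window=5, window=timedelta(hours=1)):
        self.seen = set()                  # fingerprints of receipts already submitted
        self.recent = defaultdict(deque)   # user_id -> recent upload timestamps
        self.max_per_window = max_per_window
        self.window = window

    @staticmethod
    def fingerprint(merchant_id, issued_at, total):
        """Hash of fields that identify the same paper receipt, so a second
        photo of one receipt collides even if the image bytes differ."""
        key = f"{merchant_id.strip().lower()}|{issued_at.isoformat()}|{total}"
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def check(self, user_id, merchant_id, issued_at, total, uploaded_at):
        """Return a list of flags; an empty list means nothing suspicious."""
        flags = []
        fp = self.fingerprint(merchant_id, issued_at, total)
        if fp in self.seen:
            flags.append("duplicate")
        self.seen.add(fp)

        uploads = self.recent[user_id]
        # Drop timestamps that have fallen out of the sliding window.
        while uploads and uploaded_at - uploads[0] > self.window:
            uploads.popleft()
        uploads.append(uploaded_at)
        if len(uploads) > self.max_per_window:
            flags.append("velocity")
        return flags


screen = ReceiptScreen(max_per_window=2, window=timedelta(minutes=10))
t0 = datetime(2024, 5, 1, 12, 0)
print(screen.check("u1", "Shop-1", t0, "12.50", t0))
print(screen.check("u2", " shop-1 ", t0, "12.50", t0))
```

Fingerprinting the extracted fields rather than the image file is the design choice that matters: two users photographing the same receipt produce different files but the same fingerprint.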

Verified field capture

The field-agent app enforces an on-site flow: it reads GPS, reverse-geocodes to an address, and blocks capture unless the agent is at the target store. After an explicit check-in, photos are taken in-app only (never the camera roll), processed on-device by our AI models, and uploaded instantly — so every price ties to a specific store, agent and moment.
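The "blocks capture unless the agent is at the target store" rule can be pictured as a geofence. The sketch below substitutes a plain great-circle distance check for the reverse-geocoding step described above, and the 75 m radius, the accuracy handling and the function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def capture_allowed(agent, store, radius_m=75.0, gps_accuracy_m=0.0):
    """Allow capture only if the agent is inside the store's geofence even in
    the worst case implied by the reported GPS accuracy.

    `agent` and `store` are (latitude, longitude) tuples in degrees.
    """
    return haversine_m(*agent, *store) + gps_accuracy_m <= radius_m


store = (50.4501, 30.5234)
print(capture_allowed((50.4505, 30.5234), store))   # roughly 45 m away
print(capture_allowed((50.4601, 30.5234), store))   # roughly 1.1 km away
```

Adding the reported GPS accuracy to the measured distance makes the check conservative: a poor fix near the edge of the geofence is rejected rather than given the benefit of the doubt.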

How we verify

Cross-source verification

Any single source can be wrong: a parser can miss a redesigned deal page, a receipt can be badly photographed, a field label can be an outdated shelf tag. So we expect a price to appear in at least two streams before treating it as confirmed. When streams disagree, these precedence rules apply:

Receipt vs online — the receipt wins, because it is the price actually charged.
Deal page vs product page — the deal page wins, because loyalty pricing often isn't shown on product pages.
Field label vs online — field wins for the day it was captured; online wins longer-term because it refreshes more often.
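The three precedence rules above, plus the two-stream confirmation, can be expressed as a small resolver. This is a sketch under assumptions: the source labels, the `resolve` function and the stream mapping are hypothetical, and placing receipts above field labels is an assumption, since the rules above only compare each of them with online data.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Which independent stream each source label belongs to (hypothetical labels).
STREAM = {
    "receipt": "receipts",
    "deal_page": "online",
    "product_page": "online",
    "field_label": "field",
}


@dataclass(frozen=True)
class Observation:
    source: str          # one of the keys of STREAM
    price: Decimal
    observed_on: date


def resolve(observations, as_of):
    """Return (winning observation or None, confirmed flag).

    `confirmed` is True when a different stream reports the same price.
    """
    latest = {}
    for obs in observations:  # keep only the newest observation per source
        current = latest.get(obs.source)
        if current is None or obs.observed_on > current.observed_on:
            latest[obs.source] = obs
    if not latest:
        return None, False

    field = latest.get("field_label")
    if "receipt" in latest:                       # receipt beats online
        winner = latest["receipt"]
    elif field is not None and field.observed_on == as_of:
        winner = field                            # field wins on its capture day
    else:                                         # deal page beats product page
        winner = next(latest[s] for s in ("deal_page", "product_page", "field_label")
                      if s in latest)

    confirmed = any(
        STREAM[o.source] != STREAM[winner.source] and o.price == winner.price
        for o in latest.values()
    )
    return winner, confirmed


observations = [
    Observation("product_page", Decimal("2.50"), date(2024, 5, 1)),
    Observation("deal_page", Decimal("1.99"), date(2024, 5, 1)),
    Observation("field_label", Decimal("1.99"), date(2024, 5, 2)),
]
winner, confirmed = resolve(observations, as_of=date(2024, 5, 5))
print(winner.source, winner.price, confirmed)
```

Here the deal page wins three days after the field visit, and the price counts as confirmed because the field label, a separate stream, reported the same figure.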

For interoperability we follow industry standards: EAN-13 / EAN-8 / GTIN-14 (GS1) for barcodes, MCC for merchant classification, ISO 4217 for currencies, ISO 3166-1 for countries, and ISO 8601 for timestamps.
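For the barcode standards, the GS1 check digit is the usual first validation step. The function below is a small illustrative sketch (the name `gtin_is_valid` is ours, not part of any Costless API) that applies the standard GS1 mod-10 rule to EAN-8, EAN-13 and GTIN-14 codes.

```python
def gtin_is_valid(code: str) -> bool:
    """Validate an EAN-8, EAN-13 or GTIN-14 code using the GS1 mod-10 check digit.

    Starting from the digit next to the check digit and moving left,
    digits are weighted 3, 1, 3, 1, ... and summed; the check digit is
    whatever brings that sum up to a multiple of 10.
    """
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 13, 14):
        return False
    digits = [int(ch) for ch in code]
    body, check = digits[:-1], digits[-1]
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


for candidate in ("4006381333931", "73513537", "4006381333932"):
    print(candidate, gtin_is_valid(candidate))
```

A failed check digit on a receipt line is a strong hint of an OCR misread, which makes this a cheap filter to run before any product matching.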

How fresh is the data

Refresh cadence by data type

Data type	Refresh cadence
Purchase receipts	~22 seconds on average from upload (end-to-end, including OCR)
Online prices & deal pages	Twice weekly per chain, on a fixed schedule
Promo banners	Per chain publication cycle (weekly for most)
Field-collected labels	Per agent session, on demand
Currency FX rates	Daily, from apilayer.com
Normalized product database	Continuous, as receipts and parser cycles land

When a parser breaks — a site redesign, a new block, an API change — affected deals show a visible "last refreshed" date rather than silently serving stale data.
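The "last refreshed" behaviour can be sketched as a simple age check. The four-day threshold is an assumption derived from the twice-weekly cycle, and the function name and label format are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_label(last_refreshed, now, expected_interval=timedelta(days=4)):
    """Return None while data is within its expected refresh interval,
    otherwise a visible label with the ISO 8601 date of the last refresh."""
    if now - last_refreshed <= expected_interval:
        return None
    return f"last refreshed {last_refreshed.date().isoformat()}"


now = datetime(2024, 5, 10, tzinfo=timezone.utc)
print(freshness_label(datetime(2024, 5, 8, tzinfo=timezone.utc), now))
print(freshness_label(datetime(2024, 5, 1, tzinfo=timezone.utc), now))
```

Both timestamps are timezone-aware here; mixing aware and naive datetimes would raise a `TypeError` on subtraction, so a real pipeline would normalise to UTC at ingestion.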

Coverage & limits

Where we have data — and where we don't

Being honest about limits is part of being a trustworthy data source. Here is what we cover today and what we don't yet.

Consumer markets

Ukraine, Canada, Lithuania, Poland and the United Kingdom — daily deal coverage of major grocery chains, browsable in the deal explorer.

B2B API markets

Our Receipt OCR & verification API serves business partners in the same markets. Fiscal receipt verification against the state registry is currently live in Ukraine.

We don't publish a per-country list of monitored chains: coverage shifts as retailers redesign sites or change terms, and arrangements vary by chain. Shoppers see live coverage in the deal explorer; B2B customers get the full list under NDA.

What we don't yet cover

Pure online-only retailers — partial; our focus is physical-store retail.
Restaurant menus and HoReCa pricing — out of scope.
Bulk and wholesale pricing — out of scope.
Sparse-data categories — alcohol and pharmacy have regulated visibility per country, and some long-tail categories lack consistent coverage.

Privacy & compliance

How we handle your data

Your receipts stay yours

A receipt can carry personal details — a loyalty number, the last digits of a card, a fiscal number. We don't mask them, because an uploaded receipt is visible only to the person who uploaded it. It is never shown to other users, never embedded in a public listing, and never shared with B2B customers. Only the structured products and prices, in anonymized aggregate, power the price-verification side of the platform.

Data retention

Receipt images and extracted line items are kept for as long as you keep your account. You can delete your account anytime from your profile, which removes your receipt images, your extracted items and your personal information. The full policy lives in our Privacy Policy.

We never sell your data

Costless does not sell user data to any third party, in any jurisdiction, ever. This is a categorical policy. Individual receipts are never shared externally and individual identities are never associated with any public-facing data product.

Children's privacy

Costless is not directed to children under 16. We do not knowingly collect personal information from children under 16; if we learn that we have, we delete it.

Compliance

We operate under GDPR (EU/EEA), UK GDPR and the Data Protection Act 2018, and equivalent national laws including Canada's PIPEDA. Because we don't sell personal data, the CCPA "Do Not Sell" right has nothing to opt out of in our processing. Data Controller: [email protected].

Security review

We run an automated AI-based security review every month — covering authentication, input validation, rate limits, dependency scanning and header hardening. The platform has not been through a formal third-party audit (SOC 2, ISO 27001) at this time.

FAQ


Why is receipt-verified data better than scraped data?

A scraped price says a retailer listed a price. A receipt-verified price proves a transaction happened at that price. Listings can be stale, misformatted, or show the wrong currency for days; a receipt confirms what was actually charged on a specific date.

How does Costless detect fraudulent receipts?

Through duplicate detection, image forensics, merchant-consistency checks, structural anomaly detection and velocity heuristics — together they flag roughly 8% of submissions before they enter the verified dataset.

How fresh is Costless's price data?

Receipts are processed in about 22 seconds on average from upload, end-to-end including OCR. Online prices and deal pages refresh on a fixed twice-weekly per-chain cycle. Promo banners refresh per chain publication cycle. FX rates refresh daily.

What countries does Costless cover?

Consumer coverage is live in Ukraine, Canada, Lithuania, Poland and the United Kingdom. Our Receipt OCR & verification API serves business partners in those markets, with fiscal receipt verification against the state registry currently live in Ukraine.

How accurate is the receipt OCR?

End-to-end accuracy across accepted receipt formats is consistently high. Primary text extraction runs on Google Cloud's Vertex AI vision services, with our own pipeline handling layout, line-item parsing, product matching and total reconciliation. Per-stage numbers vary by template and image quality, so we don't publish a single figure.

Does Costless sell my data?

No. Costless does not sell user data to any third party, in any jurisdiction. Individual receipts are never shared externally, and your identity is never tied to any public-facing data product.

Built on data you can trust

See how we turn verified receipts and live price data into retail price intelligence for brands and retailers.

Explore Insights

Transparent retail-data methodology

Costless is a retail intelligence platform that verifies supermarket prices, deals and receipts across Ukraine, Canada, Lithuania, Poland and the United Kingdom, with a Receipt OCR & verification API for business partners and fiscal receipt verification currently live in Ukraine. Rather than relying on a single scraped feed, Costless combines automated parsers across 56 chains, ML-based OCR of 240,000+ real receipts, GPS-verified field collection and promo-banner OCR. Cross-source verification reconciles these streams, catching scraping errors, missed promotions and fraudulent submissions. Receipt text extraction is powered by Google Cloud Vertex AI vision services alongside an in-house pipeline for layout detection, line-item parsing, product matching and total reconciliation. Costless does not sell user data, retains receipts only while an account is active, and operates under GDPR, UK GDPR, CCPA and national data-protection laws. This methodology page is updated as our coverage and processing evolve.


To independently verify any claim on this page, contact [email protected] or [email protected]. We provide redacted samples and methodology walkthroughs to researchers, journalists and prospective partners under NDA where appropriate.