Many products today are full of chemicals that we do not really want to ingest. The names of the chemicals do not tell us much, since most of us are not chemists. Sodium chloride might sound horrible, but that’s actually table salt. Similarly, polyglycerol polyricinoleate sounds more like something Captain Kirk would say, if the script writers were able to pronounce such words. But again, this chemical is generally considered harmless; it is typically derived from castor bean or soybean oil and is usually used to reduce the viscosity of chocolate.
To battle the chemical illiteracy that most of us suffer from, the European Union came up with E numbers. Additives that carry an E number have been assessed for use within the European Union by the European Food Safety Authority. Most food additives have an E number code. That means that anyone may read the product label, note the E number codes and look them up in the database. Of course, no one says what or where the database is, and most people are not able or willing to do searching of this calibre. We have all the data. What we need is simplicity and convenience.
This is what is needed, in a nutshell:
- The user walks into a store and sees a product.
- The user takes out a mobile phone and opens the E-Food app.
- The user points the phone at the product.
- The application returns all the food additives and chemicals that were used in making this particular product, sorted by harm score.
- The user can re-sort the returned food additives, read about them, submit a bug report, or write a quick review of the product, updating the price in the process.
- The application suggests similar products that contain less harmful chemicals (this is how the application will generate revenue).
The most reliable way to realize this kind of application would be to run OCR on a picture of the product and look up the ingredients in the database. There are two major problems with this approach, though:
- Most people will not think of taking a picture of the small print. People will take a picture of either the front of the product or the nutrition table, both of which are useless for OCR.
- The small print on the product may be too small for the phone camera.
Therefore we need a more universal approach that can deal with most situations. Sure, the best way would be to implement a photo recognition system and match the product to the picture. But seeing how even Google has not fully solved that, I wouldn’t overcomplicate the idea while it’s still in its infancy.
Here is how the logic could work:
- The user takes a picture of the barcode. Luckily, most products have one, and most people know what it is.
- The barcode is transmitted to the backend.
- The backend identifies the product in the database using the barcode information.
- If the product is recognized:
  - Get the data from the database.
  - Calculate the total harm score based on:
    - the score of the most harmful additive
    - the sum of all additive scores above 0
  - Push the data to the user.
- If the product is NOT recognized:
  - Store the barcode information in the unrecognized table.
  - The unrecognized table will be processed periodically by an automated crawler.
  - Users can add custom information to unrecognized products.
- Registered users can add information to any product.
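The backend flow above could be sketched roughly like this. It is only an illustration: the names `HarmSummary`, `summarize_harm` and `handle_scan` are made up, plain dictionaries and a set stand in for the real tables, and any scores passed in are placeholders. Since the text does not say how the two components of the harm score are combined, the sketch simply returns both.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class HarmSummary:
    worst: float  # score of the single most harmful additive
    total: float  # sum of all additive scores above 0


def summarize_harm(scores: List[float]) -> HarmSummary:
    """Compute the two components of the total harm score."""
    positive = [s for s in scores if s > 0]
    return HarmSummary(worst=max(scores, default=0.0), total=float(sum(positive)))


def handle_scan(
    barcode: str,
    products: Dict[str, List[str]],      # barcode -> list of E numbers (stand-in for the product table)
    additive_scores: Dict[str, float],   # E number -> harm score (stand-in for the additive table)
    unrecognized: Set[str],              # stand-in for the unrecognized table
) -> Optional[HarmSummary]:
    additives = products.get(barcode)
    if additives is None:
        # Product not recognized: remember the barcode for the crawler.
        unrecognized.add(barcode)
        return None
    # Assumption: an additive with no known score counts as 0.
    return summarize_harm([additive_scores.get(e, 0.0) for e in additives])
```

Returning `None` for an unknown barcode leaves it to the caller to decide what the user sees, for example a prompt to add information about the product.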
High level diagram
There will be two databases.
- The first database will index all known food additives: their full name (which will later be used for translation into multiple languages), E number, harm score, and other facts. This database will hold all the known chemicals that might appear in food. Weekly updates will keep it up to date. Primary sources will be the European Food Safety Authority and other trusted sources, like Emulgatory.cz.
- The second database will index all the known products. Mandatory fields will include Brand, Name, Barcode and Food Additives. Optional fields will include price, the locations where people request the info from, and pictures. This database will be continuously updated as new information arrives. A crawler will take care of getting new information from the internet or the manufacturer’s web site.
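To make the two data stores a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are my own guesses, both stores live in one SQLite database purely for brevity, and the optional location and picture fields are left out. A link table represents the Food Additives field of a product.

```python
import sqlite3

SCHEMA = """
CREATE TABLE additives (
    e_number   TEXT PRIMARY KEY,   -- e.g. 'E476'
    full_name  TEXT NOT NULL,
    harm_score REAL NOT NULL,
    facts      TEXT                -- free-form notes
);
CREATE TABLE products (
    barcode TEXT PRIMARY KEY,
    brand   TEXT NOT NULL,
    name    TEXT NOT NULL,
    price   REAL                   -- optional
);
CREATE TABLE product_additives (
    barcode  TEXT NOT NULL REFERENCES products(barcode),
    e_number TEXT NOT NULL REFERENCES additives(e_number),
    PRIMARY KEY (barcode, e_number)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def additives_for(conn: sqlite3.Connection, barcode: str):
    """Return (e_number, harm_score) rows for a product, most harmful first."""
    rows = conn.execute(
        """
        SELECT a.e_number, a.harm_score
        FROM product_additives AS pa
        JOIN additives AS a ON a.e_number = pa.e_number
        WHERE pa.barcode = ?
        ORDER BY a.harm_score DESC
        """,
        (barcode,),
    )
    return rows.fetchall()
```

Keeping additives in their own table means a corrected harm score is picked up by every product that references it.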
Since every E number is issued and documented by the EU, the crawler needs to continuously update this information. The crawler also needs to find product information and update it based on the source time stamp and source trustworthiness.
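One possible way to combine time stamp and trustworthiness when the crawler finds new product data is sketched below. The rule itself is an assumption on my part (a more trusted source always wins, and the time stamp only breaks ties), and `SourcedRecord`, `should_replace` and the integer trust scale are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourcedRecord:
    additives: Tuple[str, ...]  # E numbers reported by the source
    fetched_at: datetime        # time stamp of the source
    trust: int                  # hypothetical scale: higher means more trusted


def should_replace(current: Optional[SourcedRecord], candidate: SourcedRecord) -> bool:
    """Decide whether the crawler's candidate should overwrite the stored record."""
    if current is None:
        return True
    if candidate.trust != current.trust:
        # Assumed rule: trustworthiness outranks freshness.
        return candidate.trust > current.trust
    # Same trust level: prefer the newer source.
    return candidate.fetched_at > current.fetched_at
```

With this rule an old record from the manufacturer's site would not be overwritten by a newer record from a less trusted source, which may or may not be the desired behaviour.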
Advanced OCR and image recognition functions
As the application evolves, recognition will play an increasingly large role. Wiki-style population of information has enormous potential, but it has a very steep adoption curve: a large user base is needed to reach the critical mass at which enough users contribute content for the whole system to function. Most users are readers, which is why it is crucial to start with a pre-populated information base. But as time progresses, OCR and image recognition will be needed to sustain the application’s success. OCR would look for E numbers and chemical names in pictures of products. Image recognition would try to identify the product, supplementing the barcode function with an even simpler approach.
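The E number part of the OCR step could start out as a simple pattern match over whatever text the OCR engine returns. The sketch below assumes the OCR itself is done elsewhere and only handles E numbers, not chemical names. The pattern is also an assumption: it accepts three or four digits, an optional space or hyphen after the "E", and an optional letter suffix from a to f (as in E160a).

```python
import re
from typing import List

# "E", optional space or hyphen, 3-4 digits, optional letter suffix a-f.
E_NUMBER = re.compile(r"\bE[\s-]?(\d{3,4})([a-f])?\b", re.IGNORECASE)


def extract_e_numbers(ocr_text: str) -> List[str]:
    """Return normalized E numbers in order of first appearance, without duplicates."""
    found: List[str] = []
    for match in E_NUMBER.finditer(ocr_text):
        code = "E" + match.group(1) + (match.group(2) or "").lower()
        if code not in found:
            found.append(code)
    return found
```

Real OCR output will contain misread characters (for example "O" instead of "0"), so a production version would need some fuzzy matching on top of this.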