From PoC to MvP

Photo by Robert Anasch on Unsplash

Project recap

The goal of this project, and of this series of articles, is to walk the path of AI-based product creation, from the first baby steps to a progressively more complex solution. Each stage of the journey builds on the previous one:

  • In the first article of the series, we created a PoC to validate our approach, using NLP. You can find all the necessary code in my GitHub repository.
  • In this second article, we will build an app using Streamlit as the front end and FastAPI as the back end.
  • In the third article, we will migrate the data to an Elasticsearch database.
  • In the last article, we will create a Telegram bot to ask for suggestions.

The PoC tested three hypotheses:

  • Hypothesis 1: we can use NLP to obtain product recommendations by similarity (books in our case).
  • Hypothesis 2: a BERT-based model is an adequate choice to compute the book description embeddings and the query embedding.
  • Hypothesis 3: the recommendations are acceptable regardless of the query (different languages, broad or specific queries).

Architecture choices

We are keeping the database from the previous article for now. Storing it in a jsonlines file is not ideal, but it is manageable and it will change soon.
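
As a reminder of what this means in practice, a jsonlines file stores one JSON object per line, so a small catalog can be loaded with the standard library alone. The sketch below is only illustrative: the field names (`isbn`, `title`, `description`) and the sample records are assumptions, not the schema used in the repository.

```python
import json
from typing import Iterable, Iterator, List


def read_jsonlines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line of a jsonlines source."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


def load_books(path: str) -> List[dict]:
    """Load the whole file into memory (acceptable for a small MvP catalog)."""
    with open(path, encoding="utf-8") as handle:
        return list(read_jsonlines(handle))


# Hypothetical records, only to show the one-object-per-line format.
sample = [
    '{"isbn": "0000000000001", "title": "Book A", "description": "A space opera."}',
    "",
    '{"isbn": "0000000000002", "title": "Book B", "description": "A cookbook."}',
]
books = list(read_jsonlines(sample))
```

Reading the file line by line is what makes the format manageable at this stage, although every lookup still means scanning the in-memory list.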

initial architecture

Back end

The broker currently has two different endpoints.

  • The first endpoint, get_recommendations, takes a set of book titles and/or ISBNs and composes a user library, then retrieves matches for that library by their similarity to each title description.
  • The second endpoint, get_recommendations_from_description, takes a query, and retrieves matches by similarity, relying on the embedding service.
embedding_service = "http://embeddings:8502"
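
To make the matching step behind both endpoints more concrete, here is a minimal sketch of similarity ranking in plain Python. It assumes cosine similarity over precomputed description embeddings; the function names, the toy two-dimensional vectors, and the catalog keys are all hypothetical and do not come from the repository.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    top_k: int = 5,
    exclude: Iterable[str] = (),
) -> List[Tuple[str, float]]:
    """Return the top_k catalog entries most similar to the query embedding."""
    skipped = set(exclude)
    scored = [
        (title, cosine_similarity(query_embedding, embedding))
        for title, embedding in catalog.items()
        if title not in skipped
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def recommend_for_library(
    library: Sequence[str],
    catalog: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each known title in the library, rank the books the user does not own."""
    return {
        title: rank_by_similarity(catalog[title], catalog, top_k, exclude=library)
        for title in library
        if title in catalog
    }


# Toy catalog with made-up 2-dimensional embeddings (real ones are much larger).
catalog = {
    "space_opera_1": [1.0, 0.0],
    "space_opera_2": [0.9, 0.1],
    "cookbook": [0.0, 1.0],
}
from_query = rank_by_similarity([1.0, 0.0], catalog, top_k=2)
from_library = recommend_for_library(["space_opera_1"], catalog, top_k=1)
```

In this sketch the library path reuses the stored description embeddings, while the query path would first need an embedding for the free-text query, which is where the embedding service comes in.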

Embedding generator

The embedding generator is another API, also implemented with FastAPI, that takes a query and returns its embedding.
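
From the broker's point of view, that contract could look like the following client-side sketch, written with the standard library only. The base URL is the one used by the broker; the `/embed` route and the `query` and `embedding` JSON field names are assumptions made for illustration, so check the repository for the actual ones.

```python
import json
import urllib.request
from typing import List

EMBEDDING_SERVICE = "http://embeddings:8502"  # base URL used by the broker


def build_embedding_request(
    query: str, base_url: str = EMBEDDING_SERVICE
) -> urllib.request.Request:
    """Build a JSON POST request carrying the query text."""
    payload = json.dumps({"query": query}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        url=f"{base_url}/embed",  # hypothetical route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(body: bytes) -> List[float]:
    """Extract the embedding vector from the JSON response body."""
    return [float(value) for value in json.loads(body)["embedding"]]  # hypothetical field name


def get_embedding(query: str, timeout: float = 10.0) -> List[float]:
    """Send the query to the embedding service and return the vector."""
    request = build_embedding_request(query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embedding_response(response.read())


# Building the request does not touch the network, so it can be inspected safely.
request = build_embedding_request("a novel about space travel")
```

Keeping the request building and the response parsing in separate functions makes the contract easy to check without a running container.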

FROM tiangolo/uvicorn-gunicorn:python3.7
RUN mkdir /fastapi
COPY requirements.txt /fastapi
COPY embeddings.p /fastapi
WORKDIR /fastapi
RUN pip install -r requirements.txt
COPY . /fastapi
EXPOSE 8000
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]

Front end

I really like Streamlit for prototyping the front end. It is fast and flexible, and it allows us to iterate quickly. In my opinion, it is easier to set up than Dash, but that is a matter of personal taste and both are valid options for an MvP.

The UI offers two ways to request recommendations:

  • Option 1: creating a library from a list of ISBNs and book titles.
  • Option 2: writing a query.
streamlit front end
FROM python:3.7-slim
RUN mkdir /streamlit
COPY requirements.txt /streamlit
COPY books_info.csv /streamlit
WORKDIR /streamlit
RUN pip install -r requirements.txt
COPY . /streamlit
EXPOSE 8501
CMD ["streamlit", "run", "ui.py"]
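
For Option 1, the front end has to turn free-form user input into a list of ISBNs and titles before calling the back end. The helper below is a hypothetical sketch of that step, not the code in `ui.py`: it assumes entries are separated by newlines or semicolons, and it recognises ISBNs only by their shape (10 or 13 characters once hyphens and spaces are removed), without validating the checksum.

```python
import re
from typing import Dict, List

# ISBN-10 (last character may be X) or ISBN-13, after removing separators.
ISBN_PATTERN = re.compile(r"^(?:\d{9}[\dXx]|\d{13})$")


def parse_library_input(raw: str) -> Dict[str, List[str]]:
    """Split raw text into ISBN-looking entries and book titles."""
    isbns: List[str] = []
    titles: List[str] = []
    for entry in re.split(r"[\n;]", raw):
        entry = entry.strip()
        if not entry:
            continue  # skip empty entries such as trailing separators
        compact = entry.replace("-", "").replace(" ", "")
        if ISBN_PATTERN.match(compact):
            isbns.append(compact.upper())
        else:
            titles.append(entry)
    return {"isbns": isbns, "titles": titles}


# Made-up input mixing both kinds of entry.
library = parse_library_input("978-0-00-000000-2\nA Hypothetical Title; 000000000X")
```

Commas are deliberately not treated as separators here, since they often appear inside book titles.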

Connecting everything

We want to spin up the whole stack at once and to connect the services that rely on each other seamlessly. For this, we will use docker-compose, which also makes it easy to retrieve the logs of each container, trace requests, etc. later on.

sudo docker-compose build
sudo docker-compose up
sudo docker-compose up myservice

Wrapping up

Now it is time to wear our product-owner hats once again. After this stage, we have a running app where the user can enter either a set of titles and ISBNs or a description to retrieve matching suggestions.

Francisco ESPIGA

Data Science & AI Tech Lead@SANDOZ and Teacher@ESIC. Reinforcement learning, optimization and AI enthusiast building a more interesting world 1 epoch at a time.