From PoC to MVP

Photo by Robert Anasch on Unsplash

Project recap

  • In the first article of the series, we created a PoC to validate our approach using NLP. You can find all the necessary code in my GitHub repository.
  • In this second article of the series, we will build an app using Streamlit as the front end and FastAPI as the back end.
  • In the third article, we will migrate the data to an Elasticsearch database.
  • In the last article, we will create a Telegram bot to ask for suggestions.
  • Hypothesis 1: we can use NLP to obtain product recommendations by similarity (books in our case).
  • Hypothesis 2: using a BERT-based model to compute the book description embedding and the query embedding is an adequate choice.
  • Hypothesis 3: the recommendations are acceptable regardless of the query (different languages, broad or specific queries).
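
To make the first two hypotheses concrete, here is a minimal sketch of similarity-based retrieval. It assumes each book description and the query have already been turned into vectors, and it uses cosine similarity as the score, which is a common choice but an assumption here. The names cosine_similarity and rank_books and the toy three-dimensional vectors are illustrative, not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_books(query_embedding, book_embeddings, top_k=3):
    """Return the top_k (title, score) pairs, most similar first."""
    scored = [
        (title, cosine_similarity(query_embedding, embedding))
        for title, embedding in book_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real BERT embeddings (assumption).
toy_books = {
    "Book A": [1.0, 0.0, 0.0],
    "Book B": [0.0, 1.0, 0.0],
    "Book C": [0.7, 0.7, 0.0],
}
toy_query = [1.0, 0.1, 0.0]
top_matches = rank_books(toy_query, toy_books, top_k=2)
```

With real embeddings the vectors have hundreds of dimensions, but the ranking step stays the same: score every description against the query and keep the best few.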

Architecture choices

initial architecture

Back end

  • The first endpoint, get_recommendations, takes a set of book titles and/or ISBNs and composes a user library, then retrieves matches for that library by their similarity to each title description.
  • The second endpoint, get_recommendations_from_description, takes a query, and retrieves matches by similarity, relying on the embedding service.
embedding_service = "http://embeddings:8502"
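
As an illustration of how the back end might call the embedding service, the sketch below builds a JSON POST request against that base URL. Only the base URL comes from this article. The /embed route, the {"text": ...} payload and the {"embedding": [...]} response shape are assumptions, and the standard library urllib is used here only to keep the example dependency-free.

```python
import json
import urllib.request

EMBEDDING_SERVICE = "http://embeddings:8502"  # base URL from the article
EMBED_ROUTE = "/embed"  # hypothetical route name, not from the article


def build_embedding_request(text, base_url=EMBEDDING_SERVICE):
    """Build (but do not send) a JSON POST request for one piece of text."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + EMBED_ROUTE,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, timeout=10.0):
    """Send the request and return the embedding as a list of floats.

    Assumes the service answers with JSON shaped like {"embedding": [...]}.
    """
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["embedding"]
```

The host name embeddings only resolves inside the Docker network, so fetch_embedding is meant to run from another container in the same docker-compose project.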

Embedding generator

FROM tiangolo/uvicorn-gunicorn:python3.7
RUN mkdir /fastapi
COPY requirements.txt /fastapi
COPY embeddings.p /fastapi
WORKDIR /fastapi
RUN pip install -r requirements.txt
COPY . /fastapi
EXPOSE 8000
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]

Front end

  • Option 1: creating a library using a list of ISBNs and book titles.
  • Option 2: writing a query.
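
For Option 1, the raw text typed by the user has to be split into ISBNs and titles before it is sent to the back end. A minimal sketch of that step follows, assuming one entry per line. The helper names looks_like_isbn and parse_library are hypothetical, and the check only looks at length and characters, not at the ISBN checksum.

```python
def looks_like_isbn(entry):
    """True if the entry has the shape of an ISBN-10 or ISBN-13.

    Hyphens and spaces are ignored. The checksum is not verified.
    """
    compact = entry.replace("-", "").replace(" ", "").upper()
    if len(compact) == 13:
        return compact.isdigit()
    if len(compact) == 10:
        last = compact[9]
        return compact[:9].isdigit() and (last.isdigit() or last == "X")
    return False


def parse_library(raw_text):
    """Split one-entry-per-line input into normalized ISBNs and titles."""
    isbns, titles = [], []
    for line in raw_text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if looks_like_isbn(entry):
            isbns.append(entry.replace("-", "").replace(" ", "").upper())
        else:
            titles.append(entry)
    return {"isbns": isbns, "titles": titles}
```

The resulting dictionary could then be posted as the body of the get_recommendations call, although the exact payload that endpoint expects is not shown here.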
streamlit front end
FROM python:3.7-slim
RUN mkdir /streamlit
COPY requirements.txt /streamlit
COPY books_info.csv /streamlit
WORKDIR /streamlit
RUN pip install -r requirements.txt
COPY . /streamlit
EXPOSE 8501
CMD ["streamlit", "run", "ui.py"]

Connecting everything

sudo docker-compose build
sudo docker-compose up
sudo docker-compose up myservice

Wrapping up

Data Science consultant @EUIPO and Teacher @ESIC. Reinforcement learning, optimization and AI enthusiast building a more interesting world 1 epoch at a time.

Francisco ESPIGA
