Wyno? a wine recommender system using transfer learning and semantic search

Francisco ESPIGA
May 3, 2021

This article was originally published in Spanish on cienciadedatos.net

Overview

When I outlined Wyno?, I wanted to combine two of my passions, wine and data science, and use semantic search to recommend new wines to taste based on how similarly expert critics describe them.

The data is publicly available here and contains more than 120,000 tasting notes from Wine Enthusiast magazine, covering wines from both the Old and the New World.

I used transfer learning to convert the tasting notes into vectors and then find the recommendations using cosine similarity.

Approach

We often come across a surprising wine outside our usual choices and would like to find more like it. We can identify new candidates by looking for wines from the same terroir or grape variety, or by checking their scores in respected international rankings. But wouldn't it be fantastic if we could also take all the raw expert knowledge in the tasting notes into account when choosing?

For that reason, I took the tasting notes as a starting point and used NLP to transform each description into a sentence embedding, a vector that represents the whole text. These vectors let us compare wines and find close neighbors based on what the critics thought of them.

Brief exploratory data analysis

To keep the recommendations diverse while avoiding niche wines, I kept only the top-10 countries and grape varieties, which makes it more likely that the recommended wines are easy to find.
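The filtering step described above might look like the following sketch. The column names `country` and `variety`, the helper `keep_top_k`, and the toy rows are assumptions for illustration; they are not taken from the original notebook.

```python
import pandas as pd


def keep_top_k(df: pd.DataFrame, columns: list[str], k: int = 10) -> pd.DataFrame:
    """Keep rows whose value in every listed column is among that column's k most frequent."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        # The k most frequent values of this column, counted on the full dataset.
        top_values = df[col].value_counts().nlargest(k).index
        mask &= df[col].isin(top_values)
    return df[mask].reset_index(drop=True)


# Toy rows standing in for the real tasting notes (hypothetical values).
toy = pd.DataFrame(
    {
        "country": ["US", "US", "US", "Spain", "Spain", "Moldova"],
        "variety": ["Pinot Noir", "Pinot Noir", "Pinot Noir",
                    "Tempranillo", "Tempranillo", "Riesling"],
    }
)
filtered = keep_top_k(toy, ["country", "variety"], k=2)
```

With the real data, the call would use `k=10` on both columns.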

number of tasting notes by country of origin

We observe that, although the US is the most common country of origin in the dataset, it is balanced by wines from Old World producers such as Spain, Italy, and France. So we are confident that we can get recommendations to explore from both sides of the Atlantic.

number of tasting notes by grape variety

In terms of grape varieties, the dataset includes both white and red grapes so, as with the country of origin, it is not biased towards one type or the other.

Transfer learning for embedding generation

I used a pre-trained model to transform the tasting notes (text data) into numerical vectors that allow us to find recommendations.

For this, I used Google's Universal Sentence Encoder, available on TensorFlow Hub, which converts an English sentence into a 512-dimensional vector, or sentence embedding. If we had notes in different languages, we could rely on multilingual pre-trained models instead, but that was not the case here.

As computing the embeddings is expensive and the underlying data (the tasting notes) will not change, I stored the results in a tensor of size (number of tasting notes × 512).
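As a rough sketch of this step, the notes can be encoded in batches and stacked into a single array. The helper names `load_use_encoder` and `embed_notes`, the batch size, and the specific model version in the URL are assumptions; the article does not say which version of the encoder was used.

```python
import numpy as np


def load_use_encoder():
    """Load the Universal Sentence Encoder from TensorFlow Hub (needs network access)."""
    # Imported lazily so the rest of the sketch runs without TensorFlow installed.
    import tensorflow_hub as hub

    # Assumption: version 4 of the English model; the article does not name a version.
    return hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")


def embed_notes(notes, encoder, batch_size=256):
    """Encode tasting notes in batches and stack them into one (n_notes, dim) array."""
    chunks = []
    for start in range(0, len(notes), batch_size):
        batch = notes[start:start + batch_size]
        # The encoder returns one vector per input sentence.
        chunks.append(np.asarray(encoder(batch)))
    return np.vstack(chunks)


# Possible usage (not executed here):
# EMBEDDINGS = embed_notes(list_of_tasting_notes, load_use_encoder())
# np.save("embeddings.npy", EMBEDDINGS)   # compute once, reload later with np.load
```

Saving the array to disk is one simple way to avoid recomputing the embeddings on every run.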

We can check that the embeddings are normalized. This is important when computing the similarity to obtain recommendations.
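A minimal way to run this check, assuming the embeddings are held in a NumPy array with one row per tasting note (the function names are illustrative):

```python
import numpy as np


def is_unit_normalized(embeddings, tol=1e-5):
    """Return True if every row has an L2 norm of 1 (within tolerance)."""
    norms = np.linalg.norm(embeddings, axis=1)
    return bool(np.allclose(norms, 1.0, atol=tol))


def l2_normalize(embeddings):
    """Rescale every row to unit length."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms
```

When every row has unit length, the cosine similarity between two rows is simply their dot product, which is why the check matters.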

Results

Once all the tasting notes have been transformed into vectors, we can use cosine similarity to find the nearest neighbors of a given one using this formula:

cosine similarity
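For reference, the standard definition of cosine similarity between two embedding vectors u and v of dimension 512 is written out below. When both vectors have unit norm, the denominator equals 1 and the similarity reduces to the dot product.

```latex
\[
\operatorname{sim}(\mathbf{u}, \mathbf{v})
  = \cos\theta
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{512} u_i v_i}
         {\sqrt{\sum_{i=1}^{512} u_i^{2}} \, \sqrt{\sum_{i=1}^{512} v_i^{2}}}
\]
```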

We then convert that similarity into a score, following the approach used with the STS benchmark for semantic textual similarity.

Finding recommendations from a specific wine

We select a random sample, index 15 in our case, which corresponds to a German Riesling. We can verify that the first recommendations are coherent: they either share the same grape variety and origin or come from the Alsace region in France, which produces similar wines.

We also find interesting recommendations, such as a New World Riesling, an Alsatian rosé made with Pinot Noir, and an Austrian Chardonnay.

scores, rank = get_recommendations(EMBEDDINGS[15, :], EMBEDDINGS, top_k=10)
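The body of `get_recommendations` is not shown in the article, so the following is only one plausible sketch that matches the call above. It assumes the score is the angular similarity used in the Universal Sentence Encoder STS example, 1 - arccos(cosine) / π; whether Wyno? uses exactly this transformation is an assumption.

```python
import numpy as np


def get_recommendations(query, embeddings, top_k=10):
    """Return (scores, rank): the top_k similarity scores and the matching row indices."""
    # Normalize defensively so the dot product equals the cosine similarity.
    query = query / np.linalg.norm(query)
    matrix = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clip to guard against rounding errors just outside [-1, 1].
    cosine = np.clip(matrix @ query, -1.0, 1.0)
    # Angular similarity in [0, 1]: 1 for identical directions, 0 for opposite ones.
    scores = 1.0 - np.arccos(cosine) / np.pi
    # Row indices sorted from most to least similar.
    rank = np.argsort(-scores)[:top_k]
    return scores[rank], rank
```

Note that if the query is itself a row of `embeddings`, as in the call above, it comes back as the first result with a score of 1, so the actual recommendations start at the second position.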

General recommendations

When we start from a specific wine, it is easy to obtain candidates by similarity. But what happens in more generic cases, such as a region or a grape variety?

In this case, we can take all the wines sharing those characteristics and average their vectors to obtain an archetypal candidate, whose recommendations will be close to all of them.

We have tested the behavior of Wyno? on three cases:

  • Old-world recommendations similar to an Australian Pinot Noir.
  • Non-Spanish recommendations for a Spanish red wine from D.O.Ca. Rioja.
  • Recommendations for Prosecco from outside Italy and France.
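The averaging idea and the exclusion filters used in these cases can be sketched as follows. The function names, the boolean masks, and the rescaling of the average to unit length are assumptions made for illustration, as are the toy vectors and countries.

```python
import numpy as np


def archetype_embedding(embeddings, mask):
    """Average the embeddings selected by a boolean mask and rescale to unit length."""
    mean_vector = embeddings[mask].mean(axis=0)
    return mean_vector / np.linalg.norm(mean_vector)


def recommend_from_archetype(embeddings, source_mask, candidate_mask, top_k=10):
    """Rank the allowed candidate rows by cosine similarity to the source archetype."""
    query = archetype_embedding(embeddings, source_mask)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = unit @ query
    # Excluded wines get -inf so they can never be recommended.
    similarity = np.where(candidate_mask, similarity, -np.inf)
    rank = np.argsort(-similarity)[:top_k]
    return rank[np.isfinite(similarity[rank])]


# Toy example: recommend non-Spanish wines close to the average Spanish wine.
countries = np.array(["Spain", "Spain", "Argentina", "Germany"])
vectors = np.array([[1.0, 0.0], [0.8, 0.6], [0.9, 0.3], [0.0, 1.0]])
rank = recommend_from_archetype(
    vectors,
    source_mask=(countries == "Spain"),
    candidate_mask=(countries != "Spain"),
)
```

In the toy data, the Argentinian wine points in the same direction as the Spanish average, so it ranks ahead of the German one.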

Australian Pinot Noir

We generate the average embedding of all candidate wines. We can verify that we obtain interesting results, most of them from the New World and of the same variety. Others are more surprising, like the 2015 Austrian Meinklang made from Blauburgunder (another name for Pinot Noir).

Non-Spanish red similar to Rioja

In this case, we also get temperamental wines from South America, such as those from the Mendoza region.

Non-Italian, non-French similar to Prosecco

This last case is astonishing because the top-10 are all non-sparkling whites. We could argue that most sparkling wines were excluded when we filtered out Italy and France, but the real reason is that the term "sparkling" rarely appears in the notes: only 0.6% of the samples contain it.
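A figure like this can be obtained with a simple count over the tasting notes. The helper below is a hypothetical sketch, and the four notes are made-up examples rather than rows from the dataset.

```python
def share_containing(notes, term):
    """Fraction of notes that mention `term`, ignoring case."""
    term = term.lower()
    hits = sum(term in note.lower() for note in notes)
    return hits / len(notes)


# Made-up notes for illustration only.
example_notes = [
    "Crisp and sparkling with green apple",
    "Dark fruit and firm tannins",
    "A Sparkling finish with citrus",
    "Oaky and full-bodied",
]
share = share_containing(example_notes, "sparkling")
```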

Visualizing results

It is useful to obtain a list of recommendations for a given terroir, variety, etc., but it is equally interesting to explore the results interactively.

This can be useful, for instance, for a blind tasting where we want to challenge the panel of experts.

For this, we use t-SNE to reduce the dimensionality from the original 512 dimensions to 2. A random sample of 5,000 wines was chosen to keep the plot readable and the computation fast.

Wine recommendation graph using t-SNE

In the 2D t-SNE projection, wines tend to group by country, with some less obvious discoveries, such as the closeness of Spanish Toro and Rioja wines to Chilean and Argentinian ones, as we saw in the examples, or of Portuguese wines from the Dão region to French ones.
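A minimal sketch of the sampling and projection with scikit-learn is shown below. The function name, the cosine metric, the random initialization, and the perplexity value are assumptions; the article only states that t-SNE was applied to a random sample of 5,000 wines.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_sample(embeddings, sample_size=5000, perplexity=30.0, seed=0):
    """Randomly sample rows and project them to 2-D with t-SNE."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, len(embeddings))
    # Sample without replacement so no wine appears twice in the plot.
    idx = rng.choice(len(embeddings), size=n, replace=False)
    tsne = TSNE(
        n_components=2,
        metric="cosine",        # assumption: same notion of distance as the recommender
        init="random",
        perplexity=perplexity,  # must be smaller than the number of sampled rows
        random_state=seed,
    )
    coords = tsne.fit_transform(embeddings[idx])
    return idx, coords


# Possible usage (not executed here):
# idx, coords = project_sample(EMBEDDINGS)
```

The returned `idx` makes it possible to look up the country or variety of each plotted point when building the interactive chart.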

Conclusions and next steps

The next logical step would be to deploy the recommendation system in the cloud to reach more users. Moreover, being able to color by region, variety, etc. would improve the usability of Wyno? and the insights it can bring to the tasting table.

In addition, with the aforementioned blind tastings in mind, building a similarity graph and running a community detection algorithm could cluster wines with similar reviews. This would enrich the tasting, since telling such similar wines apart would certainly be an even more challenging experience.
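As an illustration of this idea only, the sketch below links wines whose cosine similarity exceeds a threshold and groups them with NetworkX's greedy modularity algorithm. The threshold value, the choice of algorithm, and the function name are all assumptions, since this step is proposed as future work and was not implemented in the article.

```python
import networkx as nx
import numpy as np
from networkx.algorithms.community import greedy_modularity_communities


def wine_communities(embeddings, threshold=0.8):
    """Group wines into communities of a thresholded cosine-similarity graph."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = unit @ unit.T
    graph = nx.Graph()
    graph.add_nodes_from(range(len(unit)))
    # Keep each pair once (upper triangle) when it is similar enough;
    # this assumes a positive threshold.
    rows, cols = np.where(np.triu(similarity, k=1) >= threshold)
    graph.add_weighted_edges_from(
        (int(i), int(j), float(similarity[i, j])) for i, j in zip(rows, cols)
    )
    communities = greedy_modularity_communities(graph, weight="weight")
    return [sorted(community) for community in communities]
```

Because the full pairwise similarity matrix grows quadratically with the number of wines, a sketch like this is better suited to a sample, such as the 5,000 wines used for the plot, than to the whole dataset.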

Francisco ESPIGA

Data Science & AI Tech Lead@SANDOZ and Teacher@ESIC. Reinforcement learning, optimization and AI enthusiast building a more interesting world 1 epoch at a time.