Text summarizer (based on BERT) as a service/API on fly #292

I was playing around with an open source summarizer for creating an executive summary of a given text. It is a generalization of a solution based on a paper that uses BERT (by Google) to summarize lectures. Basically, give it a text of around 2k words and ask for a 20% summary, and it returns the sentences it considers important, which comes to 400+ words (roughly 20%, depending on sentence lengths).
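To make the "ask for 20%" idea concrete, here is a minimal sketch of ratio-based sentence selection. It is not the project's code: the real summarizer scores sentences with BERT, whereas this stand-in uses a plain word-frequency score so it runs with the standard library only. The names `split_sentences` and `summarize_by_ratio` are made up for illustration.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize_by_ratio(text: str, ratio: float = 0.2) -> str:
    """Keep roughly `ratio` of the sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # The ratio applies to the sentence count; always keep at least one.
    keep = max(1, round(len(sentences) * ratio))

    # Stand-in scorer (NOT BERT): average document-wide frequency of a
    # sentence's words.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Because the ratio is applied to the number of sentences, the word count of the result only lands near 20% when sentence lengths are fairly even.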

It has multiple use cases, like building a news summary service (something to summarize all the Corona news, for instance) or summarizing any long text you need to read using an ML algorithm.

The project ships with a Docker container that exposes a REST API built with Flask, so I can quickly build it and get it running on fly.io, with Fly-specific instructions on how to do it. I think this would be a good addition to the examples.
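Once the container is up, calling it from a client could look roughly like the sketch below. The `/summarize` path, the `ratio` query parameter, and the plain-text request body are assumptions about how the service is exposed, and the host name in the usage note is hypothetical, so check the project's README before relying on any of them.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_summarize_request(base_url: str, text: str, ratio: float = 0.2) -> Request:
    """Build (but do not send) a POST request carrying the text to summarize."""
    query = urlencode({"ratio": ratio})
    return Request(
        # Assumed endpoint shape: POST <base>/summarize?ratio=<float>
        url=f"{base_url.rstrip('/')}/summarize?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def summarize(base_url: str, text: str, ratio: float = 0.2,
              timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_summarize_request(base_url, text, ratio)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage would be something like `summarize("https://my-summarizer.fly.dev", long_text, ratio=0.2)`, with a generous timeout since the model can take a while on long inputs.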

From my previous experience, this 3.5 GB container needs a lot of resources (given the ML model it uses). Just a heads up: it needs about 2 GB of RAM to run. On the bright side, this translates into documentation on how to scale up services for a heavy but useful application. Thanks!

PS: I am not a machine learning enthusiast. I had to solve a problem for a side project, and some basic googling led me to this project. I even evaluated the Meaning Cloud API, but this repo was better at summarizing and cheaper, with virtually no limit on the number of calls :).

Looks reasonable as an outline.

I’d switch “Endless Possibilities” for “Possible Applications” and loosely suggest some ideas for things that would benefit from summarizing (news feeds, instructions, blog articles…) …

Possible bonus section: leverage puppeteer-js to extract some text from a URL and feed it to the summarizer. And refer people to the guide for that too.
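As a rough illustration of the "extract text, then summarize" step, here is a sketch that pulls the visible text out of an HTML string. It deliberately swaps Puppeteer for Python's standard-library `html.parser`, so it does not run JavaScript or render the page; `TextExtractor` and `html_to_text` are hypothetical names for this example only.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the page's text content as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting string is what would be posted to the summarizer. Pages that build their content with JavaScript would still need a headless browser such as Puppeteer.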

I’d avoid summarizing Wikipedia articles, as encyclopedias tend to be short statements of fact which are either overly easy to summarize or terribly complex. For the example paragraphs, how about summarizing a recent blog post at Fly, like the Sandbox and Isolation one (or at least see how it comes out)?

@Codepope or @kurt - Sorry to come back to this a bit late. I have updated the readme: https://github.com/geshan/bert-extractive-summarizer

I have some queries:

  1. Yes, it would be great to link puppeteer-js up to the summarizer, but it seems like overkill. If you know of a valid use case, like summarizing long YouTube video descriptions, it can be done.
  2. Please let me know of a text source to summarize.

The latest changes are here: https://github.com/geshan/bert-extractive-summarizer and I am open to feedback. Thanks!