How are we protecting the Internet from the flood of fake news, powered by Transformers?

  • Andrei Petreanu
    19 August 2019

Natural Language Generation tasks have the power to produce amazing human-like text on any given topic, and were considered, up until recently, to be very hard to manage. Transformers and attention models changed that, and the first to give us a taste of what was coming was OpenAI:

"Better language Models and their Implications" : https://openai.com/blog/better-language-models/

Soon after, Google showed off BERT (the "make-a-reservation" demo from Google I/O was Duplex, a related effort), and BERT is now considered to be the state of the art in Natural Language Understanding.

Google's BERT and OpenAI's GPT-2 were open-sourced in part (research, statistics, small pretrained models, code), but the full power of those NLG systems was kept from the public, to protect against what could become a 24/7 flood of generated fake news on any topic.

Last week, NVIDIA released MegatronLM, the biggest trained language model of its kind, with incredible generative capabilities. Link here: https://nv-adlr.github.io/MegatronLM. It is described as the "largest transformer based language model ever trained at 24x the size of BERT and 5.6x the size of GPT-2".

This was a full public release, including code and training protocol.

That means anyone with enough GPU power can now generate unlimited human-like text on any given topic, something of a news bomb :)

A snippet of OpenAI's concerns on this matter:

"We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.

We also think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems. If pursued, these efforts could yield a better evidence base for decisions by AI labs and governments regarding publication decisions and AI policy more broadly." May 2019


So what OpenAI warned about in May is now public and available to everyone, courtesy of NVIDIA.

Are we not concerned? I train these models for work, as a tech lead and consultant.

I am very concerned.