Visualizing and Measuring the Geometry of BERT
07 Jun 2019, Prathyush SP

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally.
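One concrete way to start probing those internal representations is to pull per-token hidden states out of a pretrained BERT and project them into 2D for inspection. The sketch below is illustrative only, not the paper's own code: it assumes the HuggingFace `transformers` library and scikit-learn, and uses a plain PCA projection as a stand-in for whatever visualization one prefers.

```python
# A minimal sketch (not the paper's method): extract BERT's per-layer
# context embeddings for a sentence and project them to 2D for inspection.
# Assumes HuggingFace `transformers`, PyTorch, and scikit-learn.
import torch
from transformers import BertTokenizer, BertModel
from sklearn.decomposition import PCA

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "The bank raised interest rates."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (num_layers + 1) tensors, each of shape
# (1, seq_len, 768); index 0 is the embedding layer, 1..12 the transformer layers.
hidden_states = outputs.hidden_states

# Take one layer's token embeddings and reduce them to 2D with PCA so the
# geometry (clusters, directions) can be plotted and eyeballed.
layer = 8  # arbitrary choice; the interesting structure varies by layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
embeddings = hidden_states[layer][0].numpy()  # (seq_len, 768)

coords = PCA(n_components=2).fit_transform(embeddings)
for tok, (x, y) in zip(tokens, coords):
    print(f"{tok:>12s}  ({x:+.2f}, {y:+.2f})")
```

Running the same extraction over many sentences containing an ambiguous word (e.g. "bank") and plotting the resulting points is one simple way to see whether the embeddings separate by word sense, which is one of the geometric questions the paper investigates.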
For more details, visit the source.