GPT-4chan is a language model fine-tuned specifically on content from 4chan’s /pol/ board, a forum known for its politically incorrect discussions. Its training data comprises over 134.5 million posts from the board, spanning three and a half years.

Training on this corpus enabled the model to imitate the style of human-written /pol/ posts. GPT-4chan is based on the GPT-J 6B model and was fine-tuned on a dataset titled “Raiders of the Lost Kek,” which covers 3.5 years of /pol/ content. It was trained for only one epoch, and it generates text that reflects the distribution of its training data. Its output therefore offers a view into the discourse of anonymous online communities such as 4chan, which makes it useful for analysing and comprehending anonymous online political discussions. Because of this distinctive training background, GPT-4chan may also show promise in areas such as toxicity detection.
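To make the idea of a model reproducing the distribution of its training data more concrete, here is a minimal toy sketch. It is not GPT-4chan or GPT-J: it swaps in a simple bigram counter as a stand-in, and the corpus, the function names `train_one_epoch` and `sample_post`, and the `<s>`/`</s>` markers are all hypothetical choices made for illustration. The "training" step makes a single pass over the data (one epoch), and sampling then draws each next word in proportion to how often it followed the previous word in the corpus.

```python
import random
from collections import Counter, defaultdict

def train_one_epoch(corpus):
    """Count next-token frequencies in a single pass over the corpus."""
    counts = defaultdict(Counter)
    for post in corpus:
        tokens = ["<s>"] + post.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_post(counts, rng, max_tokens=20):
    """Sample tokens one at a time in proportion to their training counts."""
    token, out = "<s>", []
    for _ in range(max_tokens):
        choices = counts[token]
        token = rng.choices(list(choices), weights=list(choices.values()))[0]
        if token == "</s>":  # end-of-post marker reached
            break
        out.append(token)
    return " ".join(out)

# Hypothetical placeholder corpus; not taken from the real dataset.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
counts = train_one_epoch(corpus)
print(sample_post(counts, random.Random(0)))
```

In this toy, "the" is followed by "cat" twice and "dog" once, so sampled posts mention "cat" about twice as often as "dog". A large neural model is far more expressive, but the same principle applies: what it generates mirrors the statistics of what it was trained on.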

GPT-4chan is useful for studying anonymous online communities, analysing political behaviour, detecting toxicity, and examining subcultures, but it should be approached with caution because of the controversial content it was trained on and can reproduce.

