We release bigram subnetworks as described in Chang and Bergen (2025). These are sparse subsets of model parameters that recreate bigram predictions (next-token predictions conditioned only on the ...
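To make "bigram predictions" concrete, here is a minimal sketch, assuming the standard definition in which the next-token distribution depends only on the current token. It estimates that distribution from raw counts on a toy corpus. The function names (`bigram_counts`, `bigram_distribution`) and the toy sentence are hypothetical and chosen for illustration; this is not the released subnetworks or the method of Chang and Bergen (2025).

```python
from collections import Counter, defaultdict


def bigram_counts(tokens):
    """Count how often each token is immediately followed by each other token."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts


def bigram_distribution(counts, current):
    """Return P(next | current) as a dict, using only the current token.

    Returns an empty dict if `current` was never observed with a successor.
    """
    following = counts.get(current)
    if not following:
        return {}
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items()}


# Toy corpus (an assumption for illustration, split on whitespace).
tokens = "the cat sat on the mat and the cat slept".split()
counts = bigram_counts(tokens)
dist = bigram_distribution(counts, "the")
print(dist)
```

Note that the prediction for "the" is the same regardless of what came before it, which is what separates a bigram prediction from a prediction that uses the full context.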
As you've likely noticed by now, working with text data comes with a lot of ambiguity. When all we start with is an arbitrarily sized string of words, there's no clear answer as to what sorts of ...