MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation
arXiv:2505.15696v2 (announce type: replace-cross)
Abstract: The [CLS] token in BERT is commonly used as a fixed-length representation for classification tasks, yet prior work has shown that other tokens and intermediate layers also encode valuable contextual information. In this work, we…
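
The abstract contrasts using only the final-layer [CLS] vector with drawing on other layers and tokens. The sketch below is a generic illustration of that contrast on toy data, using element-wise max pooling as the title suggests. It is not the paper's method, whose details are cut off in this listing. The function names, the `last_k` parameter, and the nested-list layout `hidden_states[layer][token][dim]` are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Layer = List[Vector]  # one vector per token; index 0 is assumed to be [CLS]


def cls_only(hidden_states: List[Layer]) -> Vector:
    """Baseline: the [CLS] vector (token index 0) from the final layer."""
    return list(hidden_states[-1][0])


def max_pool_cls_over_layers(hidden_states: List[Layer], last_k: int) -> Vector:
    """Element-wise max over the [CLS] vectors of the last `last_k` layers.

    `last_k` is a hypothetical parameter name, not taken from the paper.
    """
    if not 1 <= last_k <= len(hidden_states):
        raise ValueError("last_k must be between 1 and the number of layers")
    cls_vectors = [layer[0] for layer in hidden_states[-last_k:]]
    return [max(values) for values in zip(*cls_vectors)]


def max_pool_over_tokens(layer: Layer) -> Vector:
    """Element-wise max over all token vectors of a single layer."""
    return [max(values) for values in zip(*layer)]


# Toy input: 3 layers, 2 tokens per layer, hidden size 2 (made-up numbers).
toy_hidden_states = [
    [[0.1, 0.9], [0.5, 0.2]],
    [[0.4, 0.3], [0.7, 0.1]],
    [[0.2, 0.6], [0.0, 0.8]],
]

baseline = cls_only(toy_hidden_states)
layer_pooled = max_pool_cls_over_layers(toy_hidden_states, last_k=2)
token_pooled = max_pool_over_tokens(toy_hidden_states[-1])
```

On this toy input, `baseline` is `[0.2, 0.6]`, `layer_pooled` is `[0.4, 0.6]`, and `token_pooled` is `[0.2, 0.8]`. Each pooled vector has the same fixed length as the [CLS] vector, so it could feed the same kind of classification head.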
