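Named Entity Recognition with BERT in PyTorch

Because BERT's tokenizer can split a single word into several sub-word tokens, the word-level labels have to be aligned with the tokenized sequence. There are two ways to do this. In the first, we assign the label only to the first sub-word of each split word, and the continuation sub-words simply get '-100' as a label. In the second, we assign the same label to all of the sub-words that belong to the same word. In both cases, tokens that don't have a word_id (special tokens such as [CLS], [SEP], and padding) are labeled with '-100'.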
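The label alignment described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's exact code: the function name `align_labels` and the `label_all_tokens` flag are hypothetical, and the `word_ids` list is assumed to have the shape returned by the `word_ids()` method of a Hugging Face fast tokenizer's output (one entry per token, `None` for special tokens and padding). The value -100 is used because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so those positions do not contribute to the loss.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_tokens: bool = False,
) -> List[int]:
    """Map word-level labels onto sub-word tokens (hypothetical helper).

    word_ids[i] is the index of the word that token i came from,
    or None for tokens that belong to no word ([CLS], [SEP], padding).
    """
    aligned = []
    previous_word_id = None
    for word_id in word_ids:
        if word_id is None:
            # Token without a word_id: always ignored.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word_id:
            # First sub-word of a word: always gets the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Continuation sub-word: same label, or ignored,
            # depending on the chosen strategy.
            aligned.append(word_labels[word_id] if label_all_tokens else IGNORE_INDEX)
        previous_word_id = word_id
    return aligned


# Illustrative input: [CLS], one token for word 0, three sub-words
# for word 1, one token for word 2, [SEP].
example_word_ids = [None, 0, 1, 1, 1, 2, None]
example_labels = [0, 3, 0]  # made-up label ids, one per word

first_only = align_labels(example_word_ids, example_labels)
# [-100, 0, 3, -100, -100, 0, -100]
all_subwords = align_labels(example_word_ids, example_labels, label_all_tokens=True)
# [-100, 0, 3, 3, 3, 0, -100]
```

Labeling only the first sub-word keeps one prediction per word, which makes evaluation straightforward. Labeling every sub-word gives the model more supervised positions per sentence. Which one works better is a choice to validate on your own data.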