Preprocessing data using tokenization
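
Tokenization splits raw text into smaller units, typically sentences or words, so that later processing steps can work on them one at a time. The following is a minimal sketch using only Python's standard library and assuming plain English text. The function names `sentence_tokenize` and `word_tokenize` are illustrative choices for this sketch, not any particular library's API, and the regular expressions are deliberately simple: they do not handle abbreviations such as "Dr." or other edge cases that a dedicated tokenizer would.

```python
import re


def sentence_tokenize(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    # The lookbehind keeps the terminating punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # First alternative: a word, optionally with internal apostrophes (e.g. "Don't").
    # Second alternative: any single character that is neither word nor whitespace.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


if __name__ == "__main__":
    sample = "Hello there. How are you? Fine!"
    print(sentence_tokenize(sample))
    print(word_tokenize("Don't stop, please."))
```

Keeping punctuation as separate tokens is one possible design choice. Some pipelines drop punctuation at this stage instead, depending on what the downstream step needs.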