Is there a tutorial video on how to process text before uploading it, and how to put it into a CSV file? I just saw a tutorial video, but it's for images, not text.
We are working on creating a video example of how to use DLS on a text dataset.
@vishal, please share the video link here once it is ready.
Here is the video example for uploading the Text Dataset on DLS:
Please check the reference script below:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import csv

text_file = open("reviews.txt", "r")
lines = text_file.readlines()

maxlen = 100               # We will cut reviews after 100 words
training_samples = 200     # We will be training on 200 samples
validation_samples = 10000 # We will be validating on 10,000 samples
max_words = 10000          # We will only consider the top 10,000 words in the dataset

tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(lines)
sequences = tokenizer.texts_to_sequences(lines)
sameLengthSequences = pad_sequences(sequences, maxlen=maxlen)

sequencesToStrings = []
for row in sameLengthSequences:
    sequencesToStrings.append(';'.join(str(col) for col in row))

csvfile = "processed.csv"
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in sequencesToStrings:
        writer.writerow([val])
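To make the pad-then-join step in the script above concrete, here is a small pure-Python sketch of the same transformation (no Keras required; the token ids are made up for illustration). `pad_sequences` left-pads with zeros by default and, when a sequence is too long, truncates from the beginning:

```python
maxlen = 5

def pad_and_join(sequence, maxlen):
    # Mirror pad_sequences defaults: left-pad with zeros,
    # truncate from the beginning when too long
    if len(sequence) >= maxlen:
        padded = sequence[-maxlen:]
    else:
        padded = [0] * (maxlen - len(sequence)) + sequence
    # Join into the ';'-separated string written to processed.csv
    return ';'.join(str(col) for col in padded)

print(pad_and_join([12, 7, 3], maxlen))          # short review: '0;0;12;7;3'
print(pad_and_join([9, 8, 7, 6, 5, 4], maxlen))  # long review: '8;7;6;5;4'
```

Each row of processed.csv is one such string, so every review ends up with exactly `maxlen` token ids.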
Do we need to train the neural network every time we get new data (new sentences) in the dataset?
I guess Keras creates tokens from the text in a different way each time, with different tokens.
Is there a way to train our neural network only once with some training data, and then, when we get new data, only tokenize that new data and feed it to the neural network to get results?
As long as you don't change your model configuration, you can continue training on the new data by using the saved weights in the Training tab.
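The concern about tokens is valid: `fit_on_texts` builds its word index from whatever corpus it is given, so fitting a fresh Tokenizer on new data would assign different ids to the same words. The usual fix is to fit once on the training corpus, persist the tokenizer (for example with pickle), and reuse it for all later data. A minimal sketch, using a plain dict as a hypothetical stand-in for a fitted Tokenizer's `word_index`:

```python
import pickle

# Hypothetical word index produced by the one-time fit on training data
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Persist the mapping after fitting (a real Keras Tokenizer object
# can be pickled the same way)
with open("tokenizer.pkl", "wb") as f:
    pickle.dump(word_index, f)

# Later, when new sentences arrive, reload instead of refitting
with open("tokenizer.pkl", "rb") as f:
    saved_index = pickle.load(f)

def to_sequence(sentence, index):
    # Words not seen during the fit are dropped, matching
    # Tokenizer's default behaviour for out-of-vocabulary words
    return [index[w] for w in sentence.lower().split() if w in index]

print(to_sequence("The movie was great", saved_index))  # [1, 2, 3, 4]
```

Because the saved index is reused, "movie" always maps to the same id, and the trained network keeps seeing inputs encoded consistently with its training data.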
I have some labeled sentences; suppose I process them this way and train the NN.
If I later get unlabeled sentences and need predictions for them, do I need to tokenize all the data (old + new) and retrain the NN with the labeled data, or can I just tokenize the unlabeled sentences and feed them into the NN, and it will work fine?
No, you don't have to train your NN again in this case.
To predict outputs with the trained model, you can either generate inferences by uploading the unlabeled data in the Inference tab and selecting the trained run, or deploy the trained model and predict the output for a single tokenized sentence.