CLARA - Functional Prototype — Standalone emulator demonstrating natural language processing with radical transparency. Every calculation is visible, every weight debatable, every decision well-founded. ← Back to CLARAClear
Step 1: Tokenization
▼Splitting text into individual words.
Text 1:
Text 2:
Step 2: Lemmatization
▼Reduction of words to their base form (lemma).
Text 1:
Text 2:
Step 3: Stopword Filtering
▼Removal of function words (articles, prepositions, etc.).
Text 1:
Text 2:
Step 4: TF-IDF Calculation
▼Term weighting according to frequency and rarity.
Text 1:
Text 2:
Step 5: Jaccard Similarity
▼Similarity measure based on word sets.
Intersection:
Union:
Formula: J(A,B) = |A ∩ B| / |A ∪ B|
Calculation:
Result:
Step 6: Cosine Similarity
▼Similarity measure based on TF-IDF vectors.
Common vocabulary:
Vector 1:
Vector 2:
Formula: cos(θ) = (A · B) / (||A|| × ||B||)
Calculation:
Result: