Glossing multi-line texts on the ZBB
Posted: Fri Dec 19, 2025 6:12 pm
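Warning: This script doesn't check that the romanization and the gloss have the same number of words. If the gloss is shorter, the leftover words get empty gloss tags; if it is longer, the extra gloss words are silently dropped.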
I couldn't figure out how to do this using Neonnaut’s gloss generator, so I wrote a simple Python script for glossing multi-line texts. Each line of the text must be entered as a block of exactly four lines: original, romanization, gloss and translation, with a blank line between blocks. I'm sure you can figure out which lines to comment out if you don't need them all.
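As one illustration of dropping a line you don't need, here is a rough sketch of a per-block formatter where the original-script line is optional. The names format_block and has_original are invented for this example and are not part of the script; the sample words are borrowed from the Bengali example and the translation is only illustrative.

```python
def format_block(lines, has_original=True):
    """Format one block: [original,] romanization, gloss, translation."""
    expected = 4 if has_original else 3
    if len(lines) < expected:
        return f'[color=red]Each gloss needs {expected} lines[/color]'
    # Keep the original-script line only when the input provides one
    original = [lines[0]] if has_original else []
    romanization, gloss_line, translation = lines[expected - 3:expected]
    gloss_words = gloss_line.split()
    tagged = []
    for i, rom_word in enumerate(romanization.split()):
        # Same fallback as the script: a missing gloss becomes an empty tag
        gloss_word = gloss_words[i] if i < len(gloss_words) else ''
        tagged.append(f'[gloss={gloss_word}]{rom_word}[/gloss]')
    quoted = translation if translation.startswith('"') else f'"{translation}"'
    return '\n'.join(original + [' '.join(tagged), quoted])


# Three-line block: romanization, gloss, translation (no original script)
print(format_block(['biram nai', 'rest is.none', 'There is no rest'],
                   has_original=False))
```

With has_original=True the same function expects the usual four lines, so one flag covers both layouts instead of commented-out code.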
Code:
#!/usr/bin/env python3
import sys
import io

# Force UTF-8 output so IPA and non-Latin scripts print correctly
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')


def batch_glosses(text):
    lines = text.split('\n')
    glosses = []
    current_gloss = []

    # Split the input into blocks separated by blank lines
    for line in lines:
        if line.strip() == '':
            if current_gloss:
                glosses.append(current_gloss)
                current_gloss = []
        else:
            current_gloss.append(line)
    if current_gloss:
        glosses.append(current_gloss)

    # Convert each block to BBCode
    results = []
    for gloss in glosses:
        if len(gloss) < 4:
            results.append('[color=red]Each gloss needs 4 lines[/color]\n')
            continue
        original = gloss[0]
        romanization = gloss[1]
        gloss_line = gloss[2]
        translation = gloss[3]

        # Split romanization and gloss into words
        rom_words = romanization.split()
        gloss_words = gloss_line.split()

        # Wrap each romanized word in a gloss tag; a missing gloss becomes empty
        tagged_rom_parts = []
        for i, rom_word in enumerate(rom_words):
            gloss_word = gloss_words[i] if i < len(gloss_words) else ''
            tagged_rom_parts.append(f'[gloss={gloss_word}]{rom_word}[/gloss]')
        tagged_rom = ' '.join(tagged_rom_parts)

        # Put the translation in quotes unless it already starts with one
        quoted = translation if translation.startswith('"') else f'"{translation}"'
        results.append(f'{original}\n{tagged_rom}\n{quoted}')
    return '\n\n'.join(results)


def main():
    if len(sys.argv) < 2:
        print('Specify the input file')
        sys.exit(1)
    path = sys.argv[1]
    try:
        with open(path, 'r', encoding='utf-8') as f:
            text = f.read()
        output = batch_glosses(text)
        print(output)
    except FileNotFoundError:
        print('Input file not found', file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f'Error reading file: {e}', file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
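Since the script does not compare word counts itself, a separate pre-check along these lines might help catch misaligned blocks before posting. This is only a sketch: split_blocks and find_mismatches are hypothetical helper names, and the sample text uses made-up placeholder words rather than real data.

```python
def split_blocks(text):
    """Split text into blocks of non-empty lines separated by blank lines."""
    blocks, current = [], []
    for line in text.split('\n'):
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def find_mismatches(text):
    """Return (block_number, rom_count, gloss_count) for misaligned blocks."""
    mismatches = []
    for number, block in enumerate(split_blocks(text), start=1):
        if len(block) < 4:
            continue  # short blocks are already reported by the main script
        rom_count = len(block[1].split())
        gloss_count = len(block[2].split())
        if rom_count != gloss_count:
            mismatches.append((number, rom_count, gloss_count))
    return mismatches


# Placeholder input: the second block has 3 romanized words but 2 glosses
sample = (
    "orig one\n"
    "ka-ta mi\n"
    "dog.pl see\n"
    "The dogs see\n"
    "\n"
    "orig two\n"
    "ka mi so\n"
    "dog see\n"
    "The dog sees it\n"
)
for number, rom_count, gloss_count in find_mismatches(sample):
    print(f"Block {number}: {rom_count} romanized words, {gloss_count} gloss words")
```

Reporting the block number rather than raising an error keeps the check usable on long texts, where several blocks may need fixing at once.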
Example input:
রাস্তায় গাড়িঘোড়ার বিরাম নাই,
rasta-i gaɽi-ɡʱoɽa-r biram nai
road.loc car-horse.gen rest is.none
On the road, the cars and horses have no rest

ফেরিওয়ালা অবিশ্রাম হাঁকিয়া চলিয়াছে,
pʰeriwala ɔbisram hãkija tʃolijatʃʰe
hawker no-rest call.part walk.3.perf.cont
the street hawkers keep walking, calling without rest,

যাহারা আপিসে কালেজে আদালতে যাইবে
dʒahara ɔpiʃ-e kɔledʒ-e ad̪alɔt-e dʒaib-e
those office.loc college.loc court.loc go.3.fut
those who will go to offices, colleges, law courts,
Corresponding output:
Code:
রাস্তায় গাড়িঘোড়ার বিরাম নাই,
[gloss=road.loc]rasta-i[/gloss] [gloss=car-horse.gen]gaɽi-ɡʱoɽa-r[/gloss] [gloss=rest]biram[/gloss] [gloss=is.none]nai[/gloss]
"On the road, the cars and horses have no rest"

ফেরিওয়ালা অবিশ্রাম হাঁকিয়া চলিয়াছে,
[gloss=hawker]pʰeriwala[/gloss] [gloss=no-rest]ɔbisram[/gloss] [gloss=call.part]hãkija[/gloss] [gloss=walk.3.perf.cont]tʃolijatʃʰe[/gloss]
"the street hawkers keep walking, calling without rest,"

যাহারা আপিসে কালেজে আদালতে যাইবে
[gloss=those]dʒahara[/gloss] [gloss=office.loc]ɔpiʃ-e[/gloss] [gloss=college.loc]kɔledʒ-e[/gloss] [gloss=court.loc]ad̪alɔt-e[/gloss] [gloss=go.3.fut]dʒaib-e[/gloss]
"those who will go to offices, colleges, law courts,"

What it looks like on the ZBB:
রাস্তায় গাড়িঘোড়ার বিরাম নাই,
rasta-i   gaɽi-ɡʱoɽa-r   biram  nai
road.loc  car-horse.gen  rest   is.none
"On the road, the cars and horses have no rest"

ফেরিওয়ালা অবিশ্রাম হাঁকিয়া চলিয়াছে,
pʰeriwala  ɔbisram  hãkija     tʃolijatʃʰe
hawker     no-rest  call.part  walk.3.perf.cont
"the street hawkers keep walking, calling without rest,"

যাহারা আপিসে কালেজে আদালতে যাইবে
dʒahara  ɔpiʃ-e      kɔledʒ-e     ad̪alɔt-e   dʒaib-e
those    office.loc  college.loc  court.loc  go.3.fut
"those who will go to offices, colleges, law courts,"
---
If you use this tool, consider contributing to any of my conlang or natlang threads. Thank you.
Edit: I told a custom SLM to convert this script into a webpage: https://drive.google.com/file/d/1Yr2QnO ... sp=sharing I gave the SLM some inputs to test the script. It should be correct. Download the file and open it in your browser. (The SLM runs on my local machine, not a data center. If you are opposed to AIs in any capacity whatsoever regardless of environmental impact, then stick to the Python script above.)