Conversion of the Atlas of Pidgin and Creole Language Structures (APiCS) examples to the Ligt model

As submitted to LDK-2021

Data is taken from https://github.com/cldf-datasets/apics/releases (release v2013, the latest).

In [1]:
%cd './data/'
/Users/max/Projects/LiODi/ligt/stable/apics/data
In [2]:
# !pip install pycldf

Parsing APiCS

In [3]:
import pycldf
import csv
import re
In [4]:
# The metadata describes a StructureDataset; Dataset.from_metadata selects the matching class.
apics_dataset = pycldf.Dataset.from_metadata('StructureDataset-metadata.json')

A sample data item from the LanguageTable collection:

In [5]:
next(iter(apics_dataset['LanguageTable']))
Out[5]:
OrderedDict([('ID', '1'),
             ('Name', 'Early Sranan'),
             ('Macroarea', None),
             ('Latitude', Decimal('5.833333')),
             ('Longitude', Decimal('-55.6')),
             ('Glottocode', 'sran1240'),
             ('ISO639P3code', None),
             ('Description',
              'Over the years multiple historical documents in and on the English-base creole language of Suriname known as Sranan or Sranantongo have been uncovered, resulting in a substantial digitized corpus of eighteenth-century texts. These texts, stored in the Suriname Creole Archive at the Radboud University, Nijmegen, provide a unique window on the Sranan language as it was spoken in the eighteenth century, that is, at earlier stages of its development. In several historical sources phonological, grammatical, semantic, and pragmatic differences between varieties of the creole language are acknowledged, and some varieties appear to have been so different that they are known under distinct names. For example, in Schumann’s Sranan–German dictionary of 1783, we find references to Plantasi tongo (Plantation language), Foto tongo (City language), Ningre tongo (Black’s language) as well as English tongo (English language). Early Sranan is used as cover term for these 18th century creole varieties. Detailed comparisons of the historical creole data with their equivalents in contemporary varieties of Sranantongo and the Surinamese Maroon languages have further refined our understanding of language variation and language change in Early Sranan.  The default lect documented in <span style="font-style: italic;">APiCS</span> is the variety of Early Sranan that was used for interethnic out-group communication. When available, examples of diachronic, social, stylistic and geographical variation are included.'),
             ('Data_Contributor_ID', ['vandenbergmargotc', 'bruynadrienne']),
             ('Survey_Contributor_ID', ['vandenbergmargotc', 'smithnorvalsh']),
             ('Survey_Title',
              'Early Sranan. In "The survey of pidgin and creole languages". Volume 1: English-based and Dutch-based Languages'),
             ('Source',
              ['1044',
               '11',
               '1218',
               '1219',
               '1355',
               '1357',
               '1422',
               '1437',
               '1519',
               '1520',
               '1521',
               '1522',
               '1527',
               '1576',
               '181',
               '183',
               '185',
               '186',
               '187',
               '188',
               '190',
               '191',
               '192',
               '193',
               '194',
               '310',
               '313',
               '314',
               '449',
               '524',
               '53',
               '54',
               '55',
               '56',
               '625',
               '1876[survey]']),
             ('Ethnologue_Name', None),
             ('Glossed_Text_PDF', 'a9e3060ff8c20a905525e265c7c565b2'),
             ('Glossed_Text_Audio', None),
             ('Metadata',
              '{"Autoglossonym": "Ningri tongo (18th century)", "Other names": "Bastert Engels, Neeger Engels (18th c. Dutch; obsolete, pejorative)", "Number of speakers": "ca. 55,000 \\u00a0in 1783 (L1 as well as L2)", "Major lexifier": "English", "Other contributing languages": "European (Dutch, Portuguese, etc.); West African (Kwa, Kikongo, etc.); Amerindian (Arawakan; Cariban)", "Location": "Suriname", "Official language of 18th-century Suriname": "Dutch"}'),
             ('Region', 'Caribbean'),
             ('Default_Lect_ID', None),
             ('Lexifier', 'English')])
In [6]:
# Map each language ID to (name, Glottocode, ISO 639-3 code) for quick lookup.
languages = {lang['ID']: (lang['Name'], lang['Glottocode'], lang['ISO639P3code']) for lang in apics_dataset['LanguageTable']}
In [7]:
languages['1']
Out[7]:
('Early Sranan', 'sran1240', None)

A sample example sentence:

In [8]:
next(iter(apics_dataset['ExampleTable']))
Out[8]:
OrderedDict([('ID', '1-1'),
             ('Language_ID', '1'),
             ('Primary_Text', 'Isredeh mi kau bringi wan mannpikin.'),
             ('Analyzed_Word',
              ['Isrede', 'mi', 'kau', 'bringi', 'wan', 'manpikin.']),
             ('Gloss',
              ['yesterday', '1SG', 'cow', 'deliver', 'a', 'male.young']),
             ('Translated_Text', 'Yesterday my cow delivered a bull calf.'),
             ('Meta_Language_ID', None),
             ('Comment', None),
             ('Source', ['1357[22]']),
             ('Audio', None),
             ('Type', 'written (dictionary)'),
             ('markup_text', 'Isredeh mi kau bringi wan mannpikin.'),
             ('markup_analyzed', 'Isrede mi kau bringi wan manpikin.'),
             ('markup_gloss', 'yesterday 1SG cow deliver a male.young'),
             ('markup_comment', None),
             ('source_comment', None),
             ('original_script', None),
             ('sort', '1'),
             ('alt_translation',
              'German: Gestern hat meine Kuh ein Junges, ein Oechsgen geworfen. [op.cit.]')])
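
Each example points to its language through `Language_ID`. As a minimal sketch, assuming a lookup dict shaped like the `languages` mapping built above (ID to name, Glottocode, ISO 639-3 code), the reference could be resolved as follows. The helper name `describe_example` is hypothetical, and the values are copied from the sample records shown above.

```python
# Toy stand-ins holding values copied from the sample records above.
languages = {'1': ('Early Sranan', 'sran1240', None)}
example = {
    'ID': '1-1',
    'Language_ID': '1',
    'Primary_Text': 'Isredeh mi kau bringi wan mannpikin.',
}

def describe_example(example, languages):
    """Hypothetical helper: label an example with its language name and Glottocode."""
    name, glottocode, _iso = languages[example['Language_ID']]
    return f"{example['ID']} [{name}, {glottocode}]: {example['Primary_Text']}"

print(describe_example(example, languages))
```
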
In [9]:
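def align_glosses(morphs, glosses, example):
    """Pair each morph with its gloss; print the example if the counts differ."""
    if len(morphs) != len(glosses):
        # Report the mismatch; zip() below silently drops the unpaired tail.
        print(morphs, glosses, example)
    return list(zip(morphs, glosses))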
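
Because `zip` stops at the shorter sequence, a length mismatch drops the unpaired items from the result. Below is a minimal sketch of an alternative that keeps them visible, using `itertools.zip_longest` from the standard library. The name `align_glosses_padded` and the `None` padding are assumptions for illustration, not part of this notebook's pipeline.

```python
from itertools import zip_longest

def align_glosses_padded(morphs, glosses):
    """Hypothetical variant: pad the shorter list with None instead of truncating."""
    return list(zip_longest(morphs, glosses, fillvalue=None))

# Values copied from sample example 1-1: equal lengths, so nothing is padded.
words = ['Isrede', 'mi', 'kau', 'bringi', 'wan', 'manpikin.']
glosses = ['yesterday', '1SG', 'cow', 'deliver', 'a', 'male.young']
aligned = align_glosses_padded(words, glosses)

# Made-up mismatch: the last word has no gloss and is paired with None.
mismatched = align_glosses_padded(['mi', 'kau', 'bringi'], ['1SG', 'cow'])
```

Whether padding or truncation is preferable depends on how the downstream conversion treats incomplete pairs.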

Some examples have no word/gloss segmentation: the whole sentence is stored as a single Analyzed_Word item. We skip them for now.
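
As a rough illustration of why these records are set aside rather than split automatically, the sketch below applies the same test (a single `Analyzed_Word` item that still contains a space) and then tries a naive whitespace split. The helper names `is_unsplit` and `naive_split` are hypothetical, and the record holds values copied from example 24-17 of the dataset. In this record the split yields eight words but only seven glosses, so whitespace alone does not give a one-to-one alignment.

```python
def is_unsplit(example):
    """True if Analyzed_Word is one item that still contains a space."""
    words = example['Analyzed_Word']
    return len(words) == 1 and ' ' in words[0]

def naive_split(example):
    """Hypothetical fallback: split the single word and gloss strings on whitespace."""
    words = example['Analyzed_Word'][0].split()
    glosses = example['Gloss'][0].split() if example['Gloss'] else []
    return words, glosses

# Values copied from example 24-17.
record = {
    'ID': '24-17',
    'Analyzed_Word': ['Ai si aa gehl gwen a maeri Chaali.'],
    'Gloss': ['1SG see DET.DEF.SG woman FUT marry Charley'],
}

if is_unsplit(record):
    words, glosses = naive_split(record)
    print(record['ID'], len(words), len(glosses))
```
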

In [10]:
# Unsplit examples: Analyzed_Word holds a single item that still contains spaces.
sent_not_split = [example for example in apics_dataset['ExampleTable']
                  if len(example['Analyzed_Word']) == 1 and ' ' in example['Analyzed_Word'][0]]

print(len(sent_not_split))
sent_not_split
58
Out[10]:
[OrderedDict([('ID', '75-8'),
              ('Language_ID', '75'),
              ('Primary_Text', 'la fiy opaapaawa'),
              ('Analyzed_Word', ['la fiy o-paapaa-wa']),
              ('Gloss', ['the.F (F) girl (F) 3.POSS-father-OBV (C)']),
              ('Translated_Text', "the girl's father"),
              ('Meta_Language_ID', None),
              ('Comment', 'F = from French; C = from Cree'),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'la fiy opaapaawa'),
              ('markup_analyzed', 'la fiy o-paapaa-wa'),
              ('markup_gloss', 'the.F (F) girl (F) 3.POSS-father-OBV (C)'),
              ('markup_comment', 'F = from French; C = from Cree'),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '384'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '21-12'),
              ('Language_ID', '21'),
              ('Primary_Text', 'the way that acrolectal speakers say it'),
              ('Analyzed_Word', ['the way that acrolectal speakers say it']),
              ('Gloss', []),
              ('Translated_Text', 'the way that acrolectal speakers say it'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'the way that acrolectal speakers say it'),
              ('markup_analyzed', None),
              ('markup_gloss', None),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '1028'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '24-17'),
              ('Language_ID', '24'),
              ('Primary_Text', 'Ai si aa gehl gwen a maeri Chaali.'),
              ('Analyzed_Word', ['Ai si aa gehl gwen a maeri Chaali.']),
              ('Gloss', ['1SG see DET.DEF.SG woman FUT marry Charley']),
              ('Translated_Text',
               'I saw the woman who is going to marry Charley.'),
              ('Meta_Language_ID', None),
              ('Comment', 'a may be a link vowel rather than a future marker'),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', 'Ai si aa gehl gwen a maeri Chaali.'),
              ('markup_analyzed', None),
              ('markup_gloss', '1SG see DET.DEF.SG woman FUT marry Charley'),
              ('markup_comment',
               '<span style="font-style: italic;">a</span> may be a link vowel rather than a future marker'),
              ('source_comment', 'Own fieldwork'),
              ('original_script', None),
              ('sort', '1034'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '59-31'),
              ('Language_ID', '59'),
              ('Primary_Text',
               'ye so koli ti ni ake ga na ni, wala akobe so, wala akasa so, koli ti ni ake ga na ni so, tona lo tene, ni to amafuta n na mo mo te, il faut mo te ape'),
              ('Analyzed_Word',
               ['ye so koli ti ni ake ga na ni, wala akobe so, wala akasa so, koli ti ni ake ga na ni so, tongana lo tene, ni to amafuta ni na mo mo te, il faut mo te ape']),
              ('Gloss',
               ['thing REL husband of 1SG.LOG SM.COP come PREP DET or PL.food REL or PL.stews REL husband of 1SG.LOG SM.COP come PREP DET REL when 3SG say 1SG.LOG cook PL.fat DET PREP 2SG 2SG eat INTERDICTION 2SG eat NEG']),
              ('Translated_Text',
               'Whatever my husband should bring, whether different kinds of food, whether different kinds of stews, my husband should bring (something), when he says, "I\'ll cook different rich dishes for you to eat" you must not eat it.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['1326']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'ye so koli ti ni ake ga na ni, wala akobe so, wala akasa so, koli ti ni ake ga na ni so, tona lo tene, ni to amafuta n na mo mo te, il faut mo te ape'),
              ('markup_analyzed',
               'ye so koli ti ni ake ga na ni, wala akobe so, wala akasa so, koli ti ni ake ga na ni so, tongana lo tene, ni to amafuta ni na mo mo te, il faut mo te ape'),
              ('markup_gloss',
               'thing REL husband of 1SG.LOG SM.COP come PREP DET or PL.food REL or PL.stews REL husband of 1SG.LOG SM.COP come PREP DET REL when 3SG say 1SG.LOG cook PL.fat DET PREP 2SG 2SG eat INTERDICTION 2SG eat NEG'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '1107'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '51-21'),
              ('Language_ID', '51'),
              ('Primary_Text', 'Joj ka alé Baspwent souvan.'),
              ('Analyzed_Word', ['Joj ka alé Baspwent souvan.']),
              ('Gloss', ['George go Basse-Pointe often']),
              ('Translated_Text', 'Geoge often goes to Basse-Pointe.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Joj ka alé Baspwent <span style="font-weight: bold;">souvan</span>.'),
              ('markup_analyzed', None),
              ('markup_gloss', 'George go Basse-Pointe often'),
              ('markup_comment', None),
              ('source_comment', 'Own fieldwork'),
              ('original_script', None),
              ('sort', '1606'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '58-156'),
              ('Language_ID', '58'),
              ('Primary_Text',
               '1SG móno; 2SG ngé; 3SG.ANIM yándi; 3SG.INAN yó; 1PL béto; 2PL béno; 3PL bó'),
              ('Analyzed_Word',
               ['1SG móno; 2SG ngé; 3SG.ANIM yándi; 3SG.INAN yó; 1PL béto; 2PL béno; 3PL bó']),
              ('Gloss', []),
              ('Translated_Text', None),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text',
               '<span style="font-weight: bold;">1SG </span>móno; <span style="font-weight: bold;">2SG </span>ngé; <span style="font-weight: bold;">3SG.ANIM </span>yándi; <span style="font-weight: bold;">3SG.INAN </span>yó; <span style="font-weight: bold;">1PL </span>béto; <span style="font-weight: bold;">2PL </span>béno; <span style="font-weight: bold;">3PL </span>bó'),
              ('markup_analyzed', None),
              ('markup_gloss', None),
              ('markup_comment', None),
              ('source_comment', 'own knowledge'),
              ('original_script', None),
              ('sort', '2220'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '66-20'),
              ('Language_ID', '66'),
              ('Primary_Text',
               'go/(se) go; lu/lorang; dia/de/inçian/inçe; kitang/kitampəðə; lorangpəðə/lompəðə; derang/dempəðə'),
              ('Analyzed_Word',
               ['go/(se) go; lu/lorang; dia/de/inçian/inçe; kitang/kitampəðə; lorangpəðə/lompəðə; derang/dempəðə']),
              ('Gloss', ['1SG 2SG 3SG 1PL 2PL 3PL']),
              ('Translated_Text', 'I; you; he/she/it; we; you; they'),
              ('Meta_Language_ID', None),
              ('Comment',
               'This paradigm is for Kirinda usage, in which se is rare.'),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'go/(se) go; lu/lorang; dia/de/inçian/inçe; kitang/kitampəðə; lorangpəðə/lompəðə; derang/dempəðə'),
              ('markup_analyzed', None),
              ('markup_gloss', '1SG 2SG 3SG 1PL 2PL 3PL'),
              ('markup_comment',
               'This paradigm is for Kirinda usage, in which <span style="font-style: italic;">se</span> is rare.'),
              ('source_comment', 'Recorded by author'),
              ('original_script', None),
              ('sort', '2231'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '60-21'),
              ('Language_ID', '60'),
              ('Primary_Text', 'na-, o- , a- , e-; to-, bo-, ba-, e-'),
              ('Analyzed_Word', ['na-, o- , a- , e-; to-, bo-, ba-, e-']),
              ('Gloss',
               ['1SG 2SG 3SG.ANIM 3SG.INAN 1PL 2PL 3PL.ANIM 3PL.INAN']),
              ('Translated_Text', 'I, you, s/he, it; we, you, they, they'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['1273']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', 'na-, o- , a- , e-; to-, bo-, ba-, e-'),
              ('markup_analyzed', None),
              ('markup_gloss',
               '1SG 2SG 3SG.ANIM 3SG.INAN 1PL 2PL 3PL.ANIM 3PL.INAN'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '2392'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '40-35'),
              ('Language_ID', '40'),
              ('Primary_Text', 'Jhiko ankodz?'),
              ('Analyzed_Word', ['Jhiko ankodz?']),
              ('Gloss', ['PST-be/become-PST']),
              ('Translated_Text', 'Did something happen?'),
              ('Meta_Language_ID', None),
              ('Comment',
               'Jhiko is derived from ya hiko, which in turn comes from Portuguese já ficou [already become.PST].'),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text',
               'Jhiko <span style="font-weight: bold;">ankodz</span>?'),
              ('markup_analyzed', None),
              ('markup_gloss', 'PST-be/become-PST'),
              ('markup_comment',
               '<span style="font-style: italic;">Jhiko</span> is derived from <span style="font-style: italic;">ya hiko</span>, which in turn comes from Portuguese <span style="font-style: italic;">já ficou</span> [already become.PST].'),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '2998'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '49-366'),
              ('Language_ID', '49'),
              ('Primary_Text', 'M wè yon moun nan lakou ou la yè swa.'),
              ('Analyzed_Word', ['M wè yon moun nan lakou ou la yè swa.']),
              ('Gloss', []),
              ('Translated_Text', 'I saw someone in your yard last night.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'M wè yon moun nan lakou ou la yè swa.'),
              ('markup_analyzed', None),
              ('markup_gloss', None),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '3031'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '7-290'),
              ('Language_ID', '7'),
              ('Primary_Text', 'Hi pikni dem bad.'),
              ('Analyzed_Word', ['Hi pikni dem bad.']),
              ('Gloss', ['3SG child and 3PL bad']),
              ('Translated_Text', 'His children are bad.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'Hi pikni dem bad.'),
              ('markup_analyzed', None),
              ('markup_gloss', '3SG child and 3PL bad'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '3537'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '56-72'),
              ('Language_ID', '56'),
              ('Primary_Text', 'Sa enn pye anba la ti bon groser koko'),
              ('Analyzed_Word', ['Sa enn pye anba la ti bon groser koko']),
              ('Gloss', ['DEM one tree down there PST good coconut']),
              ('Translated_Text',
               'This coconut tree over there had very thick coconuts.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['158[92]']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               '<span style="font-weight: bold;">Sa enn</span> pye anba la ti bon groser koko'),
              ('markup_analyzed', None),
              ('markup_gloss', 'DEM one tree down there PST good coconut'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '4516'),
              ('alt_translation',
               "French: Ce cocotier là-bas avait des noix d'une bonne grosseur. (Bollée & Rosalie 1994: 93)")]),
 OrderedDict([('ID', '12-69'),
              ('Language_ID', '12'),
              ('Primary_Text',
               "You see, they ain't like these young people these days, they don't want work nohow, not these day. You see, they ain't want work nohow. [...] Them days, when I was small, we want work. In these day - them days, these - they - they don't want work."),
              ('Analyzed_Word',
               ['[...] these days [...] these day [...] Them days [...].']),
              ('Gloss',
               ['[...] DEM day.PL [...] DEM day [PL] [...] DEM day.PL [...]']),
              ('Translated_Text',
               '[...] young people today, they don’t want to work [...] In those days, when I was small, we wanted to work [...].'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'You see, they ain\'t like these young people <span style="font-weight: bold;">these </span>days, they don\'t want work nohow, not <span style="font-weight: bold;">these </span>day. You see, they ain\'t want work nohow. [...] <span style="font-weight: bold;">Them </span>days, when I was small, we want work. In <span style="font-weight: bold;">these </span>day - <span style="font-weight: bold;">them </span>days, these - they - they don\'t want work.'),
              ('markup_analyzed',
               '[...]<span style="font-weight: bold;"> these </span>days [...] <span style="font-weight: bold;">these </span>day [...] <span style="font-weight: bold;">Them </span>days [...].'),
              ('markup_gloss',
               '[...] DEM day.PL [...] DEM day [PL] [...] DEM day.PL [...]'),
              ('markup_comment', None),
              ('source_comment', 'Own sociolinguistic interviews'),
              ('original_script', None),
              ('sort', '4586'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '36-169'),
              ('Language_ID', '36'),
              ('Primary_Text',
               'Ka pê taya kôôndja lêtu fia e ki ũa-ũa taminha e.'),
              ('Analyzed_Word',
               ['Ka pê taya kôôndja lêtu fia e ki ũa-ũa taminha e.']),
              ('Gloss',
               ['PST put coconut inside leaf DEM with one-one bowl DEM']),
              ('Translated_Text',
               'They put slices of coconut in the [banana] leaves with every bowl.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Ka pê taya kôôndja lêtu fia e ki ũa-ũa taminha e.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               'PST put coconut inside leaf DEM with one-one bowl DEM'),
              ('markup_comment', None),
              ('source_comment', 'Maurer 1995'),
              ('original_script', None),
              ('sort', '4721'),
              ('alt_translation',
               'Ils mirent des tranches de noix de coco dans les feuilles avec chacun des bols.')]),
 OrderedDict([('ID', '26-139'),
              ('Language_ID', '26'),
              ('Primary_Text', 'naɪn taims vs. da naɪnt wan'),
              ('Analyzed_Word', ['naɪn taim-s vs. da naɪn-t wan']),
              ('Gloss', ['nine time-PL the nine-ORD one']),
              ('Translated_Text', 'nine times vs. the ninth one'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'unspecified'),
              ('markup_text', 'naɪn taims vs. da naɪnt wan'),
              ('markup_analyzed', 'naɪn taim-s vs. da naɪn-t wan'),
              ('markup_gloss', 'nine time-PL the nine-ORD one'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '4824'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '37-52'),
              ('Language_ID', '37'),
              ('Primary_Text',
               'ũa/pimyô, dôsu/sêgundu, têêxi/têsêw ~ trisêw, xinku/kintu'),
              ('Analyzed_Word',
               ['ũa/pimyô, dôsu/sêgundu, têêxi/têsêw ~ trisêw, xinku/kintu']),
              ('Gloss', ['one/first two/second three/third, five/fifth']),
              ('Translated_Text',
               'one/first, two/second, three/third, five/fifth'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['905']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'ũa/pimyô, dôsu/sêgundu, têêxi/têsêw ~ trisêw, xinku/kintu'),
              ('markup_analyzed', None),
              ('markup_gloss', 'one/first two/second three/third, five/fifth'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '4846'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '46-86'),
              ('Language_ID', '46'),
              ('Primary_Text', "Ya-bené'le ayér."),
              ('Analyzed_Word', ["Ya-bené'le ayér."]),
              ('Gloss', ['PFV-come s/he yesterday']),
              ('Translated_Text', 'S/he came yesterday.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', "Ya-bené'le ayér."),
              ('markup_analyzed', None),
              ('markup_gloss', 'PFV-come s/he yesterday'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '5906'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '53-175'),
              ('Language_ID', '53'),
              ('Primary_Text', 'Mo te pa fe aryen.'),
              ('Analyzed_Word', ['Mo te pa fe aryen.']),
              ('Gloss', ['1SG PST NEG PROG do anything']),
              ('Translated_Text', "I wasn't doing anything."),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['722[258]']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', 'Mo te pa fe aryen.'),
              ('markup_analyzed', None),
              ('markup_gloss', '1SG PST NEG PROG do anything'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '5929'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '12-102'),
              ('Language_ID', '12'),
              ('Primary_Text', "I did 'pose to 'pear in court."),
              ('Analyzed_Word', ["I did 'pose to 'pear in court."]),
              ('Gloss', ['1SG.SBJ do.PST MOD.AUX appear in court']),
              ('Translated_Text', 'I was supposed to appear in court.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'I <span style="font-weight: bold;">did \'pose to</span> \'pear in court.'),
              ('markup_analyzed', None),
              ('markup_gloss', '1SG.SBJ do.PST MOD.AUX appear in court'),
              ('markup_comment', None),
              ('source_comment', 'Own sociolinguistic interviews'),
              ('original_script', None),
              ('sort', '6012'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '19-233'),
              ('Language_ID', '19'),
              ('Primary_Text', 'À dè lɛf nà Luba soté dì nɛks wik.'),
              ('Analyzed_Word', ['À dè lɛf nà Luba soté dì nɛks wik.']),
              ('Gloss', ['1SG PROG LOC Luba until DEF.ART next week.']),
              ('Translated_Text',
               'I will be staying in Luba until next week.'),
              ('Meta_Language_ID', None),
              ('Comment',
               '"I am staying in Luba until (the) next week" - note that the person is\r\nnot yet in Luba when he says this, he is  still in Malabo.'),
              ('Source', []),
              ('Audio', None),
              ('Type', 'unspecified'),
              ('markup_text', 'À dè lɛf nà Luba soté dì nɛks wik.'),
              ('markup_analyzed', None),
              ('markup_gloss', '1SG PROG LOC Luba until DEF.ART next week.'),
              ('markup_comment',
               '"I am staying in Luba until (the) next week" - note that the person is\r\nnot yet in Luba when he says this, he is  still in Malabo.'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '6273'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '56-104'),
              ('Language_ID', '56'),
              ('Primary_Text',
               'Ou pa pou trouv li ankor zanmen. I pou al dan en lot kan.'),
              ('Analyzed_Word',
               ['Ou pa pou trouv li ankor zanmen. I pou al dan en lot kan.']),
              ('Gloss',
               ['2SG NEG FUT see 3SG never 3PL FUT go in a other aerie']),
              ('Translated_Text',
               'You will never see them [in the same place]. They will go to another aerie.'),
              ('Meta_Language_ID', None),
              ('Comment',
               'Li [3SG] is interpreted as a plural pronoun in this example.'),
              ('Source', ['158[194]']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Ou pa pou trouv li ankor zanmen. I pou al dan en lot kan.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               '2SG NEG FUT see 3SG never 3PL FUT go in a other aerie'),
              ('markup_comment',
               '<span style="font-style: italic;">Li</span> [3SG] is interpreted as a plural pronoun in this example.'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '6768'),
              ('alt_translation',
               'French: Vous ne les reverrez jamais [au même endroit]. Ils iront dans une autre aire. (Bollée & Rosalie 1994: 195)')]),
 OrderedDict([('ID', '2-129'),
              ('Language_ID', '2'),
              ('Primary_Text', 'Mi kan go na+a dansi bika(si) mi abi moni.'),
              ('Analyzed_Word',
               ['Mi kan go na+a dansi bika(si) mi abi moni.']),
              ('Gloss',
               ['1SG can go LOC the.SG dance because 1SG have money']),
              ('Translated_Text',
               'I can go to the dance because I have money.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'elicited from speaker'),
              ('markup_text', 'Mi kan go na+a dansi bika(si) mi abi moni.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               '1SG can go LOC the.SG dance because 1SG have money'),
              ('markup_comment', None),
              ('source_comment', 'Winford transcripts'),
              ('original_script', None),
              ('sort', '7176'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '4-197'),
              ('Language_ID', '4'),
              ('Primary_Text', 'A sa e wooko nownouw.'),
              ('Analyzed_Word', ['A sa e wooko nownouw.']),
              ('Gloss', ['3SG can IPFV now']),
              ('Translated_Text', 'He might be working now.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', 'A sa e wooko nownouw.'),
              ('markup_analyzed', None),
              ('markup_gloss', '3SG can IPFV now'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '7180'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '1-146'),
              ('Language_ID', '1'),
              ('Primary_Text', '[...] va gi da moni na potiman.'),
              ('Analyzed_Word', ['[...] fu gi da moni na pôtiman.']),
              ('Gloss', ['[...] for give DET.SG na poor-NMLZ']),
              ('Translated_Text', '[...] to give the money to the poor.'),
              ('Meta_Language_ID', None),
              ('Comment',
               'The order is (S-)V-T-R, with the general preposition na marking the recipient.'),
              ('Source', ['1355[209]']),
              ('Audio', None),
              ('Type', 'written'),
              ('markup_text', '[...] va gi da moni na potiman.'),
              ('markup_analyzed', '[...] fu gi da moni na pôtiman.'),
              ('markup_gloss', '[...] for give DET.SG na poor-NMLZ'),
              ('markup_comment',
               'The order is (S-)V-T-R, with the general preposition <span style="font-style: italic;">na</span> marking the recipient.'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '7995'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '24-130'),
              ('Language_ID', '24'),
              ('Primary_Text', "lorng / lorng a' / lorng f'"),
              ('Analyzed_Word', ["lorng / lorng a' / lorng f'"]),
              ('Gloss', ['along.PREP']),
              ('Translated_Text', 'with / together with'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', "lorng / lorng a' / lorng f'"),
              ('markup_analyzed', None),
              ('markup_gloss', 'along.PREP'),
              ('markup_comment', None),
              ('source_comment', 'Own fieldwork'),
              ('original_script', None),
              ('sort', '9220'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '12-187'),
              ('Language_ID', '12'),
              ('Primary_Text',
               "[...] his wife is a pastor. I forget what they church name. They church in Freeport. She's a pastor in Freeport."),
              ('Analyzed_Word',
               ["[...] his wife is a pastor. I forget what they church name. They church in Freeport. She's a pastor in Freeport."]),
              ('Gloss',
               ['[...] 3SG.M.POSS wife COP ART pastor 3PL.POSS church in Freeport 3SG.F.COP ART pastor in Freeport']),
              ('Translated_Text',
               '[...] his wife is a pastor. [I can’t remember the name of their church.] Their church is in Freeport. She’s a pastor in Freeport.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               "[...] his wife is a pastor. I forget what they church name. They church in Freeport. She's a pastor in Freeport."),
              ('markup_analyzed', None),
              ('markup_gloss',
               '[...] 3SG.M.POSS wife COP ART pastor 3PL.POSS church in Freeport 3SG.F.COP ART pastor in Freeport'),
              ('markup_comment', None),
              ('source_comment', 'Own sociolinguistic interviews'),
              ('original_script', None),
              ('sort', '10084'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '21-103'),
              ('Language_ID', '21'),
              ('Primary_Text', 'I got gold ring.'),
              ('Analyzed_Word', ['I got gold ring.']),
              ('Gloss', ['1SG have DET gold ring']),
              ('Translated_Text', 'I have a gold ring.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'I got gold ring.'),
              ('markup_analyzed', None),
              ('markup_gloss', '1SG have DET gold ring'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '10187'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '2-211'),
              ('Language_ID', '2'),
              ('Primary_Text',
               'Yu yere dati Turku e suku yu, a betre yu wijk uit komoto na Holland.'),
              ('Analyzed_Word',
               ['Yu yere dati Turku e suku yu, a betre yu wijk uit komoto na Holland.']),
              ('Gloss',
               ['you hear that Turks IPFV look.for you be better you emigrate come.out of Holland']),
              ('Translated_Text',
               'You hear that the Turks are looking for you, you better get out of Holland.'),
              ('Meta_Language_ID', None),
              ('Comment',
               'I SUSPECT THAT FU IS ALSO USED IN SVCS WITH A MOTION VERB AS V1 AND ‘GO/COME’ ETC. AS V2. YOU ALSO HAVE STRUCTURES LIKE this one, WHICH ARE PROBABLY NOT EXAMPLES OF COMING FROM A PLACE IN THE STRICT SENSE.'),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Yu yere dati Turku e suku yu, a betre yu wijk uit <span style="font-weight: bold;">komoto na </span>Holland.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               'you hear that Turks IPFV look.for you be better you emigrate come.out of Holland'),
              ('markup_comment',
               'I SUSPECT THAT <span style="font-style: italic;">FU</span> IS ALSO USED IN SVCS WITH A MOTION VERB AS V1 AND ‘GO/COME’ ETC. AS V2. YOU ALSO HAVE STRUCTURES LIKE this one, WHICH ARE PROBABLY NOT EXAMPLES OF COMING FROM A PLACE IN THE STRICT SENSE.'),
              ('source_comment', 'Winford data, Tape 8-a'),
              ('original_script', None),
              ('sort', '10550'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '46-149'),
              ('Language_ID', '46'),
              ('Primary_Text', "Kyére'le salé na ágwa."),
              ('Analyzed_Word', ["Kyére'le salé na ágwa."]),
              ('Gloss', ['want s/he quit/get.out LOC water']),
              ('Translated_Text', 'S/he wants to get out of the water.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Kyére\'le salé <span style="font-weight: bold;">na</span> ágwa.'),
              ('markup_analyzed', None),
              ('markup_gloss', 'want s/he quit/get.out LOC water'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '10738'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '75-208'),
              ('Language_ID', '75'),
              ('Primary_Text',
               'Kiishipweepayiwak laasphinaen dan la bal oma eeituhteechik da li gran palae.'),
              ('Analyzed_Word',
               ['Kii-shipwee-payi-w-ak aasphinaen dan la bal oma ee-ituhteechik da li gran palae.']),
              ('Gloss',
               ['PST-leave-MOVE-3-PL away LOC DEF.ART.F.SG that COMP-go-3PL LOC DEF.ART.M.SG big palace']),
              ('Translated_Text',
               'And they took off, away to the ball. They went to the big palace.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['522']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Kiishipweepayiwak laasphinaen dan la bal oma eeituhteechik <span style="font-weight: bold;">da</span> li gran palae.'),
              ('markup_analyzed',
               'Kii-shipwee-payi-w-ak aasphinaen dan la bal oma ee-ituhteechik da li gran palae.'),
              ('markup_gloss',
               'PST-leave-MOVE-3-PL away LOC DEF.ART.F.SG that COMP-go-3PL LOC DEF.ART.M.SG big palace'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '10842'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '55-153'),
              ('Language_ID', '55'),
              ('Primary_Text', "mo'nn tir mo lakle depi dan mo sak"),
              ('Analyzed_Word', ["mo 'nn tir mo lakle depi dan mo sak"]),
              ('Gloss',
               ['1SG.COMPL take.out POSS.1SG key ABL LOC POSS.1SG bag']),
              ('Translated_Text', 'I took the key out of my bag.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['770']),
              ('Audio', None),
              ('Type', 'constructed by native-speaker linguist'),
              ('markup_text', "mo'nn tir mo lakle depi dan mo sak"),
              ('markup_analyzed', "mo 'nn tir mo lakle depi dan mo sak"),
              ('markup_gloss',
               '1SG.COMPL take.out POSS.1SG key ABL LOC POSS.1SG bag'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '11081'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '21-116'),
              ('Language_ID', '21'),
              ('Primary_Text', 'Kong Kong send them go school.'),
              ('Analyzed_Word', ['Kong Kong send them go school.']),
              ('Gloss', ['grandfather send 3PL go school']),
              ('Translated_Text', 'Grandfather sends/sent them to school.'),
              ('Meta_Language_ID', None),
              ('Comment',
               'This is more basilectal Singlish, cf. Kong Kong send them to school.'),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'Kong Kong send them go school.'),
              ('markup_analyzed', None),
              ('markup_gloss', 'grandfather send 3PL go school'),
              ('markup_comment',
               'This is more basilectal Singlish, cf. <span style="font-style: italic;">Kong Kong send them to school</span>.'),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '11143'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '54-187'),
              ('Language_ID', '54'),
              ('Primary_Text',
               '[...] ou pran in pti koton sitronèl, ou mèt par ann-dan ou amar par isi.'),
              ('Analyzed_Word',
               ['[...] ou pran en pti koton sitronel, ou met par ann-dan, ou amar par isi.']),
              ('Gloss',
               ['[...] 2SG take INDF little stalk citronelle 2SG put inside 2SG fasten here']),
              ('Translated_Text',
               '[...] you take a little stalk of sitronel (Cymbopogon citratus), you put it inside, you fasten it here.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['229[54]']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               '[...] ou pran in pti koton sitronèl, ou mèt par ann-dan ou amar par isi.'),
              ('markup_analyzed',
               '[...] ou pran en pti koton sitronel, ou met par ann-dan, ou amar par isi.'),
              ('markup_gloss',
               '[...] 2SG take INDF little stalk citronelle 2SG put inside 2SG fasten here'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '11255'),
              ('alt_translation',
               "French: [...] tu prends une petite tige de citronelle, tu la mets dedans, tu l'attaches ici.")]),
 OrderedDict([('ID', '47-165'),
              ('Language_ID', '47'),
              ('Primary_Text', 'Bo ta sintibo manera ta na bo lugar bo ta?'),
              ('Analyzed_Word',
               ['Bo ta sintibo manera ta na bo lugar bo ta?']),
              ('Gloss',
               ['2SG TNS feel 2SG as.if COP LOC POSS.2SG place 2SG COP']),
              ('Translated_Text',
               'Do you feel at home (lit. Do you feel yourself as if it is in your place (that) you are)?'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['755']),
              ('Audio', None),
              ('Type', 'naturalistic written'),
              ('markup_text', 'Bo ta sintibo manera ta na bo lugar bo ta?'),
              ('markup_analyzed', None),
              ('markup_gloss',
               '2SG TNS feel 2SG as.if COP LOC POSS.2SG place 2SG COP'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '11451'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '75-221'),
              ('Language_ID', '75'),
              ('Primary_Text',
               'anima la maenzon eeoshikaateek kayaash dan langleteer'),
              ('Analyzed_Word',
               ['anima la maenzon ee-oshikaateek kayaash dan langleteer']),
              ('Gloss',
               ['that.INAN DEF.ART.F.SG COMP-build.INAN-PASS.INAN long.time.ago LOC England']),
              ('Translated_Text',
               'a big stone house that was built a long time ago in England'),
              ('Meta_Language_ID', None),
              ('Comment',
               'This is an inanimate passive without expression of agent.'),
              ('Source', ['522']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'anima la maenzon eeoshikaateek kayaash dan langleteer'),
              ('markup_analyzed',
               'anima la maenzon ee-oshikaateek kayaash dan langleteer'),
              ('markup_gloss',
               'that.INAN DEF.ART.F.SG COMP-build.INAN-PASS.INAN long.time.ago LOC England'),
              ('markup_comment',
               'This is an inanimate passive without expression of agent.'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '11895'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '75-228'),
              ('Language_ID', '75'),
              ('Primary_Text', 'Namoya, niya mun peer kaapeekiyokeet.'),
              ('Analyzed_Word', ['Namoya, niya mun peer kaa-pee-kiyokee-t.']),
              ('Gloss',
               ['no (C) 1SG (C) 1SG.POSS (F) father (F) COMP-come-visit.TA-3  (C)']),
              ('Translated_Text',
               'No, MY father came to visit. OR: It was my father who came for a visit.'),
              ('Meta_Language_ID', None),
              ('Comment', 'F = from French; C = from Cree'),
              ('Source', ['94']),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'Namoya, niya mun peer kaapeekiyokeet.'),
              ('markup_analyzed', 'Namoya, niya mun peer kaa-pee-kiyokee-t.'),
              ('markup_gloss',
               'no (C) 1SG (C) 1SG.POSS (F) father (F) COMP-come-visit.TA-3  (C)'),
              ('markup_comment', 'F = from French; C = from Cree'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '12077'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '2-241'),
              ('Language_ID', '2'),
              ('Primary_Text',
               'Dus dan a man kon ferstan taki na frow san a wasi.'),
              ('Analyzed_Word',
               ['Dus dan a man kon ferstan taki na frow san a wasi.']),
              ('Gloss',
               ['thus then the man come understand COMP be the woman that he wash']),
              ('Translated_Text',
               'So then the man understood that it was the woman whom he had washed.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Dus dan a man kon ferstan taki na frow <span style="font-weight: bold;">san</span> a wasi.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               'thus then the man come understand COMP be the woman that he wash'),
              ('markup_comment', None),
              ('source_comment', 'Winford data, Tape 8-a'),
              ('original_script', None),
              ('sort', '12085'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '75-233'),
              ('Language_ID', '75'),
              ('Primary_Text',
               'Dan li boor ashiweepineew oonhin kaaniimiyit avek ana la fiy ana.'),
              ('Analyzed_Word',
               ['Dan li boor ashi-weepin-eew oonhin kaa-niim-iyi-t avek ana la fiy ana.']),
              ('Gloss',
               ['LOC ART.M.SG side away-throw-3.SBJ.3.OBJ DEM.ANIM REL-dance-OBV-3 with DEM.ANIM that ART.F.SG girl DEM.ANIM']),
              ('Translated_Text',
               'He threw the girl aside that he was dancing with. OR: He threw her aside, the one he was dancing with, that girl.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['522']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Dan li boor ashiweepineew oonhin kaaniimiyit avek ana la fiy ana.'),
              ('markup_analyzed',
               'Dan li boor ashi-weepin-eew oonhin kaa-niim-iyi-t avek ana la fiy ana.'),
              ('markup_gloss',
               'LOC ART.M.SG side away-throw-3.SBJ.3.OBJ DEM.ANIM REL-dance-OBV-3 with DEM.ANIM that ART.F.SG girl DEM.ANIM'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '12227'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '51-155'),
              ('Language_ID', '51'),
              ('Primary_Text', "kouto-a ou ka koupé pen épi'y la"),
              ('Analyzed_Word', ["kouto-a ou ka koupé pen épi'y la"]),
              ('Gloss', ['knife-DEF 2SG cut bread with 3SG']),
              ('Translated_Text', 'the knife with which you cut bread'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', "kouto-a ou ka koupé pen épi'y la"),
              ('markup_analyzed', None),
              ('markup_gloss', 'knife-DEF 2SG cut bread with 3SG'),
              ('markup_comment', None),
              ('source_comment', 'Own fieldwork'),
              ('original_script', None),
              ('sort', '12310'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '6-101'),
              ('Language_ID', '6'),
              ('Primary_Text',
               'Me been tink say da da Jackass way been da call he picninni.'),
              ('Analyzed_Word',
               ['Me been tink say da da Jackass way been da call he picninni.']),
              ('Gloss',
               ['1SG ANT think COMP Jackass REL ANT call 3SG.OBJ child']),
              ('Translated_Text',
               'I thought that was the Jackass which was calling its child.'),
              ('Meta_Language_ID', None),
              ('Comment',
               'This example is from a written text from 1845. The past marker been is archaic in Trinidad English Creole (Winer 2009: 83), but still present in basilectal Tobagonian Creole (Winer 2009: 83; James & Youssef 2008: 673)'),
              ('Source', ['1591[24]']),
              ('Audio', None),
              ('Type', 'naturalistic written'),
              ('markup_text',
               'Me been tink <span style="font-weight: bold;">say</span> da da Jackass way been da call he picninni.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               '1SG ANT think COMP Jackass REL ANT call 3SG.OBJ child'),
              ('markup_comment',
               'This example is from a written text from 1845. The past marker <span style="font-style: italic;">been</span> is archaic in Trinidad English Creole (Winer 2009: 83), but still present in basilectal Tobagonian Creole (Winer 2009: 83; James &amp; Youssef 2008: 673)'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '12350'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '19-187'),
              ('Language_ID', '19'),
              ('Primary_Text',
               'Mì grànmá tɛl mi se è wɔnt go sìdɔ́n nà pueblo.'),
              ('Analyzed_Word',
               ['Mì grànmá tɛl mi se è wɔnt go sìdɔ́n nà pueblo.']),
              ('Gloss',
               ['1SG.POSS grandmother tell QUOT 3SG.SBJ want go sit LOC village']),
              ('Translated_Text',
               'My grandmother told me that she wants to go live in the village.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Mì grànmá tɛl mi <span style="font-weight: bold;">se</span> è wɔnt go sìdɔ́n nà pueblo.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               '1SG.POSS grandmother tell QUOT 3SG.SBJ want go sit LOC village'),
              ('markup_comment', None),
              ('source_comment', 'Field data'),
              ('original_script', None),
              ('sort', '12385'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '55-191'),
              ('Language_ID', '55'),
              ('Primary_Text', 'zweṅ ziska nov/ oktob novam <? noṅ ?>'),
              ('Analyzed_Word', ['zweṅ ziska nov/ oktob novam <? noṅ ?>']),
              ('Gloss', ['June till October/November no']),
              ('Translated_Text', 'From June to October/November, no?'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['760']),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', 'zweṅ ziska nov/ oktob novam &lt;? noṅ ?&gt;'),
              ('markup_analyzed', None),
              ('markup_gloss', 'June till October/November no'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '13425'),
              ('alt_translation',
               "French: De juin jusqu'en octobre/novembre, non?")]),
 OrderedDict([('ID', '18-197'),
              ('Language_ID', '18'),
              ('Primary_Text',
               'Ma fada don dai. – Ashia (with sucking the teeth).'),
              ('Analyzed_Word',
               ['Ma fada don dai. – Ashia (with sucking the teeth).']),
              ('Gloss', ['1SG.POSS father PFV die – ashia']),
              ('Translated_Text', 'My father died. – I feel very sorry.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text',
               'Ma fada don dai. – Ashia (with sucking the teeth).'),
              ('markup_analyzed', None),
              ('markup_gloss', '1SG.POSS father PFV die – ashia'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '13890'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '29-215'),
              ('Language_ID', '29'),
              ('Primary_Text',
               'Speaker_A: Is jy siek?  – Speaker_B: Tsk [either alveolar or lateral]'),
              ('Analyzed_Word',
               ['Speaker_A: Is jy siek?  – Speaker_B: Tsk [either alveolar or lateral]']),
              ('Gloss', ['Speaker_A: be 2SG  ill  – Speaker_B: Tsk']),
              ('Translated_Text',
               "Speaker A: Are you ill?  – Speaker B: Stupid question! / Can't you see? I am visibly ill/fine."),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'elicited from speaker'),
              ('markup_text',
               'Speaker_A: Is jy siek?  – Speaker_B: Tsk [either alveolar or lateral]'),
              ('markup_analyzed', None),
              ('markup_gloss', 'Speaker_A: be 2SG  ill  – Speaker_B: Tsk'),
              ('markup_comment', None),
              ('source_comment', 'Own data'),
              ('original_script', None),
              ('sort', '13896'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '68-127'),
              ('Language_ID', '68'),
              ('Primary_Text', '[tsk]'),
              ('Analyzed_Word',
               ['(ingressive single click, tongue behind alveolar ridge)']),
              ('Gloss', ['tsk']),
              ('Translated_Text', '(indicates disappointment or frustration)'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', '[tsk]'),
              ('markup_analyzed',
               '(ingressive single click, tongue behind alveolar ridge)'),
              ('markup_gloss', 'tsk'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '13916'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '6-131'),
              ('Language_ID', '6'),
              ('Primary_Text',
               'a:ks, kya:dz, liks, krips, plums /mz/, gloves /vz/, hands /nz/,'),
              ('Analyzed_Word',
               ['a:ks, kya:dz, liks, krips, plums /mz/, gloves /vz/, hands /nz/,']),
              ('Gloss', ['ask cards beatings crisp plums gloves hands']),
              ('Translated_Text',
               'ask, cards, beatings,  crisp, plums, gloves hands'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'unspecified'),
              ('markup_text',
               'a:ks, kya:dz, liks, krips, plums /mz/, gloves /vz/, hands /nz/,'),
              ('markup_analyzed', None),
              ('markup_gloss', 'ask cards beatings crisp plums gloves hands'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '14117'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '75-282'),
              ('Language_ID', '75'),
              ('Primary_Text', 'Nimiyaastaen lii ruuz eewiihkimaakwahki.'),
              ('Analyzed_Word',
               ['Ni-miyaast-aen lii ruuz ee-wiihkimaakw-ahki.']),
              ('Gloss',
               ['1-smell.INAN-3OBJ ART.PL COMP-smell.good-3PL.INAN-PL']),
              ('Translated_Text',
               'I smell the fragrance of roses. OR: I smell it if the roses spread a nice smell.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['789[103]']),
              ('Audio', None),
              ('Type', 'naturalistic written'),
              ('markup_text', 'Nimiyaastaen lii ruuz eewiihkimaakwahki.'),
              ('markup_analyzed',
               'Ni-miyaast-aen lii ruuz ee-wiihkimaakw-ahki.'),
              ('markup_gloss',
               '1-smell.INAN-3OBJ ART.PL COMP-smell.good-3PL.INAN-PL'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '14652'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '59-365'),
              ('Language_ID', '59'),
              ('Primary_Text', 'vuko/voko (H L tones); vuko/voko (L M tones)'),
              ('Analyzed_Word',
               ['vuko/voko (H L tones); vuko/voko (L M tones)']),
              ('Gloss', ['be.dark/black/blue/green.etc.']),
              ('Translated_Text', 'be dark, black, blue, green, etc.'),
              ('Meta_Language_ID', None),
              ('Comment',
               'There are three basic colours, as there are in other Ubangian languages. Colours are differentiated with ideophones and by other means, but there are few ideophones in Sango.'),
              ('Source', ['172[352]']),
              ('Audio', None),
              ('Type', 'unknown'),
              ('markup_text', 'vuko/voko (H L tones); vuko/voko (L M tones)'),
              ('markup_analyzed', None),
              ('markup_gloss', 'be.dark/black/blue/green.etc.'),
              ('markup_comment',
               'There are three basic colours, as there are in other Ubangian languages. Colours are differentiated with ideophones and by other means, but there are few ideophones in Sango.'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '14768'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '75-289'),
              ('Language_ID', '75'),
              ('Primary_Text',
               'Li maal pi la femel karibuu lii korn ayaaweewak.'),
              ('Analyzed_Word',
               ['Li maal pi la femel karibuu lii korn ayaaweewak.']),
              ('Gloss',
               ['DEF.ART.M.SG male and DEF.ART.F.SG female cariboo have.ANIM-3-PL']),
              ('Translated_Text', 'Male and female caribou have antlers.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['789[56]']),
              ('Audio', None),
              ('Type', 'naturalistic written'),
              ('markup_text',
               'Li maal pi la femel karibuu lii korn ayaaweewak.'),
              ('markup_analyzed',
               'Li maal pi la femel karibuu lii korn ayaaweewak.'),
              ('markup_gloss',
               'DEF.ART.M.SG male and DEF.ART.F.SG female cariboo have.ANIM-3-PL'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '14914'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '34-189'),
              ('Language_ID', '34'),
              ('Primary_Text', 'dé [empty coda]'),
              ('Analyzed_Word', ['dé [empty coda]']),
              ('Gloss', ['hurt']),
              ('Translated_Text', 'hurt'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by native-speaker linguist'),
              ('markup_text', 'dé [empty coda]'),
              ('markup_analyzed', None),
              ('markup_gloss', 'hurt'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '15276'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '47-236'),
              ('Language_ID', '47'),
              ('Primary_Text', 'kaska (LH melody); kaska (HL melody)'),
              ('Analyzed_Word', ['kaska (LH melody); kaska (HL melody)']),
              ('Gloss', ['peel.V peel.NOUN']),
              ('Translated_Text', 'peel (verb); peel (noun)'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', ['694[354]']),
              ('Audio', None),
              ('Type', 'both elicited and naturalistic spoken'),
              ('markup_text', 'kaska (LH melody); kaska (HL melody)'),
              ('markup_analyzed', 'kaska (LH melody); kaska (HL melody)'),
              ('markup_gloss', 'peel.V peel.NOUN'),
              ('markup_comment', None),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '15446'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '59-378'),
              ('Language_ID', '59'),
              ('Primary_Text', 'samba (L L); samba (L M)'),
              ('Analyzed_Word', ['samba (L L); samba (L M)']),
              ('Gloss', ['beer co-wife']),
              ('Translated_Text', 'beer; co-wife'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'samba (L L); samba (L M)'),
              ('markup_analyzed', None),
              ('markup_gloss', 'beer co-wife'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '15452'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '59-379'),
              ('Language_ID', '59'),
              ('Primary_Text', 'kwa (L tone); kwa (M tone); kwa (H tone)'),
              ('Analyzed_Word', ['kwa (L tone); kwa (M tone); kwa (H tone)']),
              ('Gloss', ['work hair/feather death/corpse']),
              ('Translated_Text', 'work; hair/feather; death/corpse'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'constructed by linguist'),
              ('markup_text', 'kwa (L tone); kwa (M tone); kwa (H tone)'),
              ('markup_analyzed', None),
              ('markup_gloss', 'work hair/feather death/corpse'),
              ('markup_comment', None),
              ('source_comment', 'Own knowledge'),
              ('original_script', None),
              ('sort', '15453'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '59-380'),
              ('Language_ID', '59'),
              ('Primary_Text', 'mene (L L)'),
              ('Analyzed_Word', ['mene (L L)']),
              ('Gloss', ['swallow']),
              ('Translated_Text', 'to swallow'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', 'mene (L L)'),
              ('markup_analyzed', None),
              ('markup_gloss', 'swallow'),
              ('markup_comment', None),
              ('source_comment', 'Samarin corpus 1994'),
              ('original_script', None),
              ('sort', '15454'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '59-381'),
              ('Language_ID', '59'),
              ('Primary_Text', 'mene (H M)'),
              ('Analyzed_Word', ['mene (H M)']),
              ('Gloss', ['blood']),
              ('Translated_Text', 'blood'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text', 'mene (H M)'),
              ('markup_analyzed', None),
              ('markup_gloss', 'blood'),
              ('markup_comment', None),
              ('source_comment', 'Samarin corpus 1994'),
              ('original_script', None),
              ('sort', '15455'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '24-93'),
              ('Language_ID', '24'),
              ('Primary_Text', "Yu ort a' bii bin ya."),
              ('Analyzed_Word', ["Yu ort a' bii bin ya."]),
              ('Gloss', ['2SG ought COMP PST here']),
              ('Translated_Text', 'You should have been here.'),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic written'),
              ('markup_text', "Yu ort a' bii bin ya."),
              ('markup_analyzed', None),
              ('markup_gloss', '2SG ought COMP PST here'),
              ('markup_comment', None),
              ('source_comment', 'Own fieldwork'),
              ('original_script', None),
              ('sort', '15569'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '75-214'),
              ('Language_ID', '75'),
              ('Primary_Text', 'Namoya, niya nipaapaa kaapeekiyokeet.'),
              ('Analyzed_Word', ['Namoya, niya ni-paapaa kaa-pee-kiyokee-t.']),
              ('Gloss',
               ['no (C) 1SG (C) 1SG-father (C) COMP-come-visit.TR.ANIM-3 (C)']),
              ('Translated_Text', 'No, MY father came to visit.'),
              ('Meta_Language_ID', None),
              ('Comment',
               "Pee means 'in this direction'. \r\nF = from French; C = from Cree."),
              ('Source', ['522']),
              ('Audio', None),
              ('Type', 'elicited from speaker'),
              ('markup_text', 'Namoya, niya nipaapaa kaapeekiyokeet.'),
              ('markup_analyzed', 'Namoya, niya ni-paapaa kaa-pee-kiyokee-t.'),
              ('markup_gloss',
               'no (C) 1SG (C) 1SG-father (C) COMP-come-visit.TR.ANIM-3 (C)'),
              ('markup_comment',
               '<span style="font-style: italic;">Pee</span> means \'in this direction\'. \r\nF = from French; C = from Cree.'),
              ('source_comment', None),
              ('original_script', None),
              ('sort', '15805'),
              ('alt_translation', None)]),
 OrderedDict([('ID', '2-318'),
              ('Language_ID', '2'),
              ('Primary_Text',
               'Yu musu abi wan sani fu e gi en fu a hori en srefi bezig.'),
              ('Analyzed_Word',
               ['Yu musu abi wan sani fu e gi en fu a hori en srefi bezig.']),
              ('Gloss',
               ['2SG must have one thing to IPFV give 3SG for 3SG keep herself busy']),
              ('Translated_Text',
               "You must have something to keep giving her so that she can keep busy'."),
              ('Meta_Language_ID', None),
              ('Comment', None),
              ('Source', []),
              ('Audio', None),
              ('Type', 'naturalistic spoken'),
              ('markup_text',
               'Yu musu abi wan sani fu e gi en fu a hori en srefi bezig.'),
              ('markup_analyzed', None),
              ('markup_gloss',
               '2SG must have one thing to IPFV give 3SG for 3SG keep herself busy'),
              ('markup_comment', None),
              ('source_comment', 'Winford transcripts'),
              ('original_script', None),
              ('sort', '15808'),
              ('alt_translation', None)])]
In [11]:
sent_uri_template = 'http://apics-online.info/sentences/{ID}'

examples = [{'id': example['ID'], 
             'orig_id': sent_uri_template.format(ID=example['ID']), 
             'baseline': example['Primary_Text'],
             'glosses': align_glosses(example['Analyzed_Word'], example['Gloss'], example),
             'translation': example['Translated_Text'],
             'language': languages[example['Language_ID']],
             'meta_language': example['Meta_Language_ID'],
             'comment': example['Comment']
            }
            for example in apics_dataset['ExampleTable']
            # Skip records whose Analyzed_Word is a single string that still contains spaces
            if not (len(example['Analyzed_Word']) == 1 and ' ' in example['Analyzed_Word'][0])]
In [12]:
len(examples)
Out[12]:
18468
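
The filter in the comprehension above drops every record whose `Analyzed_Word` list holds one string that still contains spaces, which is the shape of the records listed earlier (for example `34-189` or `59-380`). A minimal sketch of that predicate as a standalone function follows. The name `is_unsegmented` and the abridged sample records are illustrative assumptions and not part of the original notebook.

```python
def is_unsegmented(example):
    """True if Analyzed_Word is a single string that still contains spaces."""
    words = example['Analyzed_Word']
    return len(words) == 1 and ' ' in words[0]


# Abridged, illustrative records; only the field used by the filter is kept.
samples = [
    {'ID': '59-380', 'Analyzed_Word': ['mene (L L)']},           # unsplit, dropped
    {'ID': 'demo-1', 'Analyzed_Word': ['karu-ngku', 'i', 'bin']},  # hypothetical, kept
    {'ID': 'demo-2', 'Analyzed_Word': ['kooknot']},               # hypothetical one-word example, kept
]

kept = [s['ID'] for s in samples if not is_unsegmented(s)]
print(kept)
```

A one-word example survives the filter because its single string contains no space, so only multi-word strings that were never tokenised are excluded.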
In [13]:
for item in examples[:20]:
    print(list(item['glosses']))
[('Isrede', 'yesterday'), ('mi', '1SG'), ('kau', 'cow'), ('bringi', 'deliver'), ('wan', 'a'), ('manpikin.', 'male.young')]
[('Da', 'DET.SG'), ('masra', 'master'), ('teki', 'take'), ('mi', '1SG'), ('wefi', 'wife'), ('na', 'in'), ('neti', 'night'), ('nanga', 'with'), ('tranga', 'strong'), ('ai.', 'eye')]
[('A', 'DET'), ('mama', 'mother'), ('fon', 'beat'), ('a', 'DET'), ('pikin.', 'child')]
[('A', 'DET'), ('boi', 'boy'), ('lobi', 'love'), ('a', 'DET'), ('umapikin.', 'girl')]
[('A', '3SG'), ('téi', 'take'), ('dí', 'DEF.SG'), ('páu', 'stick'), ('páá.', 'quick')]
[('Wojo', 'eye'), ('u', 'for'), ('mi', '1SG'), ('á', 'NEG'), ('sa', 'M'), ('kai', 'fall'), ('ku', 'with'), ('di', 'DEF.SG'), ('faja.', 'fire')]
[('Di', 'DEF.SG'), ('mujɛɛ', 'woman'), ('naki', 'hit'), ('di', 'DEF.SG'), ('womi.', 'man')]
[('Den', 'DET.PL'), ('pikinnenge', 'child'), ('e', 'IPFV'), ('lobi', 'love/like'), ('switi', 'sweet'), ('sii.', 'seeds')]
[('kooknot', 'coconut'), ('bring', 'bring.forth'), ('ail', 'oil')]
[('Shi', '3SG'), ('buy', 'buy'), ('a', 'DET'), ('nju', 'new'), ('cyar.', 'car')]
[('Sita', 'Sita'), ('eat', 'eat'), ('di', 'DET'), ('mango.', 'mango')]
[('Di', 'DET'), ('child', 'child'), ('want', 'want'), ('food.', 'food')]
[('Mi', '1SG'), ('si', 'see'), ('di', 'ART'), ('man.', 'man')]
[('Meiri', 'Mary'), ('duhz', 'HAB'), ('aal-taim', 'all-time'), ('kis', 'kiss'), ('Jan.', 'John')]
[('Di', 'ART'), ('hed-tiicha', 'head-teacher'), ('duhz', 'HAB'), ('giv', 'give'), ('dem', '3PL'), ('lesn-z.', 'lesson-PL')]
[('Kien', 'Cain'), ('kil', 'kill'), ('Iebl.', 'Abel')]
[('Jimi', 'Jimmy'), ('fayn', 'find'), ('di', 'ART'), ('kru.', 'crew')]
[('Beda', 'Brother'), ('Ginihen', 'Guineahen'), ('tek', 'take'), ('wan', 'ART.INDF'), ('rod.', 'rod')]
[('An', 'and'), ('horikien', 'hurricane'), ('mash', 'mash'), ('dat', 'DEM'), ('dong.', 'down')]
[('Dis', 'DEM'), ('hat', 'hot'), ('man', 'man'), ('waahn', 'want'), ('aal', 'all'), ('di', 'ART.DEF'), ('gyal', 'girl'), ('dem.', 'PL')]

Creating RDF

In [14]:
import rdflib
from rdflib.namespace import RDF, RDFS, OWL, DC, DCTERMS
from rdflib.term import URIRef, Literal

Reading Glottolog

In [17]:
glottolog = rdflib.Graph()
glottolog.parse('glottolog_language.n3', format='n3')

lexvo = rdflib.Namespace('http://lexvo.org/ontology#')
In [18]:
def get_iso_code(glottocode):
    """Return the ISO 639-3 code that Glottolog records for a Glottocode, or None if there is none."""
    glottolog_template = 'http://glottolog.org/resource/languoid/id/{lang_id}'
    glottocode_uri = URIRef(glottolog_template.format(lang_id=glottocode))

    return glottolog.value(subject=glottocode_uri, predicate=lexvo.iso639P3PCode)

Creating APiCS graph

In [19]:
g = rdflib.Graph(identifier='http://purl.org/liodi/ligt/apics')
In [20]:
sentences = rdflib.Namespace('http://apics-online.info/sentences/')
apics = rdflib.Namespace('http://purl.org/liodi/ligt/apics/')
ligt = rdflib.Namespace('http://purl.org/ligt/ligt-0.2#')
nif = rdflib.Namespace('http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#')
In [21]:
g.bind('ligt', ligt)
g.bind('nif', nif)
g.bind('rdfs', RDFS)
g.bind('owl', OWL)
g.bind('dct', DCTERMS)

g.bind('apics', apics)
apics_doc = URIRef(apics)
In [24]:
g.set((apics_doc, RDF.type, ligt.Document))
g.set((apics_doc, ligt.hasUtterances, apics.examples))

g.set((apics.examples, RDF.type, ligt.InterlinearCollection))
# The citation should probably also be recorded as `dc:bibliographicCitation`
g.set((apics.examples, RDFS.comment, Literal(apics_dataset.properties['dc:bibliographicCitation'], lang="en")))
In [25]:
print(g.serialize(format='turtle').decode('utf-8'))
@prefix apics: <http://purl.org/liodi/ligt/apics/> .
@prefix ligt: <http://purl.org/ligt/ligt-0.2#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

apics: a ligt:Document ;
    ligt:hasUtterances apics:examples .

apics:examples a ligt:InterlinearCollection ;
    rdfs:comment "Michaelis, Susanne Maria & Maurer, Philippe & Haspelmath, Martin & Huber, Magnus (eds.) 2013. Atlas of Pidgin and Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology."@en .


In [26]:
examples[165]
Out[26]:
{'id': '72-1',
 'orig_id': 'http://apics-online.info/sentences/72-1',
 'baseline': 'Jintaku karungku i bin gedim kengkaru mirlarrangyawung.',
 'glosses': [('Jintaku', 'one'),
  ('karu-ngku', 'child-ERG'),
  ('i', '3SG.SBJ'),
  ('bin', 'PST'),
  ('ged-im', 'shoot-TR'),
  ('kengkaru', 'kangaroo'),
  ('mirlarrang-yawung.', 'spear-COM')],
 'translation': 'One kid got the kangaroo with a spear.',
 'language': ('Gurindji Kriol', 'guri1249', None),
 'meta_language': None,
 'comment': 'The pronoun-verb order is SVO, as is the nominal-verb order.'}
In [27]:
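def split_morphs(gloss):
    """Split a (word, gloss) pair into (morph, gloss) pairs at hyphens.

    The pair is returned unchanged, wrapped in a list, when there is nothing
    to split or when the word and its gloss differ in the number of parts.
    """
    morphs = gloss[0].split('-')
    glosses = gloss[1].split('-')

    if len(morphs) == len(glosses) and len(glosses) > 1:
        return list(zip(morphs, glosses))

    return [gloss]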
In [28]:
split_morphs(examples[48]['glosses'][4])
Out[28]:
[('kat', 'cut'), ('im', 'TR'), ('bat', 'PROG')]
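
To make the fallback behaviour of `split_morphs` explicit, the sketch below repeats the function so it runs on its own and calls it on three inputs: a pair that splits cleanly, a pair without hyphens, and a pair whose hyphen counts disagree. The third pair is a made-up input chosen only to trigger the mismatch branch.

```python
def split_morphs(gloss):
    # Same logic as the notebook function, repeated to keep the sketch self-contained.
    morphs = gloss[0].split('-')
    glosses = gloss[1].split('-')
    if len(morphs) == len(glosses) and len(glosses) > 1:
        return list(zip(morphs, glosses))
    return [gloss]


aligned = split_morphs(('karu-ngku', 'child-ERG'))       # two morphs, two glosses
unsplit = split_morphs(('kengkaru', 'kangaroo'))         # nothing to split
mismatch = split_morphs(('hed-tiicha', 'headteacher'))   # hypothetical: 2 morphs vs 1 gloss

print(aligned)
print(unsplit)
print(mismatch)
```

In the mismatch case the word is kept as a single morph with its full gloss, so no morph-gloss pairing is guessed.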
In [29]:
lang_template = '{lexvo}-x-{glottolog}'
glottolog_template = 'http://glottolog.org/resource/languoid/id/{lang_id}'

for example in examples:
    lang_lexvo = example['language'][2] if example['language'][2] else get_iso_code(example['language'][1])
    lang = lang_template.format(lexvo=lang_lexvo, glottolog=example['language'][1]) if lang_lexvo else ''
    
    # Utterance node
    ex = apics + URIRef('ex_{}'.format(example['id']))
    g.add((apics.examples, ligt.subSegment, ex))
    
    # Utterance properties
    g.add((ex, RDF.type, ligt.Utterance))
    g.add((ex, OWL.sameAs, URIRef(example['orig_id'])))
    g.add((ex, RDFS.label, Literal(example['baseline'], lang=lang)))
    if example['comment']:
        g.add((ex, RDFS.comment, Literal(example['comment'], lang="en")))
    g.add((ex, ligt.translation, Literal(example['translation'], lang="en")))
    
    # Utterance metadata
    g.add((ex, DCTERMS.language, URIRef(glottolog_template.format(lang_id=example['language'][1]))))
    
    # Tiers
    ex_tier_phrase = URIRef('{}_tier_phrase'.format(ex))
    ex_tier_morphs = URIRef('{}_tier_morphs'.format(ex))
    ex_tier_words = URIRef('{}_tier_words'.format(ex))
    
    g.add((ex, ligt.hasTier, ex_tier_phrase))
    g.add((ex, ligt.hasMorphs, ex_tier_morphs))
    g.add((ex, ligt.hasWords, ex_tier_words))
    
    # Phrase
    phrase = URIRef('{}_item_phrase_1'.format(ex))
    g.add((ex_tier_phrase, RDF.type, ligt.Tier))
    g.add((ex_tier_phrase, ligt.item, phrase))
    
    # Glosses
    
    if len(example['glosses']):
        next_word = URIRef('{}_item_word_{}'.format(ex, 1))
        
    for i, gloss in enumerate(example['glosses']):
        word = next_word
        next_word = URIRef('{}_item_word_{}'.format(ex, i+2)) if i < len(example['glosses']) - 1 else None
        
        g.add((ex_tier_words, ligt.item, word))
        g.add((word, RDF.type, ligt.Word))
        g.add((word, nif.subString, phrase))
        g.add((word, RDFS.label, Literal(gloss[0].strip('\\.,'), lang=lang)))
        
        if next_word:
            g.add((word, ligt.next, next_word))
        
        next_morph = URIRef('{}_item_morph_{}_{}'.format(ex, i+1, 1))
        subglosses = split_morphs(gloss)
        
        for j, subgloss in enumerate(subglosses):
            morph = next_morph
            next_morph = URIRef('{}_item_morph_{}_{}'.format(ex, i+1, j+2)) if j < len(subglosses) - 1 else None
            
            g.add((ex_tier_morphs, ligt.item, morph))
            g.add((morph, RDF.type, ligt.Morph))
            g.add((morph, nif.subString, word))
            g.add((morph, RDFS.label, Literal(subgloss[0].strip('\\.,'), lang=lang)))
            g.add((morph, ligt.gloss, Literal(subgloss[1].strip('\\.,'), lang="en")))
            
            if next_morph:
                g.add((morph, ligt.next, next_morph))
In [30]:
g.serialize(format='turtle', destination='../apics_ligt.ttl', encoding='utf-8')
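
The loop above derives its identifiers from string templates: a language tag of the form `{iso}-x-{glottocode}`, one node per word, and `ligt:next` links between neighbouring words. The sketch below reproduces only that naming scheme with plain strings and without rdflib. The helper names `language_tag` and `word_chain` are hypothetical, the ISO code is passed in directly rather than looked up in Glottolog, and the pairing of `srn` with `sran1240` is an illustrative input.

```python
BASE = 'http://purl.org/liodi/ligt/apics/'


def language_tag(iso, glottocode):
    """Build the '{iso}-x-{glottocode}' tag; empty string if no ISO code is known."""
    return '{}-x-{}'.format(iso, glottocode) if iso else ''


def word_chain(example_id, n_words):
    """Return (word_uri, next_word_uri_or_None) pairs for one utterance."""
    ex = '{}ex_{}'.format(BASE, example_id)
    uris = ['{}_item_word_{}'.format(ex, i + 1) for i in range(n_words)]
    # Pair each word with its successor; the last word has no successor.
    return list(zip(uris, uris[1:] + [None]))


print(language_tag('srn', 'sran1240'))
print(repr(language_tag(None, 'guri1249')))
for word, nxt in word_chain('72-1', 3):
    print(word, '->', nxt)
```

Morph nodes follow the same pattern with a second index (`_item_morph_{word}_{morph}`), and an utterance without glosses simply yields an empty chain.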

Mapping

We use the list of gloss abbreviations that comes with the dataset (`glossabbreviations.csv`):

In [34]:
gloss_abbr = {}

with open('glossabbreviations.csv') as inp_file:
    inp_file.readline()  # skip the first line (header)
    for line in inp_file:
        # Split on the first comma only; quoted values keep their quotes
        tag, val = line.strip('\n\r,').split(',', 1)
        gloss_abbr[tag] = val
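
Splitting each line on the first comma leaves the CSV quoting inside the values, as in `'"negation, negative"'`. If unquoted values are preferred, the `csv` module imported at the top of the notebook can do the parsing. The sketch below assumes a header row followed by an abbreviation column and a meaning column. The in-memory sample stands in for `glossabbreviations.csv`, and its header names and the variable `gloss_abbr_clean` are invented for the illustration.

```python
import csv
import io

# Stand-in for glossabbreviations.csv; the header names are an assumption.
sample = io.StringIO(
    'abbreviation,meaning\n'
    'ACC,accusative\n'
    'NEG,"negation, negative"\n'
)

gloss_abbr_clean = {}
reader = csv.reader(sample)
next(reader)  # skip the header row
for row in reader:
    if len(row) >= 2:  # ignore blank or malformed rows
        gloss_abbr_clean[row[0]] = row[1]

print(gloss_abbr_clean)
```

With a real file, `open('glossabbreviations.csv', newline='')` would replace the `StringIO` object; extra trailing columns are ignored because only the first two fields are read.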
In [35]:
gloss_abbr
Out[35]:
{'ACC': 'accusative',
 'ADV': 'adverb(ial)',
 'FUT': 'future',
 'BEN': 'benefactive',
 'REL': 'relative',
 'COP': 'copula',
 'NEG': '"negation, negative"',
 'TOP': 'topic',
 'FOC': 'focus',
 'DET': 'determiner',
 'QUOT': 'quotative',
 'VOC': 'vocative',
 'ERG': 'ergative',
 'COND': 'conditional',
 'DEM': 'demonstrative',
 'INTR': 'intransitive',
 'PTCP': 'participle',
 'DIST': 'distal',
 'PASS': 'passive',
 'DU': 'dual',
 'COM': 'comitative',
 'AGR': 'agreement',
 'DEF': 'definite',
 'EXCL': 'exclusive',
 'DAT': 'dative',
 'PRF': 'perfect',
 'IRR': 'irrealis',
 'ANTIP': 'antipassive',
 'IPFV': 'imperfective',
 'CVB': 'converb',
 'SBJV': 'subjunctive',
 'TR': 'transitive',
 'PROG': 'progressive',
 'CAUS': 'causative',
 'RECP': 'reciprocal',
 'APPL': 'applicative',
 'Q': 'question particle/marker',
 'PRS': 'present',
 'RES': 'resultative',
 'INCL': 'inclusive',
 'GEN': 'genitive',
 'PL': 'plural',
 'A': 'agent-like argument of canonical transitive verb',
 'OBL': 'oblique',
 'OBJ': 'object',
 'N-': '"non- (e.g. NSG nonsingular, NPST nonpast)"',
 'F': 'feminine',
 'COMP': 'complementizer',
 'CLF': 'classifier',
 'PROH': 'prohibitive',
 'ABL': 'ablative',
 'N': 'neuter',
 'LOC': 'locative',
 'IMP': 'imperative',
 'DECL': 'declarative',
 'ABS': 'absolutive',
 'REFL': 'reflexive',
 'DISTR': 'distributive',
 'SBJ': 'subject',
 'AUX': 'auxiliary',
 'DUR': 'durative',
 'ADJ': 'adjective',
 'POSS': 'possessive',
 'NOM': 'nominative',
 'ALL': 'allative',
 'PURP': 'purposive',
 'ART': 'article',
 'P': 'patient-like argument of canonical transitive verb',
 'PRED': 'predicative',
 'PROX': 'proximal/proximate',
 'S': 'single argument of canonical intransitive verb',
 'COMPL': 'completive',
 'INS': 'instrumental',
 'INDF': 'indefinite',
 'M': 'masculine',
 'NMLZ': 'nominalizer/nominalization',
 'PFV': 'perfective',
 'IND': 'indicative',
 'INF': 'infinitive',
 'SG': 'singular',
 'PST': 'past',
 'INTERM': 'intermediate',
 'CLIT': 'clitic',
 'NCOMPL': 'noncompletive',
 'IMPF': 'imperfect',
 'TMA': 'tense-mood-aspect',
 'NPROX': 'nonproximal',
 'PROP': 'proprietive',
 'NONFUT': 'nonfuture',
 'PP': 'past participle',
 'NSG': 'nonsingular',
 'CS': 'Bedeutung',
 'OR': 'reduplication',
 'RED': 'reduplication',
 'ABIL': '"ability (verb), abilitive mood"',
 'ACCID': 'accidental',
 'ACT': 'action marker',
 'ADD': 'additive',
 'ADJZ': 'adjectivalizer',
 'ADMON': 'admonitive',
 'ADNOM': 'adnominal',
 'ADR': 'addressive',
 'ADVERS': 'adversative',
 'ADVZ': 'adverbializer',
 'AFF': 'affirmative',
 'AG': '"agent, agentive"',
 'ANAPH': 'anaphoric',
 'ANIM': 'animate',
 'ANT': 'anterior',
 'ASP': 'aspect [marker|particle]',
 'ASS': 'associative (plural)',
 'ASSOBL': 'associative (obligation)',
 'ASSOC': 'associative [preposition]',
 'ATT': 'attenuative',
 'ATTR': 'attributive marker',
 'BE': '"identity-copula, locative-existential copula"',
 'BGND': 'background',
 'BIND': 'binding element',
 'CIS': '"cislocative, movement towards speaker"',
 'CL': 'class',
 'CNS': 'consuetudinal',
 'CNTRFAC': 'counterfactual',
 'COLL': 'associative plural',
 'COMPAR': 'comparative affix/marker',
 'CONC': 'concessive',
 'CONF': 'confirmation particle',
 'CONJ': 'conjunction',
 'CONN': 'connective',
 'CONSEC': 'consecutive',
 'CONT': '"continuative, continuous, ongoing action"',
 'CONTR': 'contrastive',
 'CPD': 'compound component derived by tone deletion',
 'CPLT': 'contemplated aspect',
 'CSEC': 'consecutive',
 'CTPL': 'contemplated aspect',
 'DEFRCLT': 'deferential clitic',
 'DEG': 'degree [particle|word]',
 'DEIC': 'deictic',
 'DELIM': 'delimitative',
 'DEP': 'dependent (pronoun)',
 'DEPV': 'dependent verb',
 'DESID': 'desiderative',
 'DETRANS': 'detransitivizing',
 'DIM': 'diminutive',
 'DIR': '"direction, directional"',
 'DISASS': 'disassociative',
 'DISC': 'discourse marker',
 'PCL': 'discourse particle',
 'DO': 'direct object',
 'DOUBT': 'doubt',
 'DS': 'different subject',
 'DUB': '"dubitative, uncertain knowledge"',
 'DUMMY': 'dummy pronoun',
 'EMPH': '"emphatic, emphasis"',
 'ENCL': 'enclitic',
 'EPIST': 'epistemic',
 'EQ': 'equational copula',
 'EVID': 'evidential',
 'EXCLAM': 'exclamation',
 'EXIST': 'existential',
 'EXPL': 'expletive',
 'FAM': 'familiar',
 'FILL': 'filler (item)',
 'FIN': 'finite',
 'FPST': 'far past',
 'FORMAL': 'formal',
 'FV': 'final vowel (default final vowel of verb form)',
 'GENER': 'generic',
 'GER': 'gerund',
 'H': 'high toneme',
 'HAB': 'habitual',
 'HABIL': 'habilitative',
 'HL': 'highlighter',
 'HON': 'honorific',
 'HORT': 'hortative',
 'HUM': 'human',
 'IDENTITY': 'identity copula',
 'IDEO': 'ideophone',
 'IGN': 'ignorative',
 'IMM': 'immediate past',
 'IMPRS': 'impersonal pronoun',
 'INAB': 'inability',
 'INACC': 'inaccompli',
 'INAN': 'inanimate',
 'INC': 'inchoative',
 'INCEP': 'inceptive [future]',
 'INCOMPL': 'incompletive',
 'INCORP': 'incorporated (noun)',
 'INDP': 'independent',
 'INFL': 'inflectional marker',
 'INGR': 'ingressive',
 'INSIST': 'insistence',
 'INT': '"intentional, intentionalis"',
 'INTENS': '"intensifier, intensitive, intensive, intensity"',
 'INTERJ': 'interjection',
 'INTFR': 'intensifier',
 'INTIM': 'intimate',
 'INV': 'inverse marker',
 'IO': 'indirect object',
 'ITER': 'iterative',
 'JUDG': 'judgment',
 'LINK': '"link vowel, link consonant"',
 'LK': 'linker',
 'LOCV': 'locative verb',
 'LOG': 'logophoric personal pronoun',
 'MIR': 'mirative',
 'MKD': 'marked',
 'MOD': '"modal auxiliary/verb/particle, modality"',
 'MODIF': 'modifier',
 'MOOD': 'mood particle',
 'NACCOMPL': 'non accomplished',
 'NARR': 'narrative',
 'NECESS': 'necessity',
 'NFIN': 'non-finite',
 'NFUT': 'non-future',
 'NHON': 'non-honorific',
 'NP': 'noun phrase',
 'NPST': 'nonpast',
 'NSBJ': 'non-subject',
 'NUM': '"number, numeral"',
 'OBLIG': '"obligative [mood marker], obligatory"',
 'OBV': 'obviative',
 'OPT': 'optative',
 'ORD': 'ordinal',
 'PAUC': 'paucal',
 'PERM': '"permission, permissive"',
 'PERMANENT': 'permanent state',
 'PM': 'predicate marker',
 'POL': 'polite',
 'POSTP': 'postposition',
 'POT': 'potential',
 'PREP': 'preposition',
 'PRESV': '"presentational, presentative"',
 'PRET': 'preterite',
 'PRO': '"pronoun, resumptive pronoun"',
 'PSREFL': 'pseudo-reflexive pronoun',
 'TAG': 'question tag',
 'QUANT': '"quantifier, quantitative"',
 'REM': 'remote',
 'REP': '"repetition, repetitive"',
 'REPORT': 'reportative',
 'REQ': 'requestative',
 'SENT': 'sentence particle',
 'SEQ': 'sequence marker',
 'SI': 'subject index',
 'SIML': 'similative',
 'SM': 'subject marker',
 'SP': 'post-verbal demonstrative-like marker',
 'SPECUL': 'speculative',
 'SS': 'same subject',
 'STANDARD': 'comparative standard',
 'MARKER': 'comparative standard',
 'STAT': 'stative',
 'SUBORD': 'subordinator',
 'SUPERL': 'superlative',
 'SUPPL': 'suppletive',
 'SVC': 'serial verb construction',
 'TA': 'transitive animate',
 'TAM': '"tense aspect mood, tense-mood-aspect"',
 'TEMP': 'temporal',
 'TNS': 'tense particle',
 'V': 'verb',
 'PREF': 'verbal prefix',
 'VAL': 'validator',
 'VBLZ': 'verbalizer',
 'VOL': 'volitive',
 'VPCL': 'verb particle'}

Matching with MMoOn

In [37]:
mmoon = rdflib.Graph()
mmoon.parse('mmoon-core.ttl', format='n3')
Out[37]:
<Graph identifier=Na97802aad3ae48a6a91760db37d46c7e (<class 'rdflib.graph.Graph'>)>
In [38]:
MMOON = rdflib.Namespace("http://mmoon.org/core/")
In [39]:
list(mmoon.subject_predicates(rdflib.term.Literal('N-', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))))
Out[39]:
[(rdflib.term.URIRef('http://mmoon.org/core/MorphemicGloss_N-'),
  rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'))]
In [41]:
matches_mmoon = {}
for gloss, val in gloss_abbr.items():
    items = list(mmoon.subject_predicates(rdflib.term.Literal(gloss, datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))))
    if len(items) > 0:
        matches_mmoon[gloss] = list(mmoon.subjects(URIRef('http://mmoon.org/core/hasAbstractIdentity'), items[0][0]))[0]
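
The matching above takes two hops: from the abbreviation literal to the gloss node that carries it as a label, and from that gloss node to the concept that points at it via `hasAbstractIdentity`. The sketch below shows those two hops on a toy list of triples, with no rdflib involved. The triples are illustrative stand-ins for `mmoon-core.ttl`, the node `MorphemicGloss_ACC` is assumed by analogy with `MorphemicGloss_N-`, and the helper names are hypothetical. Unlike the notebook cell, it only follows `rdfs:label` and returns `None` when the second hop finds nothing.

```python
RDFS_LABEL = 'rdfs:label'
HAS_ABSTRACT_IDENTITY = 'mmoon:hasAbstractIdentity'

# Toy triples standing in for the MMoOn core graph (contents are illustrative).
triples = [
    ('mmoon:MorphemicGloss_ACC', RDFS_LABEL, 'ACC'),
    ('mmoon:Accusative', HAS_ABSTRACT_IDENTITY, 'mmoon:MorphemicGloss_ACC'),
    ('mmoon:MorphemicGloss_XYZ', RDFS_LABEL, 'XYZ'),  # label with no linked concept
]


def subjects(predicate, obj):
    """All subjects of triples matching (?, predicate, obj)."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def match_gloss(abbr):
    """Hop 1: label -> gloss node. Hop 2: gloss node -> concept. None if either fails."""
    for gloss_node in subjects(RDFS_LABEL, abbr):
        concepts = subjects(HAS_ABSTRACT_IDENTITY, gloss_node)
        if concepts:
            return concepts[0]
    return None


print(match_gloss('ACC'))
print(match_gloss('XYZ'))
print(match_gloss('QQQ'))
```

Returning `None` instead of indexing `[0]` on a possibly empty list avoids an `IndexError` for labels that exist without a linked concept.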
In [42]:
matches_mmoon
Out[42]:
{'ACC': rdflib.term.URIRef('http://mmoon.org/core/Accusative'),
 'ADV': rdflib.term.URIRef('http://mmoon.org/core/Adverb'),
 'FUT': rdflib.term.URIRef('http://mmoon.org/core/Future'),
 'BEN': rdflib.term.URIRef('http://mmoon.org/core/Benefactive'),
 'COP': rdflib.term.URIRef('http://mmoon.org/core/Copula'),
 'NEG': rdflib.term.URIRef('http://mmoon.org/core/Negation'),
 'DET': rdflib.term.URIRef('http://mmoon.org/core/Determiner'),
 'QUOT': rdflib.term.URIRef('http://mmoon.org/core/Quotative'),
 'VOC': rdflib.term.URIRef('http://mmoon.org/core/Vocative'),
 'ERG': rdflib.term.URIRef('http://mmoon.org/core/Ergative'),
 'COND': rdflib.term.URIRef('http://mmoon.org/core/Conditional'),
 'PASS': rdflib.term.URIRef('http://mmoon.org/core/Passive'),
 'DU': rdflib.term.URIRef('http://mmoon.org/core/Dual'),
 'COM': rdflib.term.URIRef('http://mmoon.org/core/Comitative'),
 'DEF': rdflib.term.URIRef('http://mmoon.org/core/Definite'),
 'EXCL': rdflib.term.URIRef('http://mmoon.org/core/Exclusive'),
 'DAT': rdflib.term.URIRef('http://mmoon.org/core/Dative'),
 'PRF': rdflib.term.URIRef('http://mmoon.org/core/Perfect'),
 'IRR': rdflib.term.URIRef('http://mmoon.org/core/Irrealis'),
 'ANTIP': rdflib.term.URIRef('http://mmoon.org/core/Antipassive'),
 'IPFV': rdflib.term.URIRef('http://mmoon.org/core/ImperfectiveAspect'),
 'PROG': rdflib.term.URIRef('http://mmoon.org/core/ProgressiveAspect'),
 'CAUS': rdflib.term.URIRef('http://mmoon.org/core/CausativeVoice'),
 'APPL': rdflib.term.URIRef('http://mmoon.org/core/Applicative'),
 'PRS': rdflib.term.URIRef('http://mmoon.org/core/Present'),
 'RES': rdflib.term.URIRef('http://mmoon.org/core/ResultativeAktionsart'),
 'INCL': rdflib.term.URIRef('http://mmoon.org/core/Inclusive'),
 'GEN': rdflib.term.URIRef('http://mmoon.org/core/Genitive'),
 'PL': rdflib.term.URIRef('http://mmoon.org/core/Plural'),
 'OBL': rdflib.term.URIRef('http://mmoon.org/core/Oblique'),
 'N-': rdflib.term.URIRef('http://mmoon.org/core/DerivativeNegation'),
 'F': rdflib.term.URIRef('http://mmoon.org/core/Feminine'),
 'COMP': rdflib.term.URIRef('http://mmoon.org/core/Complementizer'),
 'PROH': rdflib.term.URIRef('http://mmoon.org/core/Prohibitive'),
 'ABL': rdflib.term.URIRef('http://mmoon.org/core/Ablative'),
 'N': rdflib.term.URIRef('http://mmoon.org/core/Neuter'),
 'LOC': rdflib.term.URIRef('http://mmoon.org/core/Locative'),
 'IMP': rdflib.term.URIRef('http://mmoon.org/core/Imperative'),
 'DECL': rdflib.term.URIRef('http://mmoon.org/core/DeclarativeMood'),
 'ABS': rdflib.term.URIRef('http://mmoon.org/core/Absolutive'),
 'REFL': rdflib.term.URIRef('http://mmoon.org/core/ReflexiveVoice'),
 'DISTR': rdflib.term.URIRef('http://mmoon.org/core/DistributiveAktionsart'),
 'AUX': rdflib.term.URIRef('http://mmoon.org/core/AuxiliaryVerb'),
 'DUR': rdflib.term.URIRef('http://mmoon.org/core/DurativeAktionsart'),
 'ADJ': rdflib.term.URIRef('http://mmoon.org/core/Adjective'),
 'NOM': rdflib.term.URIRef('http://mmoon.org/core/Nominative'),
 'ALL': rdflib.term.URIRef('http://mmoon.org/core/Allative'),
 'ART': rdflib.term.URIRef('http://mmoon.org/core/Article'),
 'COMPL': rdflib.term.URIRef('http://mmoon.org/core/CompletiveAspect'),
 'INS': rdflib.term.URIRef('http://mmoon.org/core/Instrumental'),
 'M': rdflib.term.URIRef('http://mmoon.org/core/Masculine'),
 'NMLZ': rdflib.term.URIRef('http://mmoon.org/core/Nominalization'),
 'PFV': rdflib.term.URIRef('http://mmoon.org/core/PerfectiveAspect'),
 'IND': rdflib.term.URIRef('http://mmoon.org/core/Indicative'),
 'INF': rdflib.term.URIRef('http://mmoon.org/core/Infinitve'),
 'SG': rdflib.term.URIRef('http://mmoon.org/core/Singular'),
 'PST': rdflib.term.URIRef('http://mmoon.org/core/Past'),
 'INTERM': rdflib.term.URIRef('http://mmoon.org/core/Interminative'),
 'IMPF': rdflib.term.URIRef('http://mmoon.org/core/Imperfect'),
 'ACT': rdflib.term.URIRef('http://mmoon.org/core/Active'),
 'ADJZ': rdflib.term.URIRef('http://mmoon.org/core/Adjectivization'),
 'ADVZ': rdflib.term.URIRef('http://mmoon.org/core/Adverbialization'),
 'ANT': rdflib.term.URIRef('http://mmoon.org/core/Anterior'),
 'CONJ': rdflib.term.URIRef('http://mmoon.org/core/Conjunction'),
 'CONT': rdflib.term.URIRef('http://mmoon.org/core/ContinuousAspect'),
 'DIM': rdflib.term.URIRef('http://mmoon.org/core/Diminution'),
 'DS': rdflib.term.URIRef('http://mmoon.org/core/DifferentSubject'),
 'DUB': rdflib.term.URIRef('http://mmoon.org/core/Dubitative'),
 'FPST': rdflib.term.URIRef('http://mmoon.org/core/FutureInPast'),
 'GER': rdflib.term.URIRef('http://mmoon.org/core/Gerund'),
 'HON': rdflib.term.URIRef('http://mmoon.org/core/Honorificity'),
 'HORT': rdflib.term.URIRef('http://mmoon.org/core/Hortative'),
 'HUM': rdflib.term.URIRef('http://mmoon.org/core/Human'),
 'INAN': rdflib.term.URIRef('http://mmoon.org/core/Inanimate'),
 'INGR': rdflib.term.URIRef('http://mmoon.org/core/Ingressive'),
 'INT': rdflib.term.URIRef('http://mmoon.org/core/Interrogative'),
 'ITER': rdflib.term.URIRef('http://mmoon.org/core/IterativeAktionsart'),
 'MIR': rdflib.term.URIRef('http://mmoon.org/core/Mirative'),
 'MOD': rdflib.term.URIRef('http://mmoon.org/core/MannerCase'),
 'NARR': rdflib.term.URIRef('http://mmoon.org/core/Narrative'),
 'NFUT': rdflib.term.URIRef('http://mmoon.org/core/NonFuture'),
 'NPST': rdflib.term.URIRef('http://mmoon.org/core/NonPast'),
 'NUM': rdflib.term.URIRef('http://mmoon.org/core/Numeral'),
 'OPT': rdflib.term.URIRef('http://mmoon.org/core/Optative'),
 'ORD': rdflib.term.URIRef('http://mmoon.org/core/OrdinalNumber'),
 'POT': rdflib.term.URIRef('http://mmoon.org/core/Potential'),
 'REP': rdflib.term.URIRef('http://mmoon.org/core/Repetitive'),
 'SS': rdflib.term.URIRef('http://mmoon.org/core/SameSubject'),
 'STAT': rdflib.term.URIRef('http://mmoon.org/core/Stative'),
 'TEMP': rdflib.term.URIRef('http://mmoon.org/core/TemporalisCase'),
 'V': rdflib.term.URIRef('http://mmoon.org/core/Verb')}

Matching with OLiA

In [43]:
unimorph = rdflib.Graph()
unimorph.parse('olia-unimorph.ttl', format='n3')
Out[43]:
<Graph identifier=N5da481b3f493495299f973e7c93086f1 (<class 'rdflib.graph.Graph'>)>
In [44]:
list(unimorph.subject_predicates(rdflib.term.Literal('NOM')))
Out[44]:
[(rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#NOM'),
  rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#hasLabel'))]
In [45]:
matches_olia = {}
matches_olia_add = {}
for gloss, val in gloss_abbr.items():
    items = list(unimorph.subject_predicates(rdflib.term.Literal(gloss)))
    if len(items) > 0:
        matches_olia[gloss] = items[0][0]
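    else:
        # No match on the abbreviation: retry with the spelled-out name
        items = list(unimorph.subject_predicates(rdflib.term.Literal(val)))
    # This check runs on both paths, so matches_olia_add collects
    # abbreviation matches as well as name matches
    if len(items) > 0: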
        matches_olia_add[gloss] = items[0][0]
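
If the two result sets are meant to be disjoint, with one holding matches on the abbreviation and the other holding matches on the spelled-out name only, the lookup can be written as an `if`/`elif` chain. The sketch below shows this variant on a plain dictionary that stands in for the label index of the OLiA UniMorph graph. The index contents, the three sample abbreviations and the names `matches_by_abbr` and `matches_by_name` are assumptions made for the illustration.

```python
UNIMORPH = 'http://purl.org/olia/unimorph.owl#'

# Toy label index standing in for the graph: label literal -> concept URI.
label_index = {
    'NOM': UNIMORPH + 'NOM',
    'participle': UNIMORPH + 'V_PTCP',
}

sample_abbr = {'NOM': 'nominative', 'PTCP': 'participle', 'CLIT': 'clitic'}

matches_by_abbr = {}
matches_by_name = {}
for gloss, val in sample_abbr.items():
    if gloss in label_index:
        # Stage 1: the abbreviation itself is a known label.
        matches_by_abbr[gloss] = label_index[gloss]
    elif val in label_index:
        # Stage 2: fall back to the spelled-out name.
        matches_by_name[gloss] = label_index[val]

print(matches_by_abbr)
print(matches_by_name)
```

Abbreviations that match on neither stage, such as `CLIT` in the toy data, end up in neither dictionary.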
In [46]:
matches_olia_add
Out[46]:
{'ACC': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ACC'),
 'ADV': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ADV'),
 'FUT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#FUT'),
 'BEN': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#BEN'),
 'REL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#REL'),
 'NEG': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#NEG'),
 'TOP': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#TOP'),
 'FOC': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#FOC'),
 'QUOT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#QUOT'),
 'VOC': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#VOC'),
 'ERG': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ERG'),
 'COND': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#COND'),
 'INTR': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#INTR'),
 'PTCP': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#V_PTCP'),
 'DIST': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#REM'),
 'PASS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PASS'),
 'COM': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#COM'),
 'EXCL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#EXCL'),
 'DAT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#DAT'),
 'PRF': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PRF'),
 'ANTIP': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ANTIP'),
 'IPFV': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#IPFV'),
 'CVB': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#V_CVB'),
 'TR': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#TR'),
 'PROG': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PROG'),
 'CAUS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#CAUS'),
 'RECP': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#RECP'),
 'APPL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#APPL'),
 'PRS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PRS'),
 'GEN': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#GEN'),
 'PL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ARGP'),
 'F': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#FEM'),
 'COMP': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#COMP'),
 'CLF': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#CLF'),
 'ABL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ABL'),
 'N': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#N'),
 'DECL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#DECL'),
 'ABS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ABS'),
 'REFL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#REFL'),
 'AUX': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#AUX'),
 'DUR': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#DUR'),
 'ADJ': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ADJ'),
 'NOM': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#NOM'),
 'ALL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ALL'),
 'PURP': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PURP'),
 'PROX': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PROX'),
 'INS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#INS'),
 'M': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#MASC'),
 'PFV': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PFV'),
 'SG': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#SG'),
 'PST': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PST'),
 'OR': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#OR'),
 'ACT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ACT'),
 'AFF': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#POS'),
 'ANIM': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ANIM'),
 'DESID': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#OPT'),
 'DIR': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#DIR'),
 'DS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#DS'),
 'FIN': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#FIN'),
 'FORMAL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#FORM'),
 'GER': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#V_CVB'),
 'IMPRS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#IMPRS'),
 'INAN': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#INAN'),
 'INT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#INT'),
 'INTERJ': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#INTJ'),
 'INV': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#INV'),
 'ITER': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#ITER'),
 'LOG': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#LOG'),
 'NFIN': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#NFIN'),
 'OBLIG': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#OBLIG'),
 'OBV': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#OBV'),
 'OPT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#OPT'),
 'PAUC': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PAUC'),
 'PERM': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PERM'),
 'POL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#POL'),
 'POT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#POS'),
 'PRO': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#PRO'),
 'REM': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#REM'),
 'SS': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#SS'),
 'STAT': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#STAT'),
 'SUPERL': rdflib.term.URIRef('http://purl.org/olia/unimorph.owl#SPRL')}

Number of unique matches in total:

In [49]:
len(set(matches_olia.keys()) | set(matches_mmoon.keys()) | set(matches_olia_add.keys()))
Out[49]:
123

Number of matches with OLiA-UniMorph:

In [173]:
len(set(matches_olia.keys()) | set(matches_olia_add.keys()))
Out[173]:
81

Number of matches with MMoOn:

In [50]:
len(matches_mmoon)
Out[50]:
91
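
The three counts above are related by inclusion-exclusion, which gives the number of abbreviations matched in both resources. The sketch below only reuses the figures reported in the output cells; the variable names are illustrative.

```python
# Counts reported in the output cells above.
total_matched = 123  # abbreviations matched in OLiA-UniMorph or MMoOn
olia_matched = 81    # abbreviations matched in OLiA-UniMorph
mmoon_matched = 91   # abbreviations matched in MMoOn

# |A ∩ B| = |A| + |B| - |A ∪ B|
overlap = olia_matched + mmoon_matched - total_matched
print(overlap)  # 49
```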

Adding mappings

In [51]:
hasValue = apics + URIRef('hasValue')
hasValue
Out[51]:
rdflib.term.URIRef('http://purl.org/liodi/ligt/apics/hasValue')
In [52]:
# Attach each matched abbreviation to its concept URI as an English-tagged literal.
for matches in (matches_mmoon, matches_olia, matches_olia_add):
    for label, uri in matches.items():
        g.add((uri, hasValue, Literal(label, lang="en")))
In [53]:
g.serialize(format='turtle', destination='../apics_ligt-mapped.ttl', encoding='utf-8')

Number of abbreviations that remain unmatched:

In [55]:
len(set(gloss_abbr.keys()) - set(matches_olia.keys()) - set(matches_olia_add.keys()) - set(matches_mmoon.keys()))
Out[55]:
144
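
To see which abbreviations remain unmatched rather than only how many, the same set difference can be wrapped in a small helper. This is a sketch with invented toy data; unmatched_keys is a hypothetical helper, not part of the notebook. The coverage figure at the end assumes that every matched key comes from gloss_abbr, so that matched and unmatched counts add up to the full inventory.

```python
def unmatched_keys(all_keys, *matched):
    """Return the sorted keys of all_keys that appear in none of the matched mappings."""
    remaining = set(all_keys)
    for mapping in matched:
        remaining -= set(mapping)
    return sorted(remaining)


# Invented toy data in the shape of gloss_abbr and the match dicts.
toy_abbr = {"NOM": "nominative", "PL": "plural", "XYZ": "invented category"}
toy_olia = {"NOM": "uri-1"}
toy_mmoon = {"PL": "uri-2"}
print(unmatched_keys(toy_abbr, toy_olia, toy_mmoon))  # ['XYZ']

# Coverage from the reported counts (assumes matched keys are a subset of gloss_abbr).
matched, unmatched = 123, 144
total = matched + unmatched
coverage = matched / total
print(total, round(coverage, 3))  # 267 0.461
```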