
The Corpus of Spoken Israeli Hebrew (CoSIH)

 

Text sigla

Sigla for recordings of the preparatory phase have two components: the initial letter of the volunteer's (pseudonymous) first name, followed by another identifying symbol. Sigla for recordings of the pilot study also have two components: the initial letter of the institute that recruited the volunteers (C, D, P),⁷ followed by the number of the tape and disc from which the recording was sampled. If more than one part was sampled from the same disc, the consecutive number of each part is shown after an underscore.
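The pilot-study scheme described above can be sketched as a small parser. This is only an illustration under stated assumptions: the function and type names are hypothetical, the siglum "C711_2" in the usage note is an invented example rather than a reference to an actual recording, and the pattern assumes that the disc number consists of digits only. Preparatory-phase sigla are not covered, since their second component is not specified further here.

```python
import re
from typing import NamedTuple, Optional

# Institute letters as listed in note 7.
INSTITUTES = {
    "C": "The B. I. and Lucille Cohen Institute for Public Opinion Research",
    "D": "Dahaf Institute",
    "P": "PORI Institute",
}

# Assumed shape: institute letter, digits for the tape/disc number,
# then an optional underscore and a consecutive part number.
_PILOT_SIGLUM = re.compile(r"([CDP])(\d+)(?:_(\d+))?")


class PilotSiglum(NamedTuple):
    institute: str        # full institute name
    disc: str             # tape/disc number, kept as text to preserve any leading zeros
    part: Optional[int]   # consecutive part number, or None if absent


def parse_pilot_siglum(siglum: str) -> PilotSiglum:
    """Split a pilot-study siglum into its components (hypothetical helper)."""
    match = _PILOT_SIGLUM.fullmatch(siglum.strip())
    if match is None:
        raise ValueError(f"not a pilot-study siglum: {siglum!r}")
    letter, disc, part = match.groups()
    return PilotSiglum(INSTITUTES[letter], disc, int(part) if part else None)
```

For instance, parse_pilot_siglum("C711_2") would yield the Cohen Institute, disc number "711" and part 2, while a siglum without an underscore yields no part number.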

Downloadable CoSIH Texts

The selection of CoSIH texts currently presented includes sample recordings of 37 volunteers, three of whom are from the preparatory phase and 34 from the pilot study. The total number of speakers, including interlocutors, is ca. 140. The total length of the recordings presented as ELAN transcriptions aligned with their audio is ca. five hours and 15 minutes, not including an additional five-and-a-half-minute text that is transcribed and presented only in PDF format. These are supplemented by CoSIH samples that formed part of a research corpus compiled by Nurit Dekel for her doctoral dissertation (Dekel 2010). The transcriptions of these texts are presented in PDF format. Their total length is just over five hours.⁸

We also offer the research community samples from recordings that have not yet been transcribed, and invite our colleagues to send us their transcriptions, either in standard orthography or phonetic, thus helping to enhance CoSIH.⁹ The total length of these texts is around two hours and 45 minutes.

Overall, the research community is now presented with ca. 13½ hours of recorded texts. We hope to be able to enlarge this selection in the future, both in recordings and in transcriptions, whether in standard orthography or phonetic.

Sociolinguistic data

The data regarding each volunteer, as they were given to CoSIH's representatives, are summarized in Table 2 (in Hebrew). Clicking a link in the "Questionnaire" column displays or downloads the sociolinguistic questionnaire, which was filled in according to the answers the volunteer gave to a CoSIH representative.

Downloads

Table 3 (also in Hebrew) gives some details regarding the recorders' interlocutors and the recordings, and provides links to downloadable audio files (WAV, MP3), ELAN files and PDF documents.

Use of CoSIH material

The recordings and transcriptions (in standard orthography or phonetic) may be used for non-commercial purposes only. Whenever CoSIH material is used, its source and copyright must be specified as follows:

References

Copyright


7 C = The B. I. and Lucille Cohen Institute for Public Opinion Research at Tel Aviv University, which recruited 16 volunteers; D = Dahaf Institute, which recruited 10 volunteers; P = PORI Institute, which recruited 16 volunteers.

8 We wish to thank Nurit Dekel for agreeing to present her transcripts as part of CoSIH's website for the benefit of the research community. The transcripts were prepared by Nurit in the years 2005-2007. For additional sigla, cf. יזרעאל תשס"ב(א): 290-291.

9 Texts from Nurit's corpus that overlap CoSIH transcripts in ELAN format were not uploaded. For partial overlaps, cf. their respective cells in Table 3.

10 The transcriber's name and copyright will of course be displayed for each contribution, as we have gladly done with the contributions by Nurit Dekel, Il-il Yatziv-Malibert, Elissa Guterman and Noam Faust.