A6 – Computational lexicography



Teachers:

EMLex brings together a diverse group of teachers and lecturers from around the globe. This course will be taught by:

Prof. Dr. Ulrich Heid

University of Hildesheim

Dr. Besim Kabashi

Friedrich Alexander University Erlangen-Nuremberg

 

Contents:

Topics to be treated in this module include:

  1. Foundations of corpus linguistics
    • principles and methods of corpus analysis
    • applications of corpus data in lexicography
    • types of corpora, overview of existing corpora
    • corpus design, representativeness, data sources, metadata
  2. Corpus compilation
    • building corpora from online data: web scraping etc.
    • boilerplate removal, normalization, metadata extraction
    • representation and exchange formats
    • online and stand-alone tools for web corpus compilation
    • automatic linguistic annotation (POS, lemma, NER, parsing, …)
    • online and stand-alone tools for linguistic annotation
  3. Searching corpora
    • regular expressions
    • character encodings and the Unicode standard
    • CQP query language for lexico-grammatical patterns
    • practical exercises with Sketch Engine and CQPweb
  4. Quantitative analysis
    • frequency lists and metadata distribution
    • collocations and word sketches
    • keyword analysis
    • lexicographic interpretation of results
    • foundations of statistical inference
  5. Reproducibility
    • research methodology and documentation
    • data management, sustainability of corpus resources

Please see the module description for further information.

 

General information:

Time frame: 20.02.–24.02.
Room: R6 III/12
Evaluation method: participation in a team project with a written report (the teams will be determined at the beginning of the module)
Teaching language: German and English

 

Information on the EMLex 2023 Summer school:

Practical arrangements: Well in advance of the course, participants will receive a syllabus, relevant literature and suggestions on how to prepare via the Moodle platform. The sessions are moderated by the lecturer and the guest lecturer. The lessons center on practical computer exercises, carried out in small groups (instructions will be given beforehand).

 

Certificate: There are two ways to obtain an EMLex 2023 Summer school certificate: (a) without a grade, through active participation in practical exercises, class discussions and a team project; (b) with a grade, through participation in a team project with a written report. The teams will be determined at the beginning of the course.