Crosslingual information retrieval ¶

Cross-lingual information retrieval (CLIR) is the task of retrieving information written in a language different from that of the query. Our goal is to implement a lightweight system, in both unsupervised and supervised variants, that recognizes the translation of a sentence in a large collection of documents in a different language. After testing different features based on cross-lingual word embeddings and on text, with wide-ranging parameter combinations, our best model, an MLPClassifier, achieved a Mean Average Precision of 0.8459 on our English-German test collection. Our lightweight system also demonstrates zero-shot performance in other languages, such as Italian and Polish. We compare our results to XLM-R, a state-of-the-art but resource-hungry transformer model.
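To illustrate the unsupervised side of the task, here is a minimal sketch of translation retrieval in a shared cross-lingual embedding space: each sentence is represented by the average of its word vectors, and candidates are ranked by cosine similarity to the query. The tiny `EMBEDDINGS` table, the helper names, and the averaging strategy are assumptions made for this example; they are not the project's actual features or models.

```python
import math

# Hypothetical 3-dimensional vectors standing in for real cross-lingual
# word embeddings; translations are placed close to each other on purpose.
EMBEDDINGS = {
    "dog": [0.90, 0.10, 0.00], "hund": [0.88, 0.12, 0.02],
    "cat": [0.10, 0.90, 0.00], "katze": [0.12, 0.88, 0.02],
    "runs": [0.00, 0.10, 0.90], "rennt": [0.02, 0.10, 0.88],
    "eats": [0.40, 0.40, 0.20], "frisst": [0.40, 0.40, 0.20],
}


def sentence_vector(sentence):
    """Average the vectors of all known tokens; unknown tokens are skipped."""
    vectors = [EMBEDDINGS[t] for t in sentence.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def rank_candidates(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    query_vec = sentence_vector(query)
    scores = [cosine(query_vec, sentence_vector(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


query = "the dog runs"
candidates = ["die katze rennt", "der hund rennt", "die katze frisst"]
print(rank_candidates(query, candidates))  # [1, 0, 2]
```

In this toy setup the correct translation ("der hund rennt") is ranked first. A supervised variant would feed similarity scores like this one, together with text-based features, into a classifier.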

Description ¶

We make available all the code used for this project. It covers data preprocessing, inducing cross-lingual word embeddings, and training and evaluating all models. You can find the code for each part in the following table:

All experiments were written in Jupyter notebooks, which can be found in this folder.

Furthermore, we make all models available on Drive. All raw and preprocessed data can be downloaded from the following Drive.

Our results are summarized in the following table:

https://github.com/J4K08L4N63N84HN/crosslingual-information-retrieval/blob/main/reports/figures/final_results.png
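For readers unfamiliar with the metric reported above, the following is a minimal sketch of how Mean Average Precision can be computed from ranked retrieval results. The function names and the toy document IDs are hypothetical, and whether the project's evaluation code computes the metric exactly this way is an assumption. Note that when each query has exactly one relevant document (its translation), average precision reduces to the reciprocal of the rank at which that document appears.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the positions k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / k
    if not relevant_ids:
        return 0.0
    return precision_sum / len(relevant_ids)


def mean_average_precision(results):
    """Average the per-query average precision over all queries.

    `results` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    if not results:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


# Two toy queries, each with a single correct translation "d1".
results = [
    (["d1", "d2", "d3"], {"d1"}),  # correct translation at rank 1 -> AP = 1.0
    (["d2", "d1", "d3"], {"d1"}),  # correct translation at rank 2 -> AP = 0.5
]
print(mean_average_precision(results))  # 0.75
```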

How to Install ¶

To use this code, follow these steps:

Start by cloning this Git repository:

$  git clone https://github.com/J4K08L4N63N84HN/crosslingual-information-retrieval.git
$  cd crosslingual-information-retrieval

Continue by creating a new conda environment (Python 3.8):

$  conda create -n animate_logos python=3.8
$  conda activate animate_logos

Install the dependencies:

$ pip install -r requirements.txt

For detailed documentation, you can refer to here or create your own Sphinx documentation with
