Version at: 21/09/2022 03:30
# Using the Tatoeba Corpus for Your Own Projects
## Terms of Use
Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use).
Note that the terms of use for the **audio files** are not the same as for the text of sentences.
Consult the [list of audio lists](https://tatoeba.org/eng/sentences_lists/of_user/CK/audio%20-/page:1/sort:modified/direction:desc) to find the license, if any, under which each contributor has offered their files for use outside of tatoeba.org. You should verify these licenses by clicking "audio files" on each member's profile.
## Processing the Tatoeba Corpus
You will probably want to filter out sentences that:
* require correction or improvement
* sound unnatural
* are poor or unnatural translations of other sentences
You may also want to filter out those that:
* contain vulgar language or sexual references
* contain archaic or old-fashioned content
* are untrue
* would be inappropriate for your audience
* are particularly long
You can use various forms of metadata to aid with this process:
* tags (for instance, "@change", "archaic", "vulgar"; see [Tags](http://tatoeba.org/eng/tags/view_all) for more)
* sentence ratings
* contributors' self-reported skill in the language (as indicated in their profiles). Note that several members rate themselves as native speakers of multiple languages and that self-reported levels may not be accurate.
If you are using the data to create language learning materials:
* You should probably use only sentences that you or someone else has personally proofread and not rejected, since you do not want to be teaching people errors.
Note that most sentences that do not have errors are not explicitly marked with an "OK" rating or tag, and some sentences that do have errors are not marked with a negative rating or tag. Taking all of this into account, you will probably need to perform both custom automated processing and manual review.
## Suggestions for Those Planning to Use the Corpus
* Tell your audience how you selected the sentences.
* See an example on this page: [www.manythings.org/bilingual](http://www.manythings.org/bilingual/)
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.
## Download the Tatoeba Corpus
[Downloads](http://tatoeba.org/eng/download_tatoeba_example_sentences) are updated every Saturday.
## FAQ
* [How do I give proper attribution?](https://en.wiki.tatoeba.org/articles/show/faq#i-would-like-to-use-tatoeba's-data-for-my-project.)
* [Where can I download Tatoeba's audio data?](https://en.wiki.tatoeba.org/articles/show/faq#where-can-i-download-tatoeba's-audio-data?)
* [How can I download all sentences and translations in specific languages?](https://en.wiki.tatoeba.org/articles/show/faq#how-can-i-download-all-sentences-and-translations-)
Version at: 21/09/2022 03:32
# Using the Tatoeba Corpus for Your Own Projects
## Terms of Use
Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use).
Note that the terms of use for the **audio files** are not the same as for the text of sentences.
Consult the [list of audio lists](https://tatoeba.org/eng/sentences_lists/of_user/CK/audio%20-/page:1/sort:modified/direction:desc) to find the license, if any, under which each contributor has offered their files for use outside of tatoeba.org. You should verify these licenses by clicking "audio files" on each member's profile.
## Processing the Tatoeba Corpus
You will probably want to filter out sentences that:
* require correction or improvement
* sound unnatural
* are poor or unnatural translations of other sentences
You may also want to filter out those that:
* contain vulgar language or sexual references
* contain archaic or old-fashioned content
* are untrue
* are sexist, racist, insulting to others, or otherwise inappropriate for your audience
* are particularly long
You can use various forms of metadata to aid with this process:
* tags (for instance, "@change", "archaic", "vulgar"; see [Tags](http://tatoeba.org/eng/tags/view_all) for more)
* sentence ratings
* contributors' self-reported skill in the language (as indicated in their profiles). Note that several members rate themselves as native speakers of multiple languages and that self-reported levels may not be accurate.
If you are using the data to create language learning materials:
* You should probably use only sentences that you or someone else has personally proofread and not rejected, since you do not want to be teaching people errors.
Note that most sentences that do not have errors are not explicitly marked with an "OK" rating or tag, and some sentences that do have errors are not marked with a negative rating or tag. Taking all of this into account, you will probably need to perform both custom automated processing and manual review.
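The automated part of that filtering can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an official Tatoeba tool: it assumes a tab-separated sentences export with the columns sentence id, language code, and text, and a tab-separated tags export with the columns sentence id and tag name. Check the Downloads page for the actual file layout before relying on it. The function names, the `EXCLUDED_TAGS` set, and the `MAX_CHARS` limit are hypothetical choices made for this example.

```python
# Hypothetical tag set and length limit; adjust them to your own audience.
EXCLUDED_TAGS = {"@change", "archaic", "vulgar"}
MAX_CHARS = 80


def load_flagged_ids(tag_lines, excluded_tags=EXCLUDED_TAGS):
    """Return the ids of sentences carrying at least one excluded tag.

    Assumes each line looks like: sentence_id <TAB> tag_name
    """
    flagged = set()
    for line in tag_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] in excluded_tags:
            flagged.add(fields[0])
    return flagged


def filter_sentences(sentence_lines, flagged_ids, lang, max_chars=MAX_CHARS):
    """Keep sentences in `lang` that are not flagged and not too long.

    Assumes each line looks like: sentence_id <TAB> lang <TAB> text
    """
    kept = []
    for line in sentence_lines:
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) < 3:
            continue  # skip malformed lines rather than guess at their content
        sentence_id, sentence_lang, text = fields
        if sentence_lang != lang:
            continue
        if sentence_id in flagged_ids:
            continue
        if len(text) > max_chars:
            continue
        kept.append((sentence_id, text))
    return kept


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the downloaded files.
    tags = ["2\tarchaic\n", "1\tgreeting\n"]
    sentences = [
        "1\teng\tHello.\n",
        "2\teng\tThou art late.\n",
        "3\tfra\tBonjour.\n",
    ]
    flagged = load_flagged_ids(tags)
    for sentence_id, text in filter_sentences(sentences, flagged, "eng"):
        print(sentence_id, text)
```

Both functions accept any iterable of lines, so an open file object can be passed in place of the sample lists. A filter like this only removes sentences that someone has already tagged, so the result still needs manual review.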
## Suggestions for Those Planning to Use the Corpus
* Tell your audience how you selected the sentences.
* See an example on this page: [www.manythings.org/bilingual](http://www.manythings.org/bilingual/)
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.
## Download the Tatoeba Corpus
[Downloads](http://tatoeba.org/eng/download_tatoeba_example_sentences) are updated every Saturday.
## FAQ
* [How do I give proper attribution?](https://en.wiki.tatoeba.org/articles/show/faq#i-would-like-to-use-tatoeba's-data-for-my-project.)
* [Where can I download Tatoeba's audio data?](https://en.wiki.tatoeba.org/articles/show/faq#where-can-i-download-tatoeba's-audio-data?)
* [How can I download all sentences and translations in specific languages?](https://en.wiki.tatoeba.org/articles/show/faq#how-can-i-download-all-sentences-and-translations-)
Note
Lines shown in green are lines that were added in the new version.
Lines shown in red are lines that were removed.