Data Products

A corpus is a large collection of text. The text in a corpus is usually organized with a predetermined format and markup; the term here refers specifically to a digital corpus stored on a computer.

In the broader sense, a corpus is a collection of text, audio, images, and video stored on a computer with specific formats and tags.

A parallel corpus is a collection of texts in two languages whose meanings correspond, typically a source text paired with its translation.
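As a rough illustration of the idea, a parallel corpus can be pictured as a list of aligned sentence pairs. The sketch below is only an assumption about how such data might be held in memory; the names `SentencePair` and `build_parallel_corpus` and the sample sentences are hypothetical and do not describe eCorpus's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned unit of a parallel corpus (hypothetical structure)."""
    source: str  # sentence in the source language
    target: str  # corresponding sentence in the target language


def build_parallel_corpus(source_lines, target_lines):
    """Pair line i of the source text with line i of the target text."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    return [
        SentencePair(s.strip(), t.strip())
        for s, t in zip(source_lines, target_lines)
    ]


# Made-up sample data, for illustration only.
corpus = build_parallel_corpus(
    ["Hello, world.", "Thank you."],
    ["你好，世界。", "谢谢。"],
)
```

Keeping each pair together as one record, rather than in two separate lists, makes it harder for the two sides to drift out of alignment when entries are filtered or shuffled.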

eCorpus Inc (eCorpus), established in August 2020, is a research and development company specializing in natural language processing technology. It also holds the most comprehensive parallel corpus resources and is known as the first parallel corpus supplier in China.

With the rise of data technology, traditional linguistic methods alone can no longer support artificial intelligence research. Corpora have become the basic material for modern linguistics, machine learning, natural language processing, machine translation, and artificial intelligence research. Using the latest data tagging and collation methods, we have collected a large amount of bilingual text from the work of human translators, forming a variety of parallel corpora that can be used for many kinds of computational research.


Support for customized data collection and labeling requirements

The varied needs of millions of customers are met in a timely manner, with support for complex collection tasks and specialized data annotation


Copyright by ecorpus.cn eCorpus china