Prepared by Vladimír Benko within the framework of a joint Project of
  
Main design decisions

- Slovak-centric (languages spoken and/or taught in Slovakia
  and its neighbouring countries)
- Latin names denoting language and size
- Crawled by SpiderLing at (approximately) the same time
- Language-independent filtering by the same tools
- Language-dependent filtering by the same methodology
- PoS-tagged by open-source or free tools, with native tagsets
  mapped to the Araneum Universal Tagset
- Deduplicated at document level: duplicate and near-duplicate
  documents deleted
- Deduplicated at paragraph and/or sentence level: duplicate and
  near-duplicate segments marked
- Word sketches with compatible sketch grammars
- Accessible online via a web interface (under NoSketch Engine)
  at unesco.uniba.sk or aranea.juls.savba.sk
  (no registration required in Guest mode)
- Also hosted (under KonText) at kontext.korpus.cz (free
  registration required) and (under Sketch Engine) at
  www.sketchengine.co.uk (paid access, 30-day free trial available)
Aranea Corpora available (March 2019)
  
Credits
If you use the Aranea corpora for research purposes, or need to mention them for any reason,
please cite the following papers:
  
- Benko, Vladimír: Aranea: Yet Another Family of (Comparable) Web Corpora.
  In Petr Sojka, Aleš Horák, Ivan Kopeček and Karel Pala (Eds.):
  Text, Speech and Dialogue. 17th International Conference,
  TSD 2014, Brno, Czech Republic, September 8–12, 2014. Proceedings.
  LNCS 8655. Springer International Publishing Switzerland, 2014. pp. 257–264.
  ISBN 978-3-319-10815-5 (Print), 978-3-319-10816-2 (Online).

- Benko, Vladimír: Compatible Sketch Grammars for Comparable Corpora.
  In Andrea Abel, Chiara Vettori and Natascia Ralli (Eds.):
  Proceedings of the XVI EURALEX International Congress: The User in Focus.
  15–19 July 2014. Bolzano/Bozen: Eurac Research, 2014. pp. 417–430.
  ISBN 978-88-88906-97-3.

Please also cite the paper on Manatee/Bonito, the corpus manager underlying the NoSketch Engine:

- Rychlý, Pavel: Manatee/Bonito – A Modular Corpus Manager.
  In 1st Workshop on Recent Advances in Slavonic Natural Language Processing.
  Brno: Masaryk University, 2007. pp. 65–70. ISBN 978-80-210-4471-5.

Contact
If you need the source corpus data, please send a message to
vladimir.benko at uniba.sk