Conference paper

HyperText Corpus Initiative : how to help researchers sieving the web?

Abstract: Since its foundation in May 2009, the médialab Sciences Po has worked to foster the use of digital methods and tools in the social sciences. Building on existing tools and methods, we experimented with web mining techniques to extract data on collective phenomena. We also attended the symposiums organised by the two institutions responsible for web archiving in France, BnF and INA, where we learnt about the difficulties that web archives pose to social scientists. Our own experience of mining the live web was no easier. Such difficulties, we believe, can be explained by the lack of tools that allow scholars to build for themselves the highly specialized corpora they need from the wide heterogeneity of the web. The web is not a familiar document space for scholars or librarians. Its hyperlinked and heterogeneous nature calls for new ways of conceiving and building web corpora. This notion of a web corpus is necessary for both the live and the archived web: if methods are not adequate for analysing the live web, the problem will be no easier on an archive, where the time dimension adds complexity.
Contributor: Spire Sciences Po Institutional Repository
Submitted on: Monday, 15 September 2014 - 23:30:15
Last modified on: Tuesday, 28 January 2020 - 16:50:03
Long-term archiving on: Tuesday, 16 December 2014 - 11:50:39






Paul Girard. HyperText Corpus Initiative : how to help researchers sieving the web?. Out of the Box conference : Using Web Archives, May 2011, Velika dvorana, Slovenia. ⟨hal-01064259⟩


