HyperText Corpus Initiative : how to help researchers sieving the web?

Paul Girard

Conference paper, Year: 2011

Paul Girard

Role: Author
PersonId: 754677
IdHAL: paul-girard
ORCID: 0000-0001-9332-3308

médialab (Sciences Po)

Abstract

Since its foundation in May 2009, the Sciences Po médialab has worked to foster the use of digital methods and tools in the social sciences. Using existing tools and methods, we have experimented with web mining techniques to extract data on collective phenomena. We also attended the symposia organised by the two institutions responsible for web archiving in France, the BnF and INA, where we learnt about the difficulties that web archives pose to social scientists. Our own experience of mining the live web was no easier. We believe these difficulties stem from a lack of tools that would let scholars build for themselves the highly specialized corpora they need out of the wide heterogeneity of the web. The web is not a familiar document space for scholars or librarians. Its hyperlinked and heterogeneous nature calls for new ways of conceiving and building web corpora. This notion of a web corpus is needed for both the live and the archived web: if methods are not adequate for analysing the live web, the problem will be no easier on an archive, where the time dimension adds complexity.

Fields

Information and communication sciences

Main file

girard-hci.pdf (49.22 KB)

Origin: Files produced by the author(s)

https://sciencespo.hal.science/hal-01064259

Submitted on: Monday, 15 September 2014, 23:30:15

Last modified on: Thursday, 29 June 2023, 16:32:04

Long-term archiving on: Tuesday, 16 December 2014, 11:50:39

Dates and versions

hal-01064259, version 1 (15-09-2014)

Identifiers

HAL Id: hal-01064259, version 1
SCIENCESPO: 2441/5coittpe7h8g695h172cg34d3e

Cite

Paul Girard. HyperText Corpus Initiative : how to help researchers sieving the web?. Out of the Box conference : Using Web Archives, May 2011, Velika dvorana, Slovenia. ⟨hal-01064259⟩

Collections

SCIENCESPO MEDIALAB SCPO_OA
