
HyperText Corpus Initiative : how to help researchers sieving the web?

Abstract : Since its foundation in May 2009, the médialab at Sciences Po has worked to foster the use of digital methods and tools in the social sciences. Using existing tools and methods, we experimented with web mining techniques to extract data on collective phenomena. We also attended the symposiums organised by the two institutions responsible for web archiving in France, BnF and INA, where we learnt about the difficulties that using web archives poses for social scientists. Our own experience in mining the live web was no easier. Such difficulties, we believe, can be explained by the lack of tools allowing scholars to build for themselves the highly specialized corpora they need from the wide heterogeneity of the web. The web is not a familiar document space for scholars or librarians. Its hyperlinked and heterogeneous nature requires new ways of conceiving and building web corpora. This notion of a web corpus is necessary for both the live and the archived web: if methods are not adequate for analysing the live web, the problem will not be easier on an archive, where the time dimension adds complexity.
Document type :
Conference papers
Contributor : Spire Sciences Po Institutional Repository
Submitted on : Monday, September 15, 2014 - 11:30:15 PM
Last modification on : Friday, May 20, 2022 - 3:56:02 PM
Long-term archiving on : Tuesday, December 16, 2014 - 11:50:39 AM






Paul Girard. HyperText Corpus Initiative : how to help researchers sieving the web?. Out of the Box conference : Using Web Archives, May 2011, Velika dvorana, Slovenia. ⟨hal-01064259⟩


