Twitter Archive: Library of Congress to Prioritize Historic Tweets
Background
In 2010, the Library of Congress (LOC) entered into an agreement with Twitter to archive every public tweet ever sent. This ambitious project aimed to preserve a vast repository of social commentary and communication for future generations.
Challenges of Archiving Twitter Data
Over the years, several challenges have emerged in managing the Twitter archive.
- Volume and Size of Tweets: The volume of tweets has grown roughly tenfold, making the data increasingly difficult to store and process. In 2010, Twitter users sent approximately 50 million tweets per day. Today, that number exceeds 500 million tweets per day.
- Changing Nature of Tweets: Tweets have changed significantly since the platform launched. Initially, tweets were limited to 140 characters and consisted primarily of text. The character limit has since been expanded to 280, and tweets now often include images, videos, and animated GIFs. The LOC archives only the text of tweets, so a significant amount of context is lost.
- Limited Resources: The LOC did not have the resources or expertise to manage the Twitter archive effectively. The library lacked full-time engineers to process the massive influx of tweets, which caused significant delays in making the archive accessible to researchers.
Decision to Prioritize Historic Tweets
In light of these challenges, the LOC has decided to prioritize archiving tweets deemed to be of historic importance. The library announced the decision in a white paper that outlined the reasons for the change in policy.
The LOC acknowledges that Twitter is a constantly evolving platform, and it is impossible to predict how it will continue to change in the future. The library’s goal is to collect and preserve a representative sample of tweets that can provide insights into the social, political, and cultural landscape of our time.
Current Status of the Archive
The current 12-year archive of tweets is not publicly accessible, and the LOC has not announced a timeline for when it might become available. The library intends the archive to serve as a snapshot of the early years of social media communication, much as it preserves telegrams from the early days of the telegraph.
Potential Value of the Archive
Researchers in various fields, including sociology, psychology, political science, and communication, have expressed interest in accessing the Twitter archive. They believe that the data could provide valuable insights into human behavior, social trends, and political discourse.
Conclusion
The LOC’s decision to prioritize historic tweets recognizes the challenges of managing and preserving social media data. Although the full archive is not yet publicly accessible, the hope is that the LOC will eventually make it available to researchers and the public, allowing a deeper understanding of our digital past and present.