DeepSeek-affiliated Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd. today filed a patent for a new web data collection system designed to improve crawling efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses content that has already been downloaded to predict the quality of links not yet visited, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek's proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
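The patent text is not reproduced here, so the following is only an illustrative sketch of the general idea described above: score links by the quality of the page they were found on, fetch the best-scoring links first, and never queue the same URL twice. The names `Frontier`, `page_quality`, and `enqueue_links`, as well as the toy scoring heuristic, are assumptions made for this example and are not taken from DeepSeek's filing.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Frontier:
    """Hypothetical crawl frontier: pops the highest-scoring unseen link first."""
    _heap: list = field(default_factory=list)
    _seen: set = field(default_factory=set)
    _counter: int = 0  # tie-breaker so equal scores pop in insertion order

    def add(self, url: str, score: float) -> bool:
        """Queue a URL once; return False if it was already seen."""
        if url in self._seen:
            return False  # skipping known URLs avoids redundant downloads
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the best link first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1
        return True

    def pop(self) -> str:
        """Return the queued URL with the highest predicted quality."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def page_quality(text: str) -> float:
    """Toy stand-in for a quality model: share of distinct words, in [0, 1]."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def enqueue_links(frontier: Frontier, page_text: str, links: list) -> int:
    """Score a downloaded page, pass that score to its outgoing links,
    and return how many of those links were newly queued."""
    score = page_quality(page_text)
    return sum(frontier.add(url, score) for url in links)
```

In this sketch, links found on a repetitive page inherit a low score and wait behind links found on richer pages; a production crawler would replace `page_quality` with a trained model and add per-site rate limiting.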