This page covers some of the upcoming improvements as well as how the site's backend logic operates. The code itself won't be posted, but here we'll walk through how the data is processed. We start by gathering news articles, which can be done in many different ways.
- We gather information only once per day. The load is scheduled for 6 am and articles are posted around 8 am. If there is an error with the load, articles go up around 10 am instead.
- Once the articles are gathered, we have to process them. To summarize an article, we measure the importance of each sentence by scoring it with the sum of the TF-IDF weights of its terms and keeping the highest-scoring sentences. We do additional processing on each article, such as extracting important keywords.
- With those keywords, we then gather related wiki data.
- From there, SQL is run to join the tables we built into a production table.
- Backups are made in case of errors.
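The actual pipeline code isn't posted, so here is a minimal sketch of the sentence-scoring idea from the summarization step: treat each sentence as a small document, weight each term by TF-IDF, and keep the sentences whose terms sum to the highest score. The function name and tokenization are illustrative, not the site's real implementation.

```python
import math
import re
from collections import Counter

def summarize(text, k=2):
    """Pick the top-k sentences, scored by the sum of TF-IDF weights
    of their terms (each sentence treated as one 'document')."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(sentences)
    # Document frequency: how many sentences contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # TF-IDF: term frequency within the sentence times log inverse
        # sentence frequency across the article.
        score = sum((c / len(toks)) * math.log(n / df[t]) for t, c in tf.items())
        scores.append(score)
    # Keep the k best sentences, but emit them in original order.
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return " ".join(sentences[i] for i in top)
```

Calling `summarize(article_text, k=3)` would return the three sentences whose terms are most distinctive within that article; sentences full of common words score low because their terms appear in many other sentences.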
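For the wiki-data step, the post doesn't say which wiki source or parameters are used, but a keyword lookup against the public MediaWiki search API would look roughly like this (the endpoint is Wikipedia's real API; the helper name and parameter choices are assumptions):

```python
from urllib.parse import urlencode

WIKI_API = "https://en.wikipedia.org/w/api.php"

def wiki_search_url(keyword, limit=3):
    """Build a MediaWiki search-API URL for one extracted keyword.
    Fetching this URL returns JSON with matching page titles/snippets."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,   # the keyword extracted from the article
        "srlimit": limit,      # cap the number of wiki hits per keyword
        "format": "json",
    }
    return WIKI_API + "?" + urlencode(params)
```

The pipeline would fetch this URL for each keyword and store the results in a wiki table for the later SQL join.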
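The join step can be sketched with SQLite: one table of processed articles, one of fetched wiki data, joined into the production table the site serves from. The table and column names here are hypothetical stand-ins for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT,
                       summary TEXT, keyword TEXT);
CREATE TABLE wiki (keyword TEXT, extract TEXT);

INSERT INTO articles VALUES (1, 'Rates rise', 'Banks react...', 'interest rates');
INSERT INTO wiki VALUES ('interest rates', 'An interest rate is ...');

-- Join the processed-article table with the wiki table into the
-- production table; LEFT JOIN keeps articles with no wiki match.
CREATE TABLE production AS
SELECT a.id, a.title, a.summary, w.extract AS background
FROM articles a
LEFT JOIN wiki w ON w.keyword = a.keyword;
""")
rows = cur.execute("SELECT title, background FROM production").fetchall()
```

Rebuilding `production` from the source tables each day also pairs naturally with the backup step: the source tables are the backup, and the join can simply be re-run after an error.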
Ideas for improvements / transparency items
- We are going to gather data on article clicks. This will tell us which articles and sources people like, which we can use to improve searches and, potentially later, to give custom news-article suggestions.
- Suggesting articles for sharing
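Since the click-tracking improvement isn't built yet, here is one possible shape for it, sketched with SQLite: log one row per click, then aggregate to see which sources and articles people prefer. All names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE clicks (article_id INTEGER, source TEXT, clicked_at TEXT)"
)

def record_click(article_id, source):
    # One row per click; aggregated later to rank popular sources/articles.
    cur.execute(
        "INSERT INTO clicks VALUES (?, ?, datetime('now'))",
        (article_id, source),
    )

record_click(1, "Reuters")
record_click(2, "Reuters")
record_click(3, "AP")

# Which sources do readers click most? This ranking could feed search
# weighting and, later, personalized suggestions.
top_sources = cur.execute(
    "SELECT source, COUNT(*) AS n FROM clicks GROUP BY source ORDER BY n DESC"
).fetchall()
```

The same table, grouped by `article_id` instead of `source`, would rank individual articles for the suggestion feature.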