# Political leaning prediction

## Docs

The complete results of all our measurements are stored in the [results](results) directory.

## Analysis

The Jupyter notebooks are stored in the [analysis](analysis) directory.

## Used datasets

### Politicalness

- Hou, Y., Li, J., He, Z., Yan, A., Chen, X., & McAuley, J. (2024). Bridging language and items for retrieval and
  recommendation. arXiv preprint arXiv:2403.03952. <https://arxiv.org/abs/2403.03952>
- Chen, Y., Liu, Y., Chen, L., & Zhang, Y. (2021). DialogSum: A real-life scenario dialogue summarization dataset.
  Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 5062–5074. Association for Computational
  Linguistics. <https://doi.org/10.18653/v1/2021.findings-acl.449>
- Webhose.io. (n.d.). Free News Datasets [Dataset]. Retrieved from <https://github.com/Webhose/free-news-datasets>
- Szemraj, P. (2024). Goodreads book genres dataset. Retrieved
  from <https://huggingface.co/datasets/pszemraj/goodreads-bookgenres>
- Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment
  analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language
  Technologies, 142–150. Association for Computational Linguistics. <http://www.aclweb.org/anthology/P11-1015>
- Heymans, A. (2022). IMDB movie genres. Retrieved
  from <https://huggingface.co/datasets/adrienheymans/imdb-movie-genres>
- nulldata. (2019). Medium post titles dataset. Kaggle. Retrieved
  from <https://www.kaggle.com/datasets/nulldata/medium-post-titles>. This dataset is used under the Creative Commons
  Attribution 3.0 Unported (CC BY 3.0) license.
- Misra, R. (2018). News Category Dataset. Kaggle. Retrieved
  from <https://www.kaggle.com/datasets/rmisra/news-category-dataset>. This dataset is used under the Creative Commons
  Attribution 4.0 International (CC BY 4.0) license.
- Patel, D. (2021). Microsoft PENS: Personalized News Headlines [Dataset].
  Kaggle. <https://www.kaggle.com/datasets/divyapatel4/microsoft-pens-personalized-news-headlines>. This dataset is used
  under the MIT license.
- Kawintiranon, K., & Singh, L. (2022). PoliBERTweet: A pre-trained language model for analyzing political content on
  Twitter. In Proceedings of the Language Resources and Evaluation Conference (LREC) (pp. 7360–7367). European Language
  Resources Association. <https://aclanthology.org/2022.lrec-1.801>
- Burnham, M. (2024). Political or not. Retrieved from <https://huggingface.co/datasets/mlburnham/political_or_not>
- Bian Shengtao. (2022). Recipe. Retrieved from <https://huggingface.co/datasets/Shengtao/recipe>
- HuggingFaceGECLM. (2023). Reddit submissions [Dataset]. Retrieved
  from <https://huggingface.co/datasets/HuggingFaceGECLM/REDDIT_submissions>
- HuggingFaceGECLM. (2023). Reddit comments [Dataset]. Retrieved
  from <https://huggingface.co/datasets/HuggingFaceGECLM/REDDIT_comments>
- Pang, B., & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to
  rating scales. Proceedings of the ACL.
- Open Phi. (2023). Textbooks. Retrieved from <https://huggingface.co/datasets/open-phi/textbooks>
- Antypas, D., Ushio, A., Camacho-Collados, J., Neves, L., Silva, V., & Barbieri, F. (2022). Twitter topic
  classification. Proceedings of the 29th International Conference on Computational Linguistics. International Committee
  on Computational Linguistics. Gyeongju, Republic of Korea.
- Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level Convolutional Networks for Text Classification. Advances in
  Neural Information Processing Systems 28 (NIPS 2015).

### Political leaning

- Baly, R., Da San Martino, G., Glass, J., & Nakov, P. (2020). We Can Detect Your Bias: Predicting the Political
  Ideology of News Articles. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
  (EMNLP), 4982–4991. <https://github.com/ramybaly/Article-Bias-Prediction>
- Spliethöver, M., Keiff, M., & Wachsmuth, H. (2022). CommonCrawl News Articles by Political Orientation [Data set].
  Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi.
  Zenodo. <https://doi.org/10.5281/zenodo.7476697>
- Burnham, M. (2024). Dem., rep. party platform topics. Retrieved
  from <https://huggingface.co/datasets/mlburnham/dem_rep_party_platform_topics>
  - Wolbrecht, Christina, Brooke Shannon, E.J. Fagan, Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A.
    Epp, Cheyenne Lee, Miranda E. Sullivan. 2023. Policy Agendas Project: Democratic Party Platform.
  - Wolbrecht, Christina, Brooke Shannon, E.J. Fagan, Jones, Bryan D., Frank R. Baumgartner, Sean M. Theriault, Derek A.
    Epp, Cheyenne Lee, Miranda E. Sullivan. 2023. Policy Agendas Project: Republican Party Platform.
- Jones, C. (2024). Political bias dataset: A synthetic dataset for bias detection and reduction. Retrieved
  from <https://huggingface.co/datasets/cajcodes/political-bias>
- Nayak, J. (2024). Political ideologies. Retrieved
  from <https://huggingface.co/datasets/JyotiNayak/political_ideologies>. Licensed under Apache 2.0.
- España-Bonet, C. (2023). Multilingual Coarse Political Stance Classification of Media. The Editorial Line of a ChatGPT
  and Bard Newspaper (v1.0) [Data set]. Empirical Methods in Natural Language Processing (EMNLP), Singapore.
  Zenodo. <https://doi.org/10.5281/zenodo.8417761>
- Çöltekin, Ç., Kopp, M., Morkevičius, V., Ljubešić, N., Meden, K., & Erjavec, T. (2024). Training data for the shared
  task Ideology and Power Identification in Parliamentary Debates [Data set].
  Zenodo. <https://doi.org/10.5281/zenodo.10450641>. Licensed under CC BY 4.0.
- nbandhi. (2024). Political podcasts listing with audio links. Retrieved
  from <https://www.kaggle.com/datasets/nbandhi/political-podcasts-listing-with-audio-links>
- Van Steyn, J. (2023). Political tweets. Retrieved from <https://huggingface.co/datasets/Jacobvs/PoliticalTweets>
- Haak, F., & Schaer, P. (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions
  (1.0) [Data set]. Zenodo. <https://doi.org/10.5281/zenodo.7682915>
- Chen, W.-F., Stein, B., & Saad, P. (2018). Webis-Bias-Flipper-18 [Data set]. 11th International Natural Language
  Generation Conference (INLG 2018). Zenodo. <https://doi.org/10.5281/zenodo.3250686>
- Chen, W.-F., Al-Khatib, K., Wachsmuth, H., & Stein, B. (2020). Webis-News-Bias-20 [Data set].
  Zenodo. <https://doi.org/10.5281/zenodo.8321586>
- Liu, Y., Zhang, X. F., Wegsman, D., Beauchamp, N., & Wang, L. (2022). POLITICS: Pretraining with same-story article
  comparison for ideology prediction and stance detection. In Findings of the Association for Computational Linguistics:
  NAACL 2022. <https://huggingface.co/launch/POLITICS>

## Referenced existing models

### Politicalness

- Silcock, E., Arora, A., D'Amico-Wong, L., & Dell, M. (2024). Newswire: A large-scale structured database of a century
  of historical news. arXiv. <https://arxiv.org/abs/2406.09490>
- Burnham, M. (2024). Political DEBATE large [Model]. Hugging
  Face. <https://huggingface.co/mlburnham/Political_DEBATE_large_v1.0>
- GPTMurdock. (2024). Classifier – main subjects politics [Model]. Hugging
  Face. <https://huggingface.co/gptmurdock/classifier-main_subjects_politics>

### Political leaning

- Liu, Y., Zhang, X. F., Wegsman, D., Beauchamp, N., & Wang, L. (2022). POLITICS: Pretraining with same-story article
  comparison for ideology prediction and stance detection. In Findings of the Association for Computational Linguistics:
  NAACL 2022. <https://huggingface.co/launch/POLITICS>
- Bucket Research. (2023). PoliticalBiasBERT (Revision f964ce8). Hugging
  Face. <https://huggingface.co/bucketresearch/politicalBiasBERT> <https://doi.org/10.57967/hf/0870>
- Sahitaj, P. (2024). Political bias prediction AllSides DeBERTa [Model]. Hugging
  Face. <https://huggingface.co/premsa/political-bias-prediction-allsides-DeBERTa>
- Jones, C. (2024). DistilBERT-PoliticalBias: A novel approach to detecting and reducing political bias in text.
  Hugging Face. <https://huggingface.co/cajcodes/DistilBERT-PoliticalBias>
- Shrimali, H. (2024). DistillBERT-Political-Finetune [Model]. Hugging
  Face. <https://huggingface.co/harshal-11/DistillBERT-Political-Finetune>
- Burnham, M. (2024). Political DEBATE large [Model]. Hugging
  Face. <https://huggingface.co/mlburnham/Political_DEBATE_large_v1.0>
- Vélez Castañeda, A. (2024). BERT-political_bias-finetune [Model]. Hugging
  Face. <https://huggingface.co/jhonalevc1995/BERT-political_bias-finetune>
- Nayak, J. (2024). Political ideologies detection RoBERTa finetuned [Model]. Hugging
  Face. <https://huggingface.co/JyotiNayak/political_ideologies_detection_roberta_finetuned>
- Palmqvist, O. (2024). DeBERTa political classification [Model]. Hugging
  Face. <https://huggingface.co/oscpalML/DeBERTa-political-classification>
- Newhauser, M. (2022). DistilBERT political tweets [Model]. Hugging
  Face. <https://huggingface.co/m-newhauser/distilbert-political-tweets>. Licensed under GNU Lesser General Public
  License v3.0. See <https://www.gnu.org/licenses/lgpl-3.0.html>.
