In recent years, large language models (LLMs) such as ChatGPT and OpenAssistant have made waves by promising new possibilities for computational social science (CSS). Researchers are still evaluating how well these models perform on CSS tasks in zero-shot settings, i.e., without task-specific training [1].
Current evidence suggests that, in zero-shot scenarios, LLMs often fail to match smaller, fine-tuned models such as BERT-large [1]. The way a question is phrased also matters: different prompt formulations can shift classification accuracy by more than 10% [1].
Instruction-tuned LLMs are strong at understanding language and following specific instructions [1]. This makes them promising for CSS work such as automatic data annotation and social media analysis [1].
One study compared GPT-3.5-turbo and OpenAssistant-LLaMA on zero-shot classification of social media content [1]. Smaller, task-specific fine-tuned models usually outperformed the LLMs on these benchmarks [1].
Simple prompt variations, such as swapping in synonyms, can improve both performance and reliability [1]. This suggests that elaborate prompting strategies are not always needed to get better results from these models [1].
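The idea of varying prompt wording can be sketched as a small evaluation harness: score each paraphrase on a labeled sample, then aggregate predictions by majority vote. This is a minimal sketch, not the study's method; `toy_classify` is a made-up stand-in for a real LLM call, and the prompt wordings are illustrative.

```python
from collections import Counter

def majority_vote(labels):
    """Most common label across prompt paraphrases (ties broken by first seen)."""
    return Counter(labels).most_common(1)[0][0]

def evaluate(classify, prompts, examples):
    """Accuracy of each prompt wording, plus accuracy of a majority vote over all of them."""
    per_prompt = []
    for p in prompts:
        correct = sum(classify(p, x) == y for x, y in examples)
        per_prompt.append(correct / len(examples))
    vote_correct = sum(
        majority_vote([classify(p, x) for p in prompts]) == y
        for x, y in examples
    )
    return per_prompt, vote_correct / len(examples)

# Toy stand-in for an LLM: each prompt wording makes different mistakes.
def toy_classify(prompt, text):
    if "wording A" in prompt:
        return "pos" if "good" in text else "neg"
    return "pos" if "great" in text or "good" in text else "neg"

examples = [("good movie", "pos"), ("great film", "pos"), ("bad plot", "neg")]
prompts = ["wording A: {text}", "wording B: {text}", "wording C: {text}"]
per_prompt, vote_acc = evaluate(toy_classify, prompts, examples)
```

In this toy setup the weaker wording scores 2/3 while the vote over all three paraphrases recovers full accuracy, mirroring the reliability gain the study describes.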
Key Takeaways:
- Large Language Models (LLMs) have the potential to revolutionize Computational Social Science (CSS) by unlocking new possibilities and insights [1].
- Current LLMs may not match the performance of smaller, fine-tuned models on CSS classification tasks [1].
- Prompting strategy significantly affects classification accuracy, with variations exceeding 10% in accuracy and F1 scores [1].
- Instruction-tuned LLMs exhibit impressive language understanding and can generate responses following specific prompts [1].
- Task-specific fine-tuned models generally outperform LLMs in most zero-shot settings, even at smaller model sizes [1].
Advancements in Computational Social Science Research
The field of Computational Social Science (CSS) has evolved considerably with the adoption of Large Language Models (LLMs), changing how researchers study social phenomena [2]. Classical methods such as Ordinary Least Squares (OLS) regression struggle to capture complex social patterns because they cannot model nonlinear relationships unless those relationships are specified in advance [2].
Machine Learning (ML) gives CSS researchers more flexible methodological tools, helping to overcome the limitations of traditional models like OLS [2]. Supervised ML techniques, for instance, can capture nonlinear patterns and support causal analysis [2]. With ML, researchers can mine vast and varied data, including text and images, to study cultural change, inequality, and other societal questions [2].
Applying AI, ML, and Data Science tools in CSS can transform societies and improve social science practice [2]. These approaches serve academic interests and also let us examine impacts on society, from reducing risks to vulnerable groups to understanding the effects of social policy [2].
Enhancing Collaboration and Navigating Challenges
Using AI, ML, and Data Science in CSS calls for collaboration between social scientists and technical experts [2]. Such collaboration is key to addressing the field's emerging challenges and ethical questions: together, these communities can steward AI technology so that its benefits are fully realised while its risks are kept low [2].
Social scientists need to engage with AI methods to fully grasp their effects on society and policy [2]. Working with these technologies opens new research directions and delivers valuable insights, but it is equally important to participate in shaping the rules and ethics that govern them, so that their potential is used well and their dangers are controlled [2].
| Data Source | Statistical Data |
|---|---|
| Link 1 | LLMs achieved an F1 score of 76.0% for stance detection in CSS tasks, with substantial agreement (κ = 0.58) with human annotations [3]. |
| Link 1 | LLMs produced explanations in free-form coding tasks that exceeded the quality of crowdworkers' references, showing promise in creative generation tasks [3]. |
| Link 1 | LLMs achieved moderate to good agreement with human annotators (κ = 0.40 to 0.65) across various CSS classification tasks [3]. |
| Link 1 | LLMs showed mixed results in few-shot learning, suggesting varied performance improvements across different CSS tasks [3]. |
| Link 1 | LLMs could improve annotation quality by providing an additional layer of validation in the CSS annotation process [3]. |
| Link 1 | Future directions for LLMs in CSS include augmenting human annotation for efficiency, bootstrapping creative generation tasks, domain-specific adaptation, functionality enhancement in classification and generation tasks, and new evaluation methodologies [3]. |
| Link 2 | The study ran a large-scale multi-prompt experiment with 362,928 annotations across four CSS tasks: toxicity, sentiment, rumor stance, and news frames [4]. |
| Link 2 | 16 prompts were generated per task using a complete factorial design over four prompt features: definition inclusion, output type (label or numerical score), explanation inclusion, and prompt length (standard or concise) [4]. |
| Link 2 | LLM compliance and accuracy were highly prompt-dependent, especially for multi-class tasks, with compliance differing by up to 55% across prompts for the Falcon7b model [4]. |
| Link 2 | Prompting for numerical scores instead of labels reduced both compliance and accuracy for most LLMs and tasks [4]. |
| Link 2 | Prompting with definitions improved ChatGPT's accuracy without reducing compliance, but reduced PaLM2's and Falcon7b's compliance [4]. |
| Link 2 | The effect of concise prompts on accuracy and compliance varied by task and model, with instances where concise prompts hurt one or the other [4]. |
| Link 2 | Prompting LLMs to explain their output improved compliance with instructions but also shifted the label distribution, e.g. ChatGPT annotated 34% more content as neutral when asked to explain [4]. |
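The complete factorial prompt design described for Link 2 can be reproduced mechanically: crossing four binary design features yields 2⁴ = 16 prompt variants per task. A minimal sketch, with feature names paraphrased from the table above (the concrete prompt text is assumed, not taken from the paper):

```python
from itertools import product

# Four binary prompt-design features, as described in the multi-prompt study.
FEATURES = {
    "definition": [False, True],        # include a task definition?
    "output": ["label", "score"],       # categorical label vs numerical score
    "explanation": [False, True],       # ask the model to explain its answer?
    "length": ["standard", "concise"],  # prompt verbosity
}

def build_prompts(task):
    """Cartesian product of all feature values: one variant dict per combination."""
    variants = []
    for combo in product(*FEATURES.values()):
        variant = dict(zip(FEATURES.keys(), combo))
        variant["task"] = task
        variants.append(variant)
    return variants

prompts = build_prompts("toxicity")  # 16 distinct variants for one task
```

Each variant dict would then be rendered into an actual prompt string before being sent to a model; the factorial structure is what lets the study attribute accuracy and compliance differences to individual design features.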
Conclusion
Large language models (LLMs) have real potential to change computational social science (CSS), offering new insights and capabilities.
These models are not always the best performers on specific tasks, yet research shows they reach a good level of agreement with human annotators [5].
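The agreement level reported in this line of work is typically measured with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch of the statistic; the annotations below are made-up illustrations, not data from any study:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative stance labels from an LLM and a human over six items.
llm   = ["pro", "con", "pro", "neutral", "con", "pro"]
human = ["pro", "con", "con", "neutral", "con", "pro"]
kappa = cohens_kappa(llm, human)  # about 0.74 here
```

Values around 0.4 to 0.65 are conventionally read as moderate to good agreement, which is the range the cited work reports for LLM annotations.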
LLMs are also strong at explaining complex coding tasks, often producing explanations that exceed the quality of crowdworkers' references [6].
They can likewise tackle many CSS tasks zero-shot, without any task-specific training, which suggests they could substantially assist CSS work [3].
To make LLMs work well in CSS, social scientists and AI experts must collaborate and systematically test different prompting strategies. In this way, social scientists can push CSS forward and open new avenues for understanding and research [5][6].
FAQ
What are large language models?
Large language models (LLMs) are powerful tools that understand and generate language. They could change how we study the social sciences by giving us new insights.
How do large language models impact computational social science?
They introduce advanced ways to explore social questions, helping researchers examine complex patterns and make better predictions in social science studies.
Do large language models perform well in computational social science tasks?
Large models are not always the strongest at narrow, specific tasks, but they show great potential for studying broad social topics, such as cultural change.
What is the impact of prompting strategies on classification accuracy?
Prompting strategies can heavily influence model accuracy: different phrasings can produce large differences in results, so it is important to identify the most effective prompts for each task.
How can social scientists leverage large language models effectively?
Social scientists should work closely with AI and data science experts. Together, they can tackle the challenges and ethical issues and make the most of what large language models offer for social research.
Source Links
1. https://aclanthology.org/2024.lrec-main.1055.pdf – Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science
2. https://dlab.berkeley.edu/news/computational-social-science-social-world-challenges-and-opportunities – Computational Social Science in a Social World: Challenges and Opportunities
3. https://medium.com/@marketing_novita.ai/can-large-language-models-transform-computational-social-science-39109e52aa09 – Can Large Language Models Transform Computational Social Science?
4. https://arxiv.org/html/2406.11980v1 – Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways
5. https://aclanthology.org/2024.cl-1.8/ – Can Large Language Models Transform Computational Social Science?
6. https://ouci.dntb.gov.ua/en/works/7325Qz69/ – Can Large Language Models Transform Computational Social Science?