AP Computer Science Principles
Unit 2.4 – Big Data and Privacy
1. What Is Big Data?
Big Data refers to data sets so large and complex that traditional data-processing tools cannot store or analyze them efficiently.
These sets are generated from many sources including:
- Websites and social media
- Mobile apps and smartphones
- Online shopping and transactions
- GPS and location services
- Sensors and Internet of Things (IoT) devices
- Security cameras and logs
- Health devices and patient records
The Three Vs of Big Data
- Volume – Huge amounts of data
- Velocity – Data created at high speed
- Variety – Many different formats (text, video, images, GPS signals)
2. How Big Data Is Used
Organizations use Big Data to identify patterns, make decisions, and improve services.
Examples
- Health care: disease tracking, faster diagnosis
- Retail: product recommendations
- Transportation: predicting traffic and routes
- Finance: detecting fraud
- Security: monitoring suspicious activity
- Science: climate models, DNA analysis, astronomy
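As one illustration of "identifying patterns," the fraud-detection example above can be sketched in a few lines. This is a minimal sketch, not how any real bank does it: the function name flag_unusual_purchases, the sample amounts, and the threshold of 2 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev

def flag_unusual_purchases(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations above the mean.

    Hypothetical helper for illustration; real fraud systems use many more signals.
    """
    average = mean(amounts)
    spread = stdev(amounts)  # sample standard deviation; needs at least 2 values
    if spread == 0:
        return []  # every purchase is identical, so nothing stands out
    return [a for a in amounts if (a - average) / spread > threshold]

# Made-up purchase history for one customer.
purchases = [12.50, 8.99, 15.00, 11.25, 9.75, 14.10, 10.40, 950.00]
flagged = flag_unusual_purchases(purchases)
print(flagged)
```

With this sample data, only the 950.00 purchase is flagged. The same idea (compare each new value against the usual pattern) scales up to the much larger data sets described in this unit.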
3. Impacts of Large-Scale Data Collection
Positive Impacts
- Better decision-making
- More accurate predictions and personalization
- Medical breakthroughs and faster research
- Crime/fraud detection
- Smarter city planning and transportation
Negative Impacts
- Loss of privacy and anonymity
- Data breaches and identity theft
- Unwanted tracking or data collection
- Government or corporate surveillance
- Algorithmic bias leading to unfair decisions
4. Privacy Considerations
Privacy is the ability to control how personal data is collected, stored, and used.
Types of Personal Data Collected
- Name, address, and personal identifiers
- Search history and online behavior
- GPS location data
- Purchases, subscriptions, and spending
- Photos, posts, and social media activity
- IP address, cookies, device information
- Medical and health tracking data
Ways Data Can Be Misused
- Selling data to advertisers without permission
- Tracking users across websites
- Insurance pricing based on personal data
- Hackers stealing private information
- Using data to manipulate political opinions
5. Ethical Use of Data
Ethical data use means collecting and using data responsibly: keeping it secure and respecting user consent.
Principles of Ethical Data Use
- Transparency: Users should know what data is collected
- Consent: Users must agree before their data is collected or used
- Purpose Limitation: Data is used only for the purpose stated when it was collected
- Security: Protecting stored data from breaches
- Minimization: Collect only what is necessary
- Right to Access & Delete: Users should control their data
Examples
Ethical:
- Using anonymized data for research
- Clear privacy policies
- Getting user permission before tracking
Unethical:
- Collecting unnecessary data "just in case"
- Selling user data without consent
- Using hidden tracking methods
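The "anonymized data for research" example can be sketched as removing identifying fields from a record before sharing it. This is a minimal sketch under stated assumptions: the field names in PII_FIELDS and the sample patient record are hypothetical, and removing direct identifiers alone does not guarantee anonymity, because the remaining fields can sometimes be combined to re-identify a person.

```python
# Hypothetical field names treated as personal identifiers in this sketch.
PII_FIELDS = {"name", "address", "email", "ip_address"}

def anonymize(record):
    """Return a copy of the record without the fields listed in PII_FIELDS."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}

# Made-up record for illustration only.
patient = {
    "name": "Jordan Lee",
    "address": "12 Oak Street",
    "ip_address": "203.0.113.7",
    "age": 34,
    "diagnosis": "asthma",
}

research_record = anonymize(patient)
print(research_record)
```

The function returns a new dictionary and leaves the original record unchanged, so the organization can still serve the patient while sharing only the reduced copy with researchers.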
6. Laws and Regulations
Several laws aim to protect user privacy and restrict misuse of personal data.
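- GDPR (European Union): Requires a lawful basis, such as consent, to process personal data, and gives users rights to access, correct, and delete their data
- CCPA (California): Gives California residents rights to know what data is collected, delete it, and opt out of its sale
- HIPAA (U.S.): Protects health and medical data held by health care providers, insurers, and their business partners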
7. Key Terms
- Big Data: Large, complex data sets
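- Correlation: A statistical relationship between two variables; correlation alone does not prove that one causes the other
- Anonymization: Removing personal identifiers so data cannot easily be linked back to an individual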
- Data Breach: Unauthorized access to private data
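- Encryption: Encoding data with an algorithm and a key so that only authorized parties can read it
- Algorithmic Bias: Systematically unfair outcomes from an algorithm, often caused by biased training data or design choices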
- PII (Personally Identifiable Information): Data that identifies a person
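To make the term Correlation concrete, the sketch below computes the Pearson correlation coefficient, one common way to measure how strongly two variables move together (values near 1 or -1 mean a strong relationship, values near 0 mean a weak one). The function name pearson_correlation and the sample data are assumptions for illustration.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables vary together.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariance / (spread_x * spread_y)

# Made-up data: hours spent online and number of ads clicked.
hours_online = [1, 2, 3, 4, 5]
ads_clicked = [2, 4, 6, 8, 10]
r = pearson_correlation(hours_online, ads_clicked)
print(round(r, 3))
```

In this made-up data the two lists rise together perfectly, so the result is very close to 1. Even so, a high correlation only shows that the variables move together; it does not show that time online causes ad clicks.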