Thursday, September 15, 2016

Paper Writing, and how to do it better next time (spoiler alert: no insights, just ideas)

Paper writing is the #1 inherent drag, for me, in the academic thing. There are other drags that are just accidental - like if the office printer stops working or whatever - but you can't really have academia without some kind of papers. (at least academia in its present form, hedge hedge disclaimer disclaimer) So here I'm wildly introspecting, coming up with some thoughts.

Why is paper writing hard? Because you have to keep a thing in your mind that is bigger than you can keep in your mind. A 10 page paper is bigger than you can keep in your mind. By the time you're ready to write a paper, you've done a few mostly-related studies, and you've come up with a kinda-coherent story, but it's not really coherent, so you've got to do all kinds of mental gymnastics to remember it all.

What am I thinking next time? Do the work in this order:
-0.25. Read a little bit
0. Studies
0.5. Read a lot
1. Story
2. Graphs
3. Everything else

-0.25. Read a little bit. Make sure you're not wasting your time with the studies you're going to do.

0. The studies. This is just most of your work. You usually have an idea why you're doing these, but you're never really sure until you're done. So, do the studies, gather the data, figure out your results, and just kind of putz around with them for a long time until they've seeped into your brain. Ask yourself lots of questions about your data and write Python scripts to answer them.

0.5 Read a lot. Test out possible stories you can tell based on your data and read papers until you can tell if that is a decent idea or not. Maybe you should do this before your studies? But that 

1. The story. This is #1 because this is the time (usually 2 weeks before a deadline in a mild panic) when you sit down and say "ok! gonna write a paper!" You should probably get all the coauthors in a room and lock the door and you can't come out until you have a story. This sounds painful, and it probably is. Maybe bring beer? Do as the ancient Germans did and make decisions mildly drunk and then confirm them the next day sober? I am quite serious about this.

2. The graphs. These are your evidence. After you've figured out what you want to prove, figure out the graphs that will make your point. (Sub out "graphs" for "math" or "photos" or "collections of anecdotes from interviewees" as appropriate.) Make those graphs. I guess they can be kind of rough, you can visually polish later, but they should be able to make your point. (not sure about this; maybe you should polish them right now.)

3. Then the rest of it should be easy. I mean, all the putting down words. Nobody reads them anyway. Just reference your graphs a lot. (I am being sarcastic, it is never easy.)

I have no idea if this is a good list. What I've done for the papers I'm writing now is basically studies first, then everything else in a big old blender of worries, and it's not pleasant.

Tuesday, May 24, 2016

Mailing Archived Emails as Postcards: Probing the Value of Virtual Possessions

our (w/ David Gerritsen, Jennifer Olsen, Tatiana Vlahovic, Rebecca Gulotta, Will Odom, Jason Wiese, and John Zimmerman) CHI 2016 paper, in hopefully plain English

Ok, Gmail archives all your emails forever, right? There's probably some good stuff in there! Emails from people you care about, memories of good times, photos, conversations. But people don't see it as meaningful at all. OTOH, they do store a bunch of old physical photos and postcards. Why are those things valuable while emails aren't? More generally, why are physical possessions considered so much more valuable than virtual possessions?

That's why we set out on the study. Nope, hold up, that's not true. We set out on it because we (Dave, Jenny, Tati, and I) were young grad students doing a class project and we were given visions of turning it into a CHI paper, which would be a nice gold star to have on our resumes. Personally, I dove in because I was the biggest coder in the group, and I thought it'd be a fun little engineering challenge. (and it was! but that was about 1% of the project.) And a way to impress Tati with my skills. Like Napoleon Dynamite.

So. Virtual possessions, physical possessions. How about if we take those virtual things, those old emails, and turn them into physical things, like postcards? We can automatically sift through all our participants' old emails, pick out particularly meaningful (we think) snippets, print them on postcards, and mail the postcards to the participants. Then they will probably think "oh yeah, that was a great old email" and rethink how valuable their old emails are. So we did that, over a 3 month period, and interviewed each participant 3 times, and our conclusion was:

Nope! They didn't really care at all. Most of the postcards, they just threw away. Oh well. But in talking to them, we realized a couple key things:

1. virtual possessions often lose value because they lack context. If you have an old photo, it's probably in your old photo album, next to other old photos. Or maybe a scrapbook, or a book of old good stuff you've saved. Your old emails? They're in a list with other (probably useless) emails. You can't really recall the whole memory just from a few words, and you've got nothing else around to help it.

2. virtual possessions are often useless because even the good ones are lost in a pile of trash! Your old emails are 1% wonderful conversations and 99% receipts from Amazon and ads from Bed Bath and Beyond. So you do this cognitive simplification by just considering it all junk. (It'd be a pain to try to remember or keep links to all the valuable stuff!) And even if you do find a couple good old emails, well... they're still there, if you need them, so what's the point of attaching any value to them?
Physical things don't have this problem: you throw out all your junk mail, so it no longer adds to clutter. But the blessing and the curse of email is that you can keep it all, forever.

So these insights might help you design more valuable virtual things, maybe!

Or maybe not! Maybe just say "eh we're saving your old emails for purely utilitarian reasons; we're not trying to replace your phone book too." Maybe virtual things will accrue value in completely different ways from physical things, and we just have to deal with it.

For more waffling and a fuller account of our adventures: our paper.

Getting Users' Attention in Web Apps in Likeable, Minimally Annoying Ways

our (w/ Josh Hailpern and Anupriya Ankolekar) CHI 2016 paper, in hopefully plain English

Why are there still so many pop-ups? Even if you're sure that Your Website Dot Com really desperately needs to show me an ad for lawnmower fuel or a notification that you've updated some incoherent legalese in your terms of service, do you have to blot out my whole screen? Couldn't you do something that people will hate a little less?

So we ran a simple study where we had Mechanical Turk workers play the game Set while we tried 15 different ways to get their attention, and then asked them how much they liked each one.

We found, based on survey questions after they finished the game, that they found some of the attention grabbers more annoying than others, and some of them more noticeable than others. These usually correlate (more noticeable = more annoying) but you can sometimes get a little more noticeable without getting more annoying, or vice versa.

We didn't find that certain attention grabbers make people better or worse at Set, or that they make them respond faster, or that they make them remember things better, or that they change the overall usability of the system, or the overall immersion in the game. Probably other things we didn't find too, see the paper.

Based on what we found, it looks like glowing shadows are a little better on average than popups (better = equally noticeable but less annoying), that your popup could be less annoying if it doesn't cover the screen behind it, and that the little message icon with a badge (like on Facebook or Twitter, showing you how many notifications you have) is good for low-interruption needs.

My confidence in the results: low! These are small effects. And you could poke a lot of holes in the study (why these 15 attention grabbers? how will this respond in a real-world situation? did people just kinda like our glowing shadows because they're prettier than some of the other options?) - we point out some of these in the paper. But it's something, and science hopefully progresses by a bunch of tiny steps.

Here's the paper!

Saturday, May 21, 2016

Story arcs

A lot of people have a "story arc" that they use for a lot of their papers/stories.

Some I can think of:
"What is it so what": "what is it", "is it so", "so what" - John Zimmerman
"ABT": And, but, therefore - Randy Olson c/o Better Posters blog
"What's the problem, and who cares?" (to start) - what I've gathered from talking with Jason Hong
Heilmeier questions - from George H. Heilmeier - this is more about questions you ask yourself before starting a project, not a talk you give about it when you're done, but it wouldn't be terrible if you gave a talk answering all of them.

I'd love to hear others, if you have them. Ideally I hope to collect a bunch and then learn what's in common between all of them.

ICWSM 2016 neat things

Back from ICWSM! It was the first time I'd been there. Felt more like Ubicomp than anything else I'd been to. Lots of people finding some correlation with p<.00001 and r^2=0.3, so what does it mean? I mean, it definitely means something, but I'm kind of frustrated by how difficult it would be to turn it into an application. I think the social scientists were frustrated too, by people's lack of social science training. I think the computer scientists were all keenly aware of the weaknesses of their data and methods... but they still found something. (and it still got published.) Lots of interesting data, a lot of scraping things, and a lotttttt of Twitter. Less polite than CHI or CSCW, which is a double-edged sword: on one hand, I was kind of taken aback by some blunt questions. On the other hand, if we're not disagreeing, how are we getting anywhere? I had two conversations end with "ok, let's agree to disagree," and they didn't feel great, but I'm open to the possibility that that's a sign of intellectual diversity or something else good.

Lots of shared data sets!

Neat things applicable to cities:
City Dashboard - kind of overwhelming, but a start!

LikeWays - recommend the most interesting path to a thing, not just the quickest. Someone with an iPhone, try this out and tell me what it's like.

"Will check-in for badges", Gang Wang - basically, Foursquare doesn't represent real mobility (of course); it's really only good for applications that don't really matter if you get them wrong (like recommending restaurants).

Emotions, demographics, and sociability in Twitter interactions, Kristina Lerman - I had wanted to do a study like this: correlate a ton of stuff in different geo areas, see what comes out. People in higher income places have more, weaker ties. (there's a lot going on there, though; it's kind of hard to interpret, or know why that would be.)

Other neat papers:

Identifying platform effects in social media data, Momin Malik - uses regression discontinuity to understand sudden things that happen on social media because of something the platform did, not because of real effects. For example, Netflix changed the labels on their reviews (something like "I somewhat like it" to "I sort of like it"), which caused review scores to jump suddenly.

When a movement becomes a party, Pablo Aragon - there was a bunch of grassroots talk around elections in Spain, so they followed one party, Barcelona en Comú, to see if they stayed all grassroots and decentralized, or if they evolved into a hierarchical organization. They found two groups: one for the movement (which stayed decentralized) and one for the party (which got hierarchical).

"Blissfully Happy" or "Ready to Fight", Hannah Miller - you've probably seen this on the news, it's super popular. Some emojis look different on different platforms. I use :D a lot but I guess on an iPhone it looks angry. Some emoji are hard to interpret even within a platform. (those raised hands! what does that mean!) This can be a problem.

Other useful tools:
Bot or Not - is this Twitter account a bot?
(another quick heuristic: if (# of followers) / (# you follow) < 0.1, the account is likely a bot)
Face++: face recognition tools
Gender detector - Is this name male or female? (Python) (a different one in Ruby)
IBM Watson Personality Insights Service - give it text, it gives you Big 5 personality scores
Complex Contagion models: models a thing where you have to be exposed to something N times before you get it too.
CommonCrawl - if you ever need a huge crawl of the web.
Want to find a set of ppl with known ages on Twitter? Just search for tweeters wishing each other "Happy (N)th Birthday!" Similarly, want to know what time ppl wake up (to track daylight savings or something), just search for people saying "Good morning!" Twitter is big, and there are at least a few people who say almost anything.
To find what people say more in one place than in others: take the probability that a term appears there minus the probability that it appears at all. From a paper about #foodporn.
I finally learned what a tensor is: an n-dimensional matrix. And there are tools like PARAFAC decomposition, which is similar to matrix factorization, which is useful in some cases.
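
Two of the quick heuristics in this list are easy to sketch in Python. This is just my rough reading of them, not code from any of the papers: the function names are made up, the "probability" is assumed to be a simple share of word tokens, and the choice not to flag accounts that follow nobody is my own guess.

```python
from collections import Counter


def follower_ratio_suggests_bot(followers, following, threshold=0.1):
    """Quick heuristic: followers / following below the threshold looks bot-like."""
    if following == 0:
        # Ratio is undefined; assumption: do not flag these accounts.
        return False
    return followers / following < threshold


def local_distinctiveness(tweets_by_place, place):
    """Score each word seen in `place` as P(word | place) - P(word overall).

    tweets_by_place maps a place name to a list of tweet texts (hypothetical
    input format). Probabilities are shares of whitespace-separated tokens.
    """
    local = Counter(
        word for text in tweets_by_place[place] for word in text.lower().split()
    )
    overall = Counter(
        word
        for texts in tweets_by_place.values()
        for text in texts
        for word in text.lower().split()
    )
    n_local = sum(local.values())
    n_all = sum(overall.values())
    return {word: local[word] / n_local - overall[word] / n_all for word in local}
```

Words with the highest scores are the ones a place says much more than everyone does on average; words that are common everywhere end up near zero or negative.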

Friday, May 20, 2016

How You Ought To Be Networking, for younger PhD Students

Here's one set of suggestions, from Jean Yang. Here's another list of suggestions, from Xiang Anthony Chen.

If I had to make a list, it would be one item:
1. Don't worry, it's okay.

Look, you're a young PhD student. Of course you're nervous and second guessing everything you do. In a conference environment, you'll hit all kinds of extra stressors: people asking about your work, people you're supposed to know, people you know even though you're not supposed to, weird group social dynamics, parties, talks you're supposed to be attending, talks you're supposed to be understanding, sleep deprivation, travel issues, big rooms of 500 people. I'm not going to tack on more anxiety by telling you more things you should and shouldn't do.

Because this is a blog and I can yammer on a bit, here are some more thoughts.

You can sit with the same people twice!
It's okay! My old advisor, Anind Dey, used to complain about CMU students all sticking together, so I got this anti-CMU itch, like I gotta go *network* more and avoid hanging out with my friends. This feels unnatural. Having a couple friends makes you way more confident. And what usually ends up happening is it's two people I know and three people I don't, and I make three strong connections instead of one weak one. And even if you don't meet any new people in a given half hour, you're probably strengthening older relationships instead. Every interaction with another person either builds your quantity or quality of relationships; both are good.

You don't need to seek people out and prepare talking points and questions.
This is always awkward, in my experience. Like, I'm a new kid, how am I going to have brilliant questions (or even valid questions) for your work? In the worst case, you run the risk of getting all fanboi. By all means, don't feel afraid to approach anyone, but don't feel like you have to go scavenger hunting and ticking off boxes.

It's useful to ambush randos at smaller conferences, not at CHI. Just find someone and start talking to them. I agree with it at a smaller conference (a few hundred people). I disagree at CHI. Most randos at CHI do something totally different from you and you'll never see them again. Anyway, if you're not great at ambushing randos, that is fine too; you'll meet people via friends of friends and other ways anyway.

Relatedly: Student Volunteer at smaller conferences, and not at CHI. At smaller conferences, it's great, gives you something to do when you don't have many friends yet, makes you some friends, and saves you a few hundred bucks. But at CHI, you meet a bunch of randos and it sucks up your entire week (including staying late or getting up early, which adds physical stress to an already-stressful week). If your advisor really wants you to SV, tell them you'd rather pay the conference fee yourself. (of course, this is only if you can afford to do so. btw, go to grad school in Pittsburgh so you can afford to do so :)

Jean and Anthony have a lot of good points! Including these: wear comfortable shoes, carry a notebook, always wear your nametag (shorten the cord a bit so it's easy for people to see your name while talking with you), carry a jacket (even if it's warm; conference rooms are often cold), skip sessions (especially the early morning ones! you can't do anything if you don't sleep!), eat healthy.

Make friends, talk to them a lot, they'll be your colleagues forever. (from Jeff Bigham on Twitter (1, 2, 3))
don't worry too much about talking to the famous ppl, your friends will be famous soon! so, be merry w/ them
…and, don't take it too badly if the famous person is off hanging w/ their friends, you'll know/be them soon enough

Our House, in the Middle of Our Tweets: A summary in plain English

... I hope! Tell me if this is not actually as plain English as I hope it is. For the tl;dr, just read the headings.

1. We did a pretty good job of finding where people live, if they've posted geotagged tweets.

By "geotagged tweets", we mean "tweets with a lat/lon point." This is rare: about 1% of tweets have this. When you use Twitter, your tweets are not geotagged by default; you have to go in and select "yeah, add my location." (now, as of a few months ago, you even have to click another button that says "share precise location", so not many people do it.) But some people like to do it, to show that they are somewhere, or to remember where they were, or who knows why.

We tried to tell where they live at the neighborhood level. We could find about 79% of users' homes within 1km. (56% within 100m, 88% within 5km).

How do we know we found their homes? We collected 195 people's addresses in Pittsburgh by asking them in an online survey. (We asked the 4119 most frequent geotagged tweeters in Pittsburgh; 195 responded, after filtering out spam etc. We paid them with a $5 Amazon gift card.)

2. It's not that hard: remove daytime tweets and social cross-posts, and use grid search.

If you're trying to find someone's home, first take out all the tweets during the day (6am-8pm). Then take out all the social cross-posts from Foursquare and Instagram and all other social apps. In both of these cases, you lose a little bit of signal and a lot of noise. Like, your daytime tweets are sometimes at home and sometimes away, but your nighttime tweets are way more often at home.
Then use grid search. Bin all tweets into 1-degree lat-lon squares, pick the square that has the most tweets, and throw out the rest. Then bin the remaining tweets into 0.1-degree squares, pick the square that has the most tweets, and throw out the rest. Do the same at 0.01 degrees and 0.001 degrees. The center of that final square is our guess at their home.

This might seem like a simple algorithm, and it is! We tried a bunch of more complicated things (see paper for details) and they didn't work as well.
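
For the curious, the steps above can be sketched in a few lines of Python. This is a rough illustration, not the code from the paper: the tweet format (a dict with a local "hour" and a "source" string), the function names, and the exact list of cross-posting apps are all assumptions, and ties between squares are broken by whichever square shows up first.

```python
import math
from collections import Counter

# Assumed set of cross-posting apps; the post names Foursquare and Instagram.
CROSS_POST_SOURCES = {"Foursquare", "Instagram"}


def keep_for_home(tweet):
    """Keep nighttime tweets (outside 6am-8pm local time) that are not cross-posts."""
    is_daytime = 6 <= tweet["hour"] < 20
    return not is_daytime and tweet["source"] not in CROSS_POST_SOURCES


def _cell(point, size):
    """Index of the lat-lon grid square of the given size containing the point."""
    lat, lon = point
    return (math.floor(lat / size), math.floor(lon / size))


def estimate_home(points, cell_sizes=(1.0, 0.1, 0.01, 0.001)):
    """Repeatedly keep only the busiest grid square, from coarse to fine.

    points is a list of (lat, lon) tuples that already passed the filter.
    Returns the center of the final square, or None if there are no points.
    """
    if not points:
        return None
    remaining = list(points)
    best, size = None, None
    for size in cell_sizes:
        counts = Counter(_cell(p, size) for p in remaining)
        best = counts.most_common(1)[0][0]
        remaining = [p for p in remaining if _cell(p, size) == best]
    return ((best[0] + 0.5) * size, (best[1] + 0.5) * size)
```

You would filter a user's tweets with keep_for_home first, then pass their (lat, lon) points to estimate_home. A 0.001-degree square is on the order of 100m across, so the answer is a neighborhood-level guess rather than a street address.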

3. However, this turns out to be more useful to learn things about places than about people.

Ok, pretty neat result, but sort of not awesome, for two reasons. First, 79% isn't that great - you can't really build that into a product if it fails 1 time in 5. And there are good reasons we can't get much better - maybe 85% but probably not higher (see the paper). Second, as I just explained, almost nobody geotags their tweets! What good is a "learning about people" algorithm if it can only learn about 0.01% of the population?

Here's what it might be good for: learning about neighborhoods. If we can figure out where a bunch of people live, then we can put together a set of people who live in your neighborhood, and figure out what they're saying. That's what we're currently thinking.

More: read the paper!