Stanford Robotics Seminar ENGR319 | Autumn 2025 | General Compliant Robot Interaction
By Stanford Online
Summary
## Key takeaways
- **Robots Struggle in Unstructured Contact**: Robots are good at collision avoidance and structured contact like warehouse picking, but struggle with contact in unstructured real-world environments because objects vary in shape, size, and stiffness, and are often fragile or cluttered. [00:33], [01:07]
- **CoinFT: Coin-Sized 6-Axis Sensor**: CoinFT is a coin-sized (US quarter), 2 g capacitive force/torque sensor built from two PCBs with cone-shaped electrodes and silicone rubber pillars. It costs under $10 in materials versus $10,000+ for ATI sensors. [04:53], [07:30]
- **CoinFT Matches ATI Accuracy, Survives Hammer**: Over the tested force range, CoinFT readings almost completely overlap with those of a $10,000+ ATI sensor. Its simple design absorbs shock, so it still gives reliable readings after hammer hits, which suits the incidental contacts of unstructured environments. [07:14], [07:50]
- **UMI-FT Enables Force-Aware Data Collection**: UMI-FT modifies UMI with an iPhone for vision, depth, and pose, plus CoinFT sensors on the fingers for 6-axis force/torque. Unlike the original UMI, it captures grasp and press forces during demos such as whiteboard wiping. [19:56], [20:46]
- **Force + Compliance Boosts Task Success**: An adaptive compliance policy trained on UMI-FT data achieves firm grasping and pressing for clean whiteboard wipes, skewering zucchini without slip, and light bulb insertion via touch-detected alignment; ablations show failures without force or compliance. Force-informed policies with compliance control also reached 80-90% success on fine-grained Allegro hand tasks. [21:17], [13:44]
- **Drones Gain Force Control with CoinFT**: CoinFT enables attitude-based force control on drones for robust attachment of a sensor package. After a gentle first attempt fails, the drone presses harder until it no longer feels the package's weight. CoinFT also survives the crashes that are common with drones, unlike expensive, fragile sensors. [10:14], [11:02]
Topics Covered
- Robots Fail in Unstructured Contact
- CoinFT Replaces Bulky Sensors
- Drones Need Crash-Proof Sensing
- Force Teaches Dexterous Tasks
- UMI-FT Learns Adaptive Compliance
Full Transcript
Hi everyone again. I'm Hojung. This is my eighth year at Stanford and my eighth year attending the Stanford Robotics Seminar, and it's a great pleasure to finally stand here as the speaker today. I'm very excited to talk about some of my work, which is on general compliant robot interaction through scalable force torque sensing. Let's dive right in.
Robots nowadays are really good at not making contact. They can navigate through complex environments where people are walking around, using collision avoidance. Robots are also starting to work pretty well on interactions involving contact in structured environments, for instance picking and placing objects in a warehouse or a factory.
However, robots still struggle quite a bit with interactions involving contact in unstructured real-world environments. So why is that? The real world is complicated and unpredictable. Objects vary in their shape, size, and stiffness. Sometimes they're even fragile. They're placed in a cluttered manner, making the scene a very contact-rich environment. So how might robots do well in these types of environments?
One of the ways robots can improve is by being able to make safe and robust contact, and for that I think two elements are very useful. One is compliance, to mitigate impact once contact is made, and the other is tactile sensing, so that once contact is made, you can modulate that contact. You could easily imagine trying to grasp strawberries at just the right force and gently move them across your fingers, feeling the change in contact location.
With that motivation, I've been working around this question: how can robots have a sense of touch and use it to compliantly interact with the world? Today, I'll be talking about mainly two things. One is CoinFT, which is a compact and scalable force torque sensor, and I'll also talk about UMI-FT, which is learning compliant manipulation at scale from humans using this force torque data. So with that, I would like to start by introducing CoinFT.
Precise force torque information allows robots to perform very precise tasks, and oftentimes these tasks require forceful interaction for success, for instance performing surface treatment on a car frame or assembly tasks in a factory. They all require very forceful interactions. Because of that, there has been a long history of development in designing new force torque sensors, including all sorts of different transduction methods such as optoelectronic, capacitive, piezoresistive, and vision-based, just to name a few. These sensors all work pretty well, but they're lab prototypes. You can't really use them unless you have a friend in that lab.
Some of the promising approaches made it outside of the lab and are commercially available. They're great sensors, but there are limitations in using them. Oftentimes they're pretty bulky and heavy. They're really expensive. Some of them are tens of thousands of dollars, and even the cheaper ones are usually over $1,000 per sensor. And most of all, they're pretty fragile. If you accidentally drop one, it'll probably go out of calibration, and repairing that can cost another couple thousand dollars.
These factors of existing commercial sensors make it very difficult to sensorize a lot of the different robot platforms out there, especially the ones operating in unstructured environments, because in unstructured environments there are plenty of incidental contacts with large impulses, and most likely the sensor will break faster than anything else. Also, the bulky size and weight make it difficult to sensorize small, lightweight robot platforms such as robot hands and drones without intensive capital investment.
So the driving question here is: how might we design a sensor that's compact, light, and slim, but is also affordable and robust, yet still accurate enough? Because, well, it's a sensor; it has to be accurate. And ideally it should also be tunable to fit the different needs of different robot applications.
We designed CoinFT. CoinFT is a coin-sized capacitive force torque sensor. It's just the size of a US quarter dollar. You could slide it into your wallet. It's also super light, only 2 g. And the design is really simple. It's two PCBs with specially designed cone-shaped electrodes, stacked together and connected by an array of silicone rubber pillars. Underneath the stack of PCBs, we have a shielding layer. But to sense six-axis force and torque, all you really need is the two rigid PCBs.
So how does a sandwiched pair of rigid PCBs become a six-axis force torque sensor? While there's only one physical pair of PCBs, we leverage the innate capabilities of the microcontroller we're using to actively reconfigure the electrodes internally and switch between different modes. For the sake of time, I won't go deep into the details here, but basically we reconfigure the electrodes so that the sensor switches between modes where it's sensitive to different inputs: shear forces, normal force, moments, and such.
What really causes the signal patterns in CoinFT is the change in relative pose between the two PCBs, and what controls that change in relative pose is the dielectric layer in the middle. By tuning the mechanical properties of this dielectric layer, you can fine-tune the performance of CoinFT as well. Again, for the sake of time, I won't go too deep into the details, but basically, if you have a compliant dielectric layer, CoinFT will be sensitive, but you won't be able to sense very large forces. And vice versa: if you have a stiff dielectric layer, CoinFT is not as sensitive, but you can sense large forces. There are a number of different parameters you can tune: not just the width of the silicone rubber pillars, but also the material properties, the pattern of the pillars, and the number of pillars. You can also selectively tune the stiffness across different axes of force and torque.
So how does CoinFT perform? What you're seeing here is a live plot of sensor readings from both CoinFT and an ATI sensor. It might not be very clear, but there are two lines juxtaposed together. There's a dotted line and a solid line, and they almost completely overlap, meaning that for this given force range CoinFT is almost as good as an ATI. The black sensor you see down there is the ATI, which is more than $10,000. CoinFT is less than $10, but that's of course assuming my labor is free. It's just the material cost.
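The sensitivity-versus-range trade-off of the dielectric layer described earlier can be illustrated with a toy model. This is a minimal sketch, assuming an ideal parallel-plate capacitor sitting on a linear spring; the plate area, gap, permittivity, stiffness values, and function names are all hypothetical and are not CoinFT's actual design parameters.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def capacitance(force_n, stiffness_n_per_m, area_m2=1e-4, gap_m=5e-4, eps_r=3.0):
    """Capacitance of a parallel-plate pair whose gap shrinks under a normal force."""
    compression = force_n / stiffness_n_per_m
    if compression >= gap_m:
        raise ValueError("force exceeds the range of this toy model")
    return EPS0 * eps_r * area_m2 / (gap_m - compression)


def sensitivity(stiffness_n_per_m, probe_force_n=0.1):
    """Change in capacitance per newton near zero load (finite difference)."""
    c0 = capacitance(0.0, stiffness_n_per_m)
    c1 = capacitance(probe_force_n, stiffness_n_per_m)
    return (c1 - c0) / probe_force_n


def max_force(stiffness_n_per_m, gap_m=5e-4, usable_fraction=0.5):
    """Largest force before the layer uses up the allowed share of the gap."""
    return stiffness_n_per_m * gap_m * usable_fraction


soft, stiff = 2e4, 2e5  # N/m, hypothetical dielectric-layer stiffnesses
for name, k in (("soft", soft), ("stiff", stiff)):
    print(f"{name}: sensitivity={sensitivity(k):.3e} F/N, range={max_force(k):.1f} N")
```

In this simplified picture the softer layer gives a larger capacitance change per newton but runs out of travel at a smaller force, which is the same qualitative trade-off as in the talk.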
CoinFT is also very robust. You could hit it with a hammer and it still gives you very reliable readings, thanks to the very simple design of two rigid PCBs with a compliant layer, which is really good at absorbing shock. This is super important for robots interacting in unstructured environments because, like I said, in unstructured environments there are incidental contacts, large impulses, and crashes. CoinFT can survive that.
But CoinFT is not perfect. While it's very robust in the compressive direction, it's not as robust in the tensile direction. You could imagine that CoinFT is like an Oreo cookie. If you try to pull the cookies apart, at some point they will peel off. CoinFT can also delaminate under very large tensile forces.
So again, six-axis force torque sensors exist out there, but they're expensive, fragile, and sometimes too bulky to use on different robot platforms. This combination of features of CoinFT, being compact, lightweight, robust, and affordable, really unlocks a lot of different robot interactions. One very good example, I think, is drones.
Drones can be very versatile. With some careful teleoperation, you could also do some contact-based tasks. For instance, you could fly one off a cliff in Hawaii to collect some really rare plant samples, or you could fly one up a tall skyscraper to clean the windows. That's something very dangerous for an actual human to do. Ideally drones could do this autonomously, but for that you'll need a contact sensor for these contact-based tasks. However, with drones, crashes are pretty common, and the existing sensors out there cannot survive these crashes, and they're also really expensive. But with CoinFT you could sensorize these drones, and if one breaks you could just get another one because it's very affordable.
So with my wonderful collaborator Jun En, we explored forceful interactions using drones. I'll save the details, but we have a custom drone setup with an end-effector mount equipped with CoinFT. You can swap in different end effectors, whether it's a contact tip or a package of sensors. And we designed an attitude-based force controller. If you're not too familiar with the term attitude control, it basically means you're not controlling the drone based on set positions. You're controlling the direct thrust on the motors and the orientation. It's basically like force control, almost. And of course the force P controller is based on force error, which is measured by CoinFT in this case.
So how does that perform? Here's a demo of a drone trying to attach a package of sensors to a horizontal surface. Let's use some imagination. We want to monitor forest fires, but we can't send people into the forest. So you would send in a drone and attach some kind of sensor that's going to monitor fire. Here the drone initially tried to gently attach the object, but it failed, and it still feels the weight of the object. So it decides to press harder. I wish I could fast forward here, but what you will see is that with a large enough force, using the force-based P control again, the drone is able to attach the sensors robustly, and it knows that the sensor package is attached because it no longer feels the weight. The sensor gets activated. You need to use some imagination here for real-world deployment.
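The press-harder behavior in the demo can be sketched as a proportional force loop wrapped in a retry. This is a toy illustration under stated assumptions: the gain, thresholds, package weight, and the one-line contact model are hypothetical, and the real controller acts on drone thrust and attitude rather than on a scalar push command.

```python
def p_force_step(target_n, measured_n, kp=0.5):
    """Proportional control: the command change is proportional to force error."""
    return kp * (target_n - measured_n)


def press_until_attached(attach_threshold_n, package_weight_n=0.5,
                         start_target_n=1.0, step_n=1.0,
                         max_target_n=10.0, iters=50):
    """Raise the target press force until the package no longer weighs on the sensor.

    Toy contact model (assumption): the measured force equals the commanded
    push, and the package sticks once that force reaches attach_threshold_n.
    Returns the target force that succeeded, or None if none did.
    """
    target = start_target_n
    while target <= max_target_n:
        push = 0.0
        for _ in range(iters):
            measured = push  # stand-in for a force sensor reading
            push += p_force_step(target, measured)
        attached = push >= attach_threshold_n
        # After backing off, an unattached package still hangs on the sensor.
        felt_weight = 0.0 if attached else package_weight_n
        if felt_weight < 0.1 * package_weight_n:
            return target
        target += step_n  # attachment failed, so press harder next time
    return None


print(press_until_attached(attach_threshold_n=2.5))
```

The outer loop mirrors the story in the demo: a gentle first attempt, a check on whether the weight is still felt, and a firmer retry.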
So that was a drone application, but CoinFT was also useful for other applications such as wearable robots.
Another set of applications we explored is sensorizing wearable haptic devices using CoinFT. With Professor Allison Okamura and her wonderful students and postdocs, we explored attaching CoinFT to haptic devices on the fingertip, the forearm, or the arm. It's really important to do force-informed control here because of ergonomic issues with wearable devices: everybody has a different body shape, a different finger shape, different size, and different stiffness across their arms. If you just do simple position-based control, it's very difficult to provide consistent haptic feedback. But with force-based interaction, you can control the interaction force, so the haptic feedback can become more consistent.
Beyond drones and wearable devices, we also tried using force information to teach robots how to perform dexterous tasks. With Professor Jeannette Bohg and my wonderful collaborator Claire, we sensorized multiple fingers of the Allegro hand to teach robots how to do fine-grained manipulation tasks such as plucking a battery out of a socket. Here you can see human fingers in the view. That's basically kinesthetic teaching: we are moving the robot finger directly, teaching what kind of motion it must do. We did ablation studies with force and without force. What we learned is that for six different contact-rich tasks where force-based interaction is really important, we see a huge jump in performance. Without force, you could easily imagine that the finger would just sort of skim the surface, not quite applying a set amount of force. But with force-informed policies and some compliance control, the hand was able to perform these fine-grained tasks rather reliably, reaching 80 to 90% success rates.
CoinFT is now being used outside of Stanford as well. At Berkeley, Jitendra and his student are sensorizing dexterous hands to do more fine-grained manipulation. In Switzerland, Professor Davide, of course, is sensorizing drones. And at UC Santa Cruz, Professor Tamanga is trying to manipulate crops like tomatoes and strawberries using force-informed, careful manipulation.
The vision here is that robots outside of the Stanford community or a couple of schools can all benefit from this technology. We are very soon open sourcing CoinFT so that researchers out there can freely recreate CoinFT and use it for any application they're interested in. But we also envision that future robot products out there, probably the ones interacting in our homes and doing tasks, should be able to have these affordable force torque sensors as well. So we are patenting this technology, and whoever wants to make a product out of it will have IP protection.
So we talked about CoinFT and how force information can benefit different robot interactions. Not surprisingly, having force torque information is better than not having it, but having that information doesn't necessarily guarantee that the robot will make the most of it. Tasks require different levels of compliant behavior and applied force across different phases of the task, and humans naturally learn throughout our lives how to adjust compliance and applied force in different phases of a task. So how might robots learn this adaptive compliance behavior from humans and effectively perform contact-based manipulation tasks?
So that brings us to our next topic, UMI-FT.
Lately there's a very active emergence of robotics companies promising that mundane household tasks will soon be automated. If you just look at the YouTube videos and the LinkedIn posts they share, it almost seems like maybe by next year we'll have these robots doing our dishes at home. But behind the scenes, it's really not uncommon to see these robots fail rather catastrophically. Again, these are mostly contact-rich tasks, and large robot models will benefit from compliance and tactile sensing as well, if this information is in the robot model.
In the context of learning from demonstration with compliance and tactile sensing, I think three things are very important. One is a large-scale data collection platform, for obvious reasons: robots need large-scale data to learn from. Another is a powerful algorithm for learning from demonstrations with adaptive compliance. It should be able to learn the adaptive compliant behavior from humans. And last but not least, there should be a compact, affordable force torque sensor that can scale up with the large-scale data collection platform.
With this motivation, there has been a lot of work recently combining tactile sensing and robot learning. These are all great works, and I learned so much from reading them, but they don't quite hit all three of the requirements I just mentioned. They're either not very scalable, they don't adapt the compliant behavior, or sometimes they're just too fragile.
At Stanford, we happen to have all three of these requirements. We have the Universal Manipulation Interface, or UMI, as the large-scale data collection platform. Really quickly on UMI, if you're not familiar with it: it's a handheld device equipped with a GoPro camera that collects vision and pose data from a human demonstration. Using this demonstration data, robots can learn dexterous tasks such as washing dishes.
The idea here is that because this is a portable, low-cost, scalable device, you could collect large-scale robot data easily, anywhere you want.
For the powerful algorithm, we have the adaptive compliance policy. I'm going to call this ACP from now on. For a little bit of context, compliance control allows robots to behave like a spring. Compliance control is really cool, but it has trade-offs. If the robot is compliant, it can make safe contact because it's behaving like a spring, but the downside is that you lose tracking accuracy. There has to be a balance between these two, and hence adaptive compliance is very important.
What ACP allows robots to do is learn from force data where to be compliant and where not to be compliant. So from kinesthetic teaching, if you use ACP, a robot can learn, when wiping a vase, to be compliant in the contact direction while still behaving pretty stiff in the lateral direction, so that it can track the contour of the vase accurately.
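The idea of being soft along the contact direction and stiff laterally can be written as a direction-dependent virtual spring. This is a minimal sketch, not the ACP implementation: ACP learns where to be compliant from demonstration data, whereas here the contact normal and both stiffness values are hard-coded, hypothetical numbers.

```python
def anisotropic_stiffness(normal, k_contact, k_lateral):
    """3x3 stiffness matrix: soft along the contact normal, stiff elsewhere.

    K = k_lateral * I + (k_contact - k_lateral) * n n^T, with n a unit vector.
    """
    norm = sum(c * c for c in normal) ** 0.5
    n = [c / norm for c in normal]
    return [[k_lateral * (1.0 if i == j else 0.0)
             + (k_contact - k_lateral) * n[i] * n[j]
             for j in range(3)] for i in range(3)]


def spring_force(stiffness, target_pos, current_pos):
    """Virtual spring: force pulling the end effector toward the target."""
    err = [t - c for t, c in zip(target_pos, current_pos)]
    return [sum(stiffness[i][j] * err[j] for j in range(3)) for i in range(3)]


# Contact normal along z; stiffness values in N/m are illustrative only.
K = anisotropic_stiffness(normal=[0.0, 0.0, 1.0], k_contact=50.0, k_lateral=2000.0)

# The same 1 cm position error, once along the contact normal and once laterally.
f_contact = spring_force(K, [0.0, 0.0, 0.01], [0.0, 0.0, 0.0])
f_lateral = spring_force(K, [0.01, 0.0, 0.0], [0.0, 0.0, 0.0])
print("force along contact normal:", f_contact)
print("force along lateral axis:  ", f_lateral)
```

For the same position error, the lateral axis pushes back much harder than the contact axis, which is why the robot can track a contour accurately while still touching the surface gently.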
And last but not least, for the scalable force torque sensor, we have CoinFT, which I just talked about. As a postdoc in Shuran's lab, I've been working on putting all three of these puzzle pieces together, and we call it UMI-FT: UMI with force torque sensing.
This is UMI-FT. It's a modified version of the original UMI. It now has an iPhone, which gives you vision information, depth, and pose for free. And each of the fingers is sensorized using the CoinFT sensors, so there is finger-level six-axis force torque sensing.
Here's what a demonstration looks like. As I try to pick up a whiteboard eraser and wipe the whiteboard, I get natural haptic feedback through the transmission of the device. I know exactly how hard I'm grasping. I know exactly how hard I'm pressing on the whiteboard. But in the original UMI device, none of this haptic information was collected, so the robot can't really learn the forceful interaction. With UMI-FT, that's now possible.
So here's the raw data from UMI-FT. You can see how the iPhone gives you vision, depth, and pose. On top of that, from the two CoinFTs, one on each finger, we know exactly how hard the object is being grasped. We know exactly what force is being applied in what direction.
And of course, UMI-FT is also a scalable device. It's portable, so you can collect data anywhere you want, anytime. It's a scalable way of collecting multimodal robot data.
Using the multimodal data we collected, we trained an enhanced version of the adaptive compliance policy, and what we see is that the robot was able to learn from my demonstrations not just the trajectory but also the forceful behavior. You can tell a little bit from the video, and mostly from the force data, that it is indeed grasping fairly hard and pressing really hard. For this task, whiteboard wiping, it is really important to apply a firm force to do a clean wipe, because otherwise there's always residue remaining on the whiteboard. Using the multimodal data, our policy was able to generalize to different environmental perturbations, such as a different table height, a different relative board height, different eraser shapes (we also had a narrower eraser), or different drawings. These scenarios are all out of distribution. None of them were included in the training data.
Now what's really interesting, I think, is what happens when we start removing some of the key components of our method. So we've done some baseline comparisons. On the top left you see a case where the force information is still used but there's no compliance control anymore. What happens is that the robot is no longer able to quickly adapt to the contact force, so sometimes it will just ram into the surface. It's no longer that reactive.
Another failure mode for the same policy is that sometimes the robot gets lucky with the contact force, so it can wipe well, but other times it'll still make contact but not press hard enough. So there's always residue remaining on the whiteboard, because you really do need to press hard for a clean wipe. That's what a low-level compliance controller enables the robot to do with force torque information.
On top of that, if we remove force information completely, the robot also starts failing to grasp the eraser, because the grasping motion is usually also controlled based on force. Here you're seeing that the gripper is overfitting to the gripper width from the training data. So controlling the gripper based on contact force allows some level of generalization in grasping different objects. I think one of the most interesting comparisons is where we use a contact mic for the tactile information, which is a method a lot of people use these days.
With a contact mic, you're basically hearing the sound of the contact. It gives you very high-quality dynamic information, but it doesn't give you static information. Once you make contact and remain in contact, there's no sound, so no information from the contact mics. Not surprisingly, with a contact mic the robot succeeds in making contact, but it usually just rams into the surface of the whiteboard and triggers a safety limit.
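The difference between dynamic and static contact information can be illustrated by passing a force trace through a high-pass filter. This is a rough sketch, assuming a contact mic behaves like a first-order high-pass filter on the contact force; that model, the filter coefficient, and the 5 N trace are assumptions for illustration, not measurements from the talk.

```python
def high_pass(samples, alpha=0.9):
    """First-order high-pass filter: passes changes, rejects constant levels."""
    if not samples:
        return []
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(alpha * (out[-1] + cur - prev))
    return out


# Force trace: free space, then contact at step 5 followed by a steady 5 N press.
force = [0.0] * 5 + [5.0] * 60
mic_like = high_pass(force)

print(f"at impact:   force={force[5]:.1f} N  mic-like signal={mic_like[5]:.2f}")
print(f"steady hold: force={force[-1]:.1f} N  mic-like signal={mic_like[-1]:.2f}")
```

The filtered signal spikes at the moment of impact and then decays toward zero even though the press force stays constant, so a policy that sees only this signal cannot tell how hard it is still pressing.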
Another interesting task for forceful interaction is skewering zucchini. Here force torque information is really important because you want to make sure you're holding on to the zucchini firmly while performing the insertion. Otherwise, the grasp can't overcome the contact force and the zucchini will slip off.
So here's exactly that case. Once we remove the force information, the zucchini slips out of the grasp while the robot is trying to puncture it with the skewer.
Another very interesting contact-rich, forceful task is inserting this light bulb into the socket here. You can see that there's a bayonet pin as a connector. If you imagine how humans do this task, we don't carefully look at where the slot is and we don't carefully align the pin. We just naively make contact and rotate the bulb until there's an alignment, and we know that through our sense of touch. That's exactly what the robot is trying to do here: make contact, rotate, insert.
Not surprisingly, as we start removing compliance, the robot is no longer regulating the contact, and sometimes the bulb just overshoots. It rotates beyond the slit. Other failure modes appear as we remove the contact force information from the policy: the bulb slips out of the grasp while the robot is trying to insert it.
So looking ahead, I think one interesting direction along the lines of robots having a sense of touch and using it well is general robots performing compliance control using affordable force torque sensors such as CoinFT. Right now it's really the expensive industrial arms with wrist force torque sensors or accurate joint sensors that can do this kind of compliance control. But with CoinFT, I think many different robot arms, even the cheaper ones, can perform compliance control.
Also, there could be a more scalable way of collecting multimodal data. UMI-FT was one of the attempts, but I imagine that future robots performing these dexterous tasks are probably going to have more than two fingers. In order to collect forceful data for these robots with multiple fingers, maybe a different form factor of data collection device with tactile or force sensors is needed.
And despite all this effort, I think there will still be a gap in availability between vision and pose data alone and multimodal force torque data. It's an interesting question to ask: how do we bridge this gap? How do we make the best of all of this existing data? Some of the interesting directions, I think, are fine-tuning existing large vision models with tactile information or training some kind of residual policy. So with that, thank you for listening. I think we're out of time, but I would like to take some questions. Thank you.
[applause] One question.
>> I have a question about how the system interacts with the contact information that CoinFT gives you. One of the major points of having that GoPro look at the compliant fingers is that you can get a sense of deformation and force, getting that width of the eraser just right. What is the magic step that CoinFT gives you for that?
>> That is a wonderful question. It's true that in the original UMI you can visually infer some level of force. It turns out it's very noisy data, and it's very hard to get an accurate sense of what the exact contact force is. That's one thing. And the benefit of having multi-axis force torque sensors such as CoinFT is that it really does allow you to perform compliance control. You can't do compliance control with the vision-based force sensing from the original UMI. With multi-axis force sensing, you can do compliance control, which really allows robots to make safe contact.
Yep.