dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE: driver for the ARM PL080/PL081 PrimeCells)

on 03.01.2011 17:36:00 by dan.j.williams

On Mon, Jan 3, 2011 at 3:14 AM, Russell King - ARM Linux wrote:
> On Sun, Jan 02, 2011 at 09:33:34PM +0100, Linus Walleij wrote:
>> As for the in-tree PL08x driver I'd say it's doing pretty well for
>> memcpy() so we could add platform data for that on supported
>> platforms, then for device transfers we need more elaborative
>> work.
>
> It has the issue that it's not unmapping the buffers after the memcpy()
> operation has completed, so on ARMv6+ we have the possibility for
> speculative prefetches to corrupt the destination buffer.
>
> Neither are a number of the other DMA engine drivers.  This is why I'd
> like to see some common infrastructure in the DMA engine core for saying
> "this tx descriptor is now complete" so that DMA engine driver authors
> don't have to even think about whether they should be unmapping buffers.

This requires that a copy of the mapped addresses be maintained
outside the driver's physical descriptor. This needs support from the
client to set up storage for this information (probably a
scatterlist). The dmaengine core could use this to implement a common
unmap routine. However, this still has the problem of how to prevent
unmapping too early in the multi-operation raid case and how to
communicate the full set of addresses to unmap to the final descriptor
in such a chain. I think the only way to fully solve this is to make
the client solely responsible for both mapping and unmapping.
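To make the client-owned approach concrete, here is a minimal sketch of a client that maps its buffers once, submits a chain of operations, and unmaps only when the last operation in the chain completes. Everything named toy_* is a hypothetical stand-in invented for illustration; none of it is dmaengine or async_tx API, and a real client would call the DMA mapping API where the comments say so.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; not kernel definitions. */
struct toy_buf {
    void *cpu_addr;
    size_t len;
    int mapped;              /* 1 while the device owns the buffer */
};

struct toy_chain {
    struct toy_buf *bufs;
    size_t nbufs;
    int pending_ops;         /* operations submitted but not yet complete */
};

/* The client maps every buffer once, before the first operation is prepared. */
void toy_chain_map(struct toy_chain *c)
{
    for (size_t i = 0; i < c->nbufs; i++)
        c->bufs[i].mapped = 1;   /* a real client would map the buffer here */
}

/* Account for one more operation queued on the chain. */
void toy_chain_submit(struct toy_chain *c)
{
    c->pending_ops++;
}

/*
 * Completion callback for one operation. Buffers are unmapped only when
 * the last outstanding operation finishes, so an intermediate completion
 * can never unmap too early. Returns 1 if the buffers were unmapped.
 */
int toy_chain_complete(struct toy_chain *c)
{
    if (--c->pending_ops > 0)
        return 0;
    for (size_t i = 0; i < c->nbufs; i++)
        c->bufs[i].mapped = 0;   /* a real client would unmap the buffer here */
    return 1;
}
```

The point of the sketch is that the driver's completion path never has to know which addresses belong to the chain; only the client does.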

For raid this will have implications for architectures that split
operation types on to different physical channels. Preparing the
entire operation chain ahead of time is not possible on such
configuration because we need to remap the buffers for each channel
transition. So, raid will have an optimized path for engines like
mv_xor, ioatdma, and iop-adma (iop13xx) where all buffers can be
mapped upfront (against a single physical channel) and then unmapped
when all stripe operations complete. For the others iop-adma (iop3xx)
and ppc44x we need to wait for each leg to finish before mapping and
issuing the next leg. There will most likely be negative performance
implications of waiting and reissuing, but as far as I can see this is
unavoidable.

> I'd also like to see DMA_COMPL_SKIP_*_UNMAP always set by prep_slave_sg()
> in tx->flags so we don't have to end up with "is this a slave operation"
> tests in the completion handler.

Longer term I do not see these flags surviving, but yes a 2.6.38
change along these lines makes sense.
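As a rough illustration of the interim change being discussed, the sketch below has the slave prep path set both skip flags so that the completion handler tests only the flags and never asks whether the descriptor is a slave operation. The TOY_* flags and toy_* functions are hypothetical stand-ins modelled on DMA_COMPL_SKIP_*_UNMAP; the bit values are arbitrary and this is not the kernel's definition.

```c
/* Toy flags modelled on DMA_COMPL_SKIP_*_UNMAP; bit values are arbitrary. */
enum toy_ctrl_flags {
    TOY_COMPL_SKIP_SRC_UNMAP  = 1 << 0,
    TOY_COMPL_SKIP_DEST_UNMAP = 1 << 1,
};

/* Toy descriptor: just the flags plus a record of what is still mapped. */
struct toy_tx {
    unsigned long flags;
    int src_mapped;
    int dest_mapped;
};

/* Slave prep always sets both skip flags: the client owns those mappings. */
void toy_prep_slave(struct toy_tx *tx)
{
    tx->flags |= TOY_COMPL_SKIP_SRC_UNMAP | TOY_COMPL_SKIP_DEST_UNMAP;
}

/* Completion handler: driven purely by flags, no "is this slave?" test. */
void toy_complete(struct toy_tx *tx)
{
    if (!(tx->flags & TOY_COMPL_SKIP_SRC_UNMAP))
        tx->src_mapped = 0;      /* a real driver would unmap the source */
    if (!(tx->flags & TOY_COMPL_SKIP_DEST_UNMAP))
        tx->dest_mapped = 0;     /* a real driver would unmap the destination */
}
```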

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE:

on 03.01.2011 17:52:27 by Russell King - ARM Linux

On Mon, Jan 03, 2011 at 08:36:00AM -0800, Dan Williams wrote:
> For raid this will have implications for architectures that split
> operation types on to different physical channels. Preparing the
> entire operation chain ahead of time is not possible on such
> configuration because we need to remap the buffers for each channel
> transition.

That's not entirely true. You will only need to remap buffers if
old_chan->device != new_chan->device, as the underlying struct device
will be different and could possibly have a different IOMMU or
DMA-able memory parameters.

So, when changing channels, the optimization is not engine specific,
but can be effected when the chan->device points to the same dma_device
structure. That means it should still be possible to chain several
operations together, even if it means that they occur on different
channels on the same device.

One passing idea is the async_* operations need to chain buffers in
terms of , or
maybe . If the dma_device pointer is
initialized, the scatterlist is already mapped. If this differs from
the dma_device for the next selected operation, the previous operations
need to be run, then unmap and remap for the new device.
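A minimal sketch of that idea, assuming each buffer record carries a pointer to the device it is currently mapped for (NULL meaning unmapped). The toy_* types and the function are hypothetical and only model the decision; they are not the real struct dma_device, struct dma_chan, or scatterlist.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; not the kernel structures. */
struct toy_dma_device { int id; };
struct toy_dma_chan { struct toy_dma_device *device; };

/* Buffer record that remembers which device it is mapped for. */
struct toy_mapped_buf {
    struct toy_dma_device *mapped_for;  /* NULL while unmapped */
    int map_count;                      /* number of (re)maps, for illustration */
};

/*
 * Prepare a buffer for the channel chosen for the next operation.
 * Returns 0 if the existing mapping can be reused (same device, possibly
 * a different channel), or 1 if the buffer had to be mapped or remapped.
 */
int toy_prepare_for_chan(struct toy_mapped_buf *buf,
                         const struct toy_dma_chan *next_chan)
{
    if (buf->mapped_for == next_chan->device)
        return 0;                       /* already mapped for this device */

    if (buf->mapped_for != NULL) {
        /* Previous operations must have run to completion by now;
         * a real implementation would unmap from the old device here. */
        buf->mapped_for = NULL;
    }

    /* A real implementation would map for next_chan->device here. */
    buf->mapped_for = next_chan->device;
    buf->map_count++;
    return 1;
}
```

With two channels on one device and a third on another, only the move to the third channel triggers a remap.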

Does that sound possible?

> > I'd also like to see DMA_COMPL_SKIP_*_UNMAP always set by prep_slave_sg()
> > in tx->flags so we don't have to end up with "is this a slave operation"
> > tests in the completion handler.
>
> Longer term I do not see these flags surviving, but yes a 2.6.38
> change along these lines makes sense.

Well, if the idea is to kill those flags, then it would be a good idea
not to introduce new uses of them as that'll only complicate matters.

I do have an untested patch which adds the unmap to pl08x, but I'm
wondering if it's worth it, or whether to disable the memcpy support
for the time being.

Re: dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE:driver for the ARM PL080/PL081 PrimeC

on 03.01.2011 22:26:05 by dan.j.williams

On Mon, Jan 3, 2011 at 8:52 AM, Russell King - ARM Linux wrote:
> On Mon, Jan 03, 2011 at 08:36:00AM -0800, Dan Williams wrote:
>> For raid this will have implications for architectures that split
>> operation types on to different physical channels.  Preparing the
>> entire operation chain ahead of time is not possible on such
>> configuration because we need to remap the buffers for each channel
>> transition.
>
> That's not entirely true.  You will only need to remap buffers if
> old_chan->device != new_chan->device, as the underlying struct device
> will be different and could possibly have a different IOMMU or
> DMA-able memory parameters.
>

Yes, but currently operation capabilities are organized per dma device
(i.e. all channels on a dma device share the same set of
capabilities). The channel allocator will keep the chain on a single
channel where possible, but if it determines we need to switch to a
channel with a different capability set then we have also switched dma
devices at that point.
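The allocator behaviour described above can be sketched roughly as follows, assuming capabilities are a per-device bitmask. The toy_* names, the capability bits, and the linear search are all hypothetical; this is not how the dmaengine channel tables are actually implemented.

```c
#include <stddef.h>

/* Toy operation capability bits; hypothetical, for illustration only. */
enum toy_op {
    TOY_OP_MEMCPY = 1 << 0,
    TOY_OP_XOR    = 1 << 1,
    TOY_OP_PQ     = 1 << 2,
};

/* Capabilities live on the device, so all of its channels share them. */
struct toy_dev { unsigned int cap_mask; };
struct toy_chan { struct toy_dev *device; };

/*
 * Pick a channel for the next operation in a chain. The current channel
 * is kept whenever its device supports the operation; otherwise the
 * first capable channel is used, which by construction sits on a device
 * with a different capability set. Returns NULL if no channel can do op.
 */
struct toy_chan *toy_pick_chan(struct toy_chan *cur, struct toy_chan *chans,
                               size_t nchans, enum toy_op op)
{
    if (cur != NULL && (cur->device->cap_mask & op))
        return cur;

    for (size_t i = 0; i < nchans; i++)
        if (chans[i].device->cap_mask & op)
            return &chans[i];

    return NULL;
}
```

In this model a capability miss on the current channel always lands on another device, which is the case where the buffers have to be remapped.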

iop3xx and ppc4xx currently have this dma_device-per-dma_chan
organization. They could switch to a model of hiding
multiple hw channels behind a single dma_chan, but then they would
need to handle the operation ordering and channel transitions
internally.


> So, when changing channels, the optimization is not engine specific,
> but can be effected when the chan->device points to the same dma_device
> structure.  That means it should still be possible to chain several
> operations together, even if it means that they occur on different
> channels on the same device.
>
> One passing idea is the async_* operations need to chain buffers in
> terms of , or
> maybe .  If the dma_device pointer is
> initialized, the scatterlist is already mapped.  If this differs from
> the dma_device for the next selected operation, the previous operations
> need to be run, then unmap and remap for the new device.
>
> Does that sound possible?

Yes, but the dma driver still does not have enough information to
determine when it is finally safe to unmap / allow speculative reads.
The raid driver can make a much cleaner guarantee of "this stripe now
belongs to a dma device" and "all dma operations have completed, this
stripe can be returned to the cpu / rescheduled on a new channel".

>> > I'd also like to see DMA_COMPL_SKIP_*_UNMAP always set by prep_slave_sg()
>> > in tx->flags so we don't have to end up with "is this a slave operation"
>> > tests in the completion handler.
>>
>> Longer term I do not see these flags surviving, but yes a 2.6.38
>> change along these lines makes sense.
>
> Well, if the idea is to kill those flags, then it would be a good idea
> not to introduce new uses of them as that'll only complicate matters.
>
> I do have an untested patch which adds the unmap to pl08x, but I'm
> wondering if it's worth it, or whether to disable the memcpy support
> for the time being.

We could disable the driver if NET_DMA or ASYNC_TX_DMA are selected.
That still allows the driver to be exercised with dmatest. Although
I notice the driver is already marked experimental, do we need
something stronger for 37-final?

--
Dan

Re: dmaengine: fix dma_unmap (was: Re: [PATCH 06/13] DMAENGINE:driver for the ARM PL080/PL081 PrimeC

on 04.01.2011 23:34:31 by Linus Walleij

2011/1/3 Dan Williams:

> We could disable the driver if NET_DMA or ASYNC_TX_DMA are selected.
> That still allows the driver to be exercised with dmatest.  Although
> I notice the driver is already marked experimental, do we need
> something stronger for 37-final?

Your pick, IMHO. To use it out-of-the-box with 2.6.37 is not possible
on any system anyway - we have not patched in the required
platform data to any ARM system! Those who do such things surely
know what they're doing.

Yours,
Linus Walleij